
Wednesday, 23 November 2016

Virtualization Oversubscription (What’s so scary?) - What can be oversubscribed? (4 of 20)


Today I'll deal with what can be oversubscribed.
What can be oversubscribed?

        CPUs

        Memory

        Disk

        NICs

In our virtual world, now that we have broken the link between the OS and the hardware, we can over-provision all sorts of things.

CPU, memory, disk (as we mentioned) and NICs can all be oversubscribed.

Disk we have already looked at, and memory and CPU I’ll go into in more detail later, but I thought it was worth mentioning NICs here.

Typically people seem to be running 10 - 15 VMs on a single host, which will have significantly fewer NICs installed. A server typically wouldn’t use all the bandwidth of its NIC, so that unused bandwidth is like the unused space on disk.

When VMs talk to other VMs on the same host, that’s not generating traffic through the physical NICs, so we might consider that the equivalent of de-duplication.
In the next few blogs I'll be looking at CPU and Memory.
Phil Bell
Consultant

Wednesday, 12 October 2016

5 Top Performance and Capacity Concerns for VMware - Ready Time

Imagine you are driving a car and you are stationary; there could be several reasons for this.  You may be waiting to pick someone up, you may have stopped to take a phone call, or it might be that you have stopped at a red light.  In the first two cases (pick-up, phone) you have decided to stop the car to perform a task.  In the third, the red light is stopping you from doing something you want to do.  In fact you spend the whole time at the red light ready to move away as soon as the light turns green.  That time is ready time.

When a VM wants to use the processor but is stopped from doing so, it accumulates ready time, and this has a direct impact on performance.
For any processing to happen, all the vCPUs assigned to the VM must be running at the same time.  This means a 4 vCPU VM needs 4 available cores or hyper-threads to run.  So the fewer vCPUs a VM has, the more likely it is to be able to get onto the processors.

To avoid Ready Time
You can reduce contention by having as few vCPUs as possible in each VM.  If you monitor CPU Threads, vCPUs and Ready Time you’ll be able to see if there is a correlation between increasing vCPU numbers and Ready Time in your systems.
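
If you want a quick way to sanity-check that relationship outside your monitoring tool, here is a minimal Python sketch. The VM figures are invented for illustration, and statistics.correlation needs Python 3.10+.

    # Check whether ready time grows with vCPU count across a set of VMs.
    # The sample figures below are invented for illustration only.
    from statistics import correlation  # Python 3.10+

    vcpus      = [1, 2, 2, 4, 4, 8]            # vCPUs per VM
    ready_secs = [20, 45, 60, 400, 520, 1500]  # ready time per interval (seconds)

    r = correlation(vcpus, ready_secs)
    print(f"Correlation between vCPU count and ready time: {r:.2f}")

A value close to 1 would suggest that trimming vCPU counts is a good place to start.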

Proportion of Time: 4 vCPU VM
Below is an example of a 4 vCPU VM, each vCPU doing about 500 seconds’ worth of real CPU time and about 1,000 seconds’ worth of Ready Time.



For every 1 second of processing, the VM is waiting around 2 seconds to process, so it’s spending almost twice as long waiting to process as it is processing. This is going to impact the performance experienced by any end user who is reliant on this VM.

Now let’s compare that to the proportion of time spent processing on a 2 vCPU VM. The graph below shows a 2 vCPU VM doing the same amount of work, around 500 seconds’ worth of real CPU time, and as you can see the Ready Time is significantly less.



There are 3 states which the VM can be in and we'll take a look at these on Friday.
Don't forget to book on to our VMware vSphere Capacity & Performance Essentials workshop starting on Dec 6 http://www.metron-athene.com/services/online-workshops/index.html
Phil Bell
Consultant

Monday, 10 October 2016

5 Top Performance and Capacity Concerns for VMware - Time Slicing

As I mentioned on Friday the large difference between what the OS thinks is happening and what is really happening all comes down to time slicing.

In a typical VMware host we have more vCPUs assigned to VMs than we do physical cores. 
The processing time of the cores has to be shared among the vCPUs. Cores are shared between vCPUs in time slices, 1 vCPU to 1 core at any point in time.
More vCPUs lead to more time slicing. The more vCPUs we have, the less time each can be on a core, and therefore the slower time passes for that VM.  To keep the VM’s clock in step, extra timer interrupts are sent in quick succession.  So time passes slowly, and then very fast.
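
As a rough back-of-the-envelope illustration (the host figures are invented), the share of core time each vCPU can expect shrinks as vCPUs outnumber cores:

    # Rough estimate of the fraction of time each vCPU can be on a core
    # when vCPUs outnumber physical cores. Figures are illustrative and
    # ignore hyper-threading and scheduler details.
    physical_cores = 16
    total_vcpus = 48  # summed across all VMs on the host

    share = min(1.0, physical_cores / total_vcpus)
    print(f"Each vCPU can expect a core about {share:.0%} of the time")

In this example each vCPU is off the core roughly two thirds of the time, which is exactly the gap those extra timer interrupts have to paper over.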


More time slicing equals less accurate data from the OS.
Anything that doesn’t relate to time, such as disk occupancy, should be OK to use.
On Wednesday I'll be dealing with Ready Time. You've still got time to register for my webinar 'VMware and Hyper-V Virtualization Oversubscription (What's so scary?)' taking place on October 12. http://www.metron-athene.com/services/webinars/index.html
Phil Bell
Consultant

Monday, 5 September 2016

How to monitor CPU - Windows Server Capacity Management 101(9 of 12)

As promised today we'll be looking at how to monitor CPU.

Thresholds

When dealing with thresholds there is no one size fits all, but a good rule of thumb is 70% for a warning and 85% for an alarm. These can and should be tweaked once you have a better idea of the performance thresholds for your CPUs.

Additionally, it is good to have thresholds in place for when a CPU is being under-utilized, perhaps at 20% and 10%; this lets you know which machines could be pushed harder.
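
As a minimal sketch of that scheme in Python (the band boundaries are the rule-of-thumb values above; tune them to your own estate):

    # Classify a CPU utilization sample against the rule-of-thumb bands:
    # 70% warning, 85% alarm, plus 20%/10% under-utilization bands.
    def classify(cpu_pct: float) -> str:
        if cpu_pct >= 85:
            return "alarm"
        if cpu_pct >= 70:
            return "warning"
        if cpu_pct <= 10:
            return "heavily under-utilized"
        if cpu_pct <= 20:
            return "under-utilized"
        return "ok"

    for sample in (5, 18, 50, 72, 91):
        print(f"{sample}% -> {classify(sample)}")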

Trends

When setting up a trend, you have to remember that the longer the trend, the less reliable it is. A good rule of thumb for a trend is 3 months, as this gives a reasonably reliable trend and also lets you know in time to make a hardware change.
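
If you want to see the mechanics, here is a minimal Python sketch that fits a straight line to roughly 3 months of daily CPU averages and projects 1 month ahead. The history is randomly generated for illustration; real data would come from your monitoring tool.

    # Fit a linear trend to ~3 months of daily CPU averages and
    # project one month ahead. The history here is synthetic.
    import numpy as np

    days = np.arange(90)
    cpu_pct = 40 + 0.15 * days + np.random.normal(0, 3, size=90)

    slope, intercept = np.polyfit(days, cpu_pct, 1)
    projected = slope * (90 + 30) + intercept  # 30 days beyond the data
    print(f"Projected CPU at day 120: {projected:.1f}%")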

Reports

CPU Total Utilization Estd% - Report Example



Above is an example of estimated CPU core busy over a month for my computer, with a trend going forward 1 month. You can see quickly that the trend line is going down. This kind of chart is very simple to create with a capacity management tool like athene®.

On Wednesday I'll be dealing with Memory and how to monitor it. Don't forget to take a look at our workshops; there are some great ones coming up soon
http://www.metron-athene.com/services/online-workshops/index.html

Josh Worth
Consultant




Thursday, 1 September 2016

How to monitor and manage CPU - Windows Server Capacity Management 101(8 of 12)


Hyper-threading splits a single CPU core into two logical processors, each of which can execute a separate piece of work. You will see one thread being the dominant thread, with the other processing when the first is stalled. There is some trade-off with hyper-threading, as it takes time for the CPU to switch between threads. Some work fits well with this, such as multiple threads of lightweight work, while heavier work that needs the whole power of a core to get through could run slower with hyper-threading.



Depending on the type of work, hyper-threading is not always beneficial; sometimes it is better not to split cores into multiple threads, as the jumping between threads can lower throughput.
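
If you want to account for this in capacity figures, a common (but workload-dependent) rule of thumb is that hyper-threading buys roughly 20-30% extra throughput, not double. A minimal sketch, with the scaling factor as an explicit assumption:

    # Estimate effective CPU capacity with hyper-threading enabled.
    # The 1.25x scaling factor is an assumed rule of thumb, not a
    # measured value; benchmark your own workload before relying on it.
    physical_cores = 8
    ht_scaling = 1.25

    logical_processors = physical_cores * 2
    effective_capacity = physical_cores * ht_scaling
    print(f"{logical_processors} logical processors, "
          f"~{effective_capacity:.0f} cores' worth of real throughput")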

On Monday I'll take a closer look at Thresholds and Trends. In the meantime why not take our Capacity Management Maturity Survey and get your free 20-page report.
http://www.metron-athene.com/_capacity-management-maturity-survey/survey.asp

Josh Worth
Consultant

Thursday, 14 July 2016

VMware, Virtual Center Headroom (17 of 17) Capacity Management, Telling the Story

Today I’ll show you one final report on VMware, which looks at headroom available in the Virtual Center.

In the example below we’re showing CPU usage. The average CPU usage is illustrated by the green bars, the light blue represents the amount of CPU available across this particular host and the dark blue line is the total CPU power available.
 
VMware – Virtual Center Headroom
 
 
 
We have aggregated all the hosts up within the cluster to see this information.
We can see from the green area at the bottom how much headroom we have up to the dark blue line at the top, although in this case we should really compare it to the turquoise area, as this is the amount of CPU available to the VMs.
The difference between the dark blue line and the turquoise area is the overhead taken by the VMkernel, which has to be taken into consideration.
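
A minimal sketch of that headroom sum, with invented MHz figures standing in for the real host data:

    # Aggregate cluster headroom: total capacity minus VMkernel
    # overhead gives the CPU available to VMs; subtract average use.
    # All figures are invented for illustration.
    hosts = [
        {"total_mhz": 52800, "vmkernel_mhz": 2600, "avg_used_mhz": 31000},
        {"total_mhz": 52800, "vmkernel_mhz": 2600, "avg_used_mhz": 27500},
    ]

    available = sum(h["total_mhz"] - h["vmkernel_mhz"] for h in hosts)
    used = sum(h["avg_used_mhz"] for h in hosts)
    headroom = available - used
    print(f"Cluster headroom: {headroom} MHz "
          f"({headroom / available:.0%} of usable capacity)")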
 
Summary

To summarize my blog series, when reporting:

        Stick to the facts

        Elevator talk

        Show as much information as needs to be shown

        Display the information appropriate for the audience

        Talk the language of the audience

….Tell the Story
Hope you've enjoyed the series; if you have any questions, feel free to ask. If you're interested in VMware Capacity Management, don't forget to book on to our workshop http://www.metron-athene.com/services/online-workshops/index.html#vmwarevsphere
Charles Johnson
Principal Consultant


Tuesday, 12 July 2016

VMware Reports (16 of 17) Capacity Management, Telling the Story

Let’s take a look at some examples of VMware reports.

The first report below looks at the CPU usage of clusters in MHz. It is a simple chart and this makes it very easy to understand.
 
VMware – CPU Usage all Clusters

You can immediately see which the biggest user of CPU is: Core site 01.
 
The next example is a trend report on VMware resource pool memory usage.
The light blue indicates the amount of memory reserved and the dark blue line indicates the amount of memory used within that reservation. This information is then trended going forward, allowing you to see at which point in time the required memory is going to exceed the memory reservation.
 
VMware – Resource Pool Memory Usage Trend
 
A trend report like this is useful as an early warning system: you know when problems are likely to ensue and can do something to resolve them before they become an issue.
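
The arithmetic behind that early warning is simple enough to sketch in a few lines of Python (the reservation and usage history here are invented):

    # Trend memory usage forward and estimate when it will exceed
    # the reservation. All figures are invented for illustration.
    import numpy as np

    reservation_gb = 64
    weeks = np.arange(12)
    used_gb = 40 + 1.5 * weeks  # synthetic 12-week history

    slope, intercept = np.polyfit(weeks, used_gb, 1)
    breach_week = (reservation_gb - intercept) / slope
    print(f"Usage projected to exceed the reservation around week {breach_week:.0f}")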

We need to keep ahead of the game, and setting up simple but effective reports, produced automatically, will help you to do this and to report back to the business regarding requirements well in advance.

On Thursday I’ll show you one final report on VMware, which looks at headroom available in the Virtual Center. In the meantime take a look at our VMware vSphere Capacity and Performance Essentials workshop.
http://www.metron-athene.com/services/online-workshops/index.html#vmwarevsphere

Charles Johnson
Principal Consultant

Friday, 8 July 2016

Model – Linux server change & disk change (15 of 17) Capacity Management, Telling the Story

Following on from Wednesday's blog today I'll show the model for change in our hardware.

In the top left hand corner we are showing that once we reach the ‘pain’ point and then make a hardware upgrade the CPU utilization drops back to within acceptable boundaries for the period going forward.




In the bottom left hand corner you can see from the primary results analysis that, after the upgrade, the distribution of work is more evenly spread.

The model in the top right hand corner has brought up an issue with device utilization on another disk, so we would have to factor in an I/O change and see what the results of that would be, and so on.

In the bottom right hand corner we can see that the service level has been fine for a couple of periods and then it is in trouble again, caused by the I/O issue.

Whilst this hardware upgrade would satisfy our CPU bottleneck it would not rectify the issue with I/O, so we would also need to upgrade our disks.

When forecasting, modeling helps you to make recommendations on the changes that will be required and when they will need to be implemented.

On Monday I'll take a look at some examples of VMware reports.

In the meantime why not register for our next webinar Capacity Planning and Forecasting using Analytic Modeling http://www.metron-athene.com/services/webinars/index.html

Charles Johnson
Principal Consultant

Wednesday, 6 July 2016

Modeling Scenario (14 of 17) Capacity Management, Telling the Story

I have talked about bringing your KPIs, resource and business data into a CMIS, and about using that data to produce reports in a clear, concise and understandable way.

Let’s now take a look at some analytical modeling examples, based on forecasts which were given to us by the business.

Below is an example for an Oracle box. We have been told by the business that we are going to grow at a steady rate of 10% per month for the next 12 months, and we can model to see what the impact of that business growth will be on our Oracle system.
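
As a back-of-the-envelope check alongside the model, here is a minimal Python sketch of that scenario. The 45% baseline is an invented figure; the 70%/85% bands are the usual warning/alarm thresholds.

    # Project CPU utilization at 10% compound growth per month and
    # flag the months where warning/alarm thresholds are crossed.
    baseline_pct = 45.0  # invented starting utilization

    for month in range(1, 13):
        projected = baseline_pct * 1.10 ** month
        if projected >= 85:
            print(f"Month {month}: {projected:.0f}% - alarm threshold breached")
            break
        if projected >= 70:
            print(f"Month {month}: {projected:.0f}% - warning threshold breached")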

In the top left hand corner is our projected CPU utilization and on the far left of that graph is our baseline. You can see that over a few months we begin to go through our alarms and our thresholds pretty quickly.

Model – oracleq000 10% growth – server change



In the bottom left hand corner we can see where bottlenecks will be reached, shown by the large red bars which indicate CPU queuing.

The top right graph shows our projected device utilization for our busiest disk; within 4 to 5 months it is also breaching our alarms and thresholds.

Collectively these models are telling us that we are going to run into problems with CPU and I/O.

In the bottom right hand graph is our projected relative service level for this application. In this example we started the baseline off at 1 second; this is key.

By normalizing the baseline at 1 second it is very easy for your audience to see the effect that these changes are likely to have. In this case, once we’ve added the extra workload we can see that we go from 1 second to 1.5 seconds (a 50% increase) and then jump to almost 5 seconds. From 1 to 5 seconds is a huge increase, and one that your audience can immediately grasp and understand the impact of.
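
The normalization itself is trivial, which is part of its appeal; a minimal sketch with invented response times:

    # Express projected response times relative to a measured baseline,
    # so the audience sees 1.0 -> 1.5 -> ~5 rather than raw seconds.
    # The times below are invented for illustration.
    baseline_secs = 2.0
    projected_secs = [2.0, 3.0, 9.5]  # baseline, +workload, at bottleneck

    for t in projected_secs:
        print(f"Relative service level: {t / baseline_secs:.1f}")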

We would next want to show the model for change in our hardware and I'll be looking at this on Friday.

In the meantime why not join our Community and get access to a wealth of Capacity Management Resources http://www.metron-athene.com/_resources/

Charles Johnson
Principal Consultant

Monday, 4 July 2016

Business Metric Correlation (13 of 17) Capacity Management, Telling the Story

As mentioned previously, it is important to get business information into the CMIS to enable us to perform some correlations.

In the example below we have taken business data and component data, and we can now report on them together to see if there is some kind of correlation.

Business Transactions vs. CPU Utilization
In this example we can see that the number of customer transactions (shown in dark blue) correlates reasonably well with the CPU utilization.
Can we make some kind of judgment based on just what we see here? Do we need to perform some further statistical analysis on this data? What is the correlation coefficient for our application data against the CPU utilization?
A value closer to 1 indicates that there is a very close correlation between the application data and the underlying component data.
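The coefficient itself is a one-liner to compute once the data is out of the CMIS; here is a minimal Python sketch with invented daily figures (statistics.correlation needs Python 3.10+):

    # Pearson correlation between daily business transactions and
    # CPU utilization. The figures are invented for illustration.
    from statistics import correlation  # Python 3.10+

    transactions = [12000, 15500, 9800, 18200, 16100, 11400, 17300]
    cpu_pct      = [41, 55, 33, 66, 58, 39, 61]

    print(f"Correlation coefficient: {correlation(transactions, cpu_pct):.2f}")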
What can we take back to the business with this information? An example would be: this graph indicates that there is a very close correlation between the number of customer transactions and the CPU utilization; therefore, if we plan on increasing the number of customer transactions in the future, we are likely to need a CPU upgrade to cope with that demand.
On Wednesday I'll be looking at a Modeling scenario.
Charles Johnson
Principal Consultant

Wednesday, 29 June 2016

Unix Reports (11 of 17) Capacity Management, Telling the Story

As I mentioned on Monday, today and on Friday we'll look at a couple of example reports created for a Linux box.
Below is a simple utilization report for a Linux box running an application. It covers one day, and it shows us that there are a couple of spikes where it has breached our CPU threshold.



Looking at this report we can see that these peaks take place during the middle of the day. Is that normal behavior? Do you know enough about the system or application that you are reporting on to make that judgment? Do we need to perform a root cause analysis? If we believe the peaks to be normal then maybe we need to adjust the threshold settings or if we are unsure then we need to carry out further investigation. Has some extra load been added to the system? Has there been a change? Are there any other anomalies that you need to find out about?

Remember when reporting don’t make your reports over complicated.

Don’t put too much data on to one chart, it will look messy and be hard to understand.

On Friday I'll show you an example of a report on disk utilization of a Linux server.
In the meantime sign up to our Community and get access to white papers, on-demand webinars and more http://www.metron-athene.com/_resources/index.html

Charles Johnson
Principal Consultant

Friday, 24 June 2016

Resources and Costs (9 of 17) Capacity Management, Telling the Story

The report below is a dashboard view of resources and costs.

In the top left hand corner we have a cost by application. This is a breakdown of our applications and how much each of these is actually costing us. In this instance the biggest cost to us is development.

In the top right hand corner is an analysis of how much memory each of our applications is using and again it is development who are consuming the most memory resources.

In the bottom left hand corner we can view the sum of the CPU; this is the usage of CPU by application, and in this case an in-house application is consuming the most CPU.
In the bottom right hand corner we have the number of CPUs being used by each application, and again in this example development is using the largest amount.
This is a very clear and concise way of displaying the information to your audience.
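
If your server-level figures are already in one place, a roll-up like this takes only a few lines of Python. The applications and numbers below are invented, and pandas is assumed to be available.

    # Roll server-level cost and resource figures up by application,
    # as in the dashboard panels above. Data is invented.
    import pandas as pd

    servers = pd.DataFrame({
        "application": ["development", "development", "in-house", "web"],
        "cost":        [1200, 900, 700, 400],
        "memory_gb":   [256, 128, 96, 64],
        "cpu_count":   [16, 8, 12, 4],
    })

    print(servers.groupby("application")[["cost", "memory_gb", "cpu_count"]].sum())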
On Monday we’ll look in more detail at dashboards.
Charles Johnson
Principal Consultant