I'll be hosting our webinar, VMware and Hyper-V Virtualization Over-Subscription (What's so scary?), on October 12 (http://www.metron-athene.com/services/webinars/index.html), so I thought it would be pertinent to look at the Top 5 Performance and Capacity Concerns for VMware in my blog series.
I’ll begin with Dangers with OS Metrics.
Almost every time we discuss data capture for VMware, someone asks whether we can capture the utilization of specific VMs by monitoring the guest OS. The simple answer is no.
In the example below, the operating system sees VM1 as busy 50% of the time, but VMware sees that VM1 was only running for half of the time and was busy for only half of that, so it reports VM1 as 25% busy.
For the second VM, VM2, the operating system and VMware agree that it is in full use, and both report it as 50% busy.
This is a good example of the disparity that can sometimes occur.
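The VMware side of this example is just "half of half" arithmetic, and a short sketch makes it explicit. It assumes, as the example implies, that each VM holds the physical CPU for half of the wall-clock time; the function name `hypervisor_busy_pct` and its parameters are hypothetical and are not part of any VMware API. It only models the hypervisor's view, not what the guest OS reports.

```python
def hypervisor_busy_pct(scheduled_fraction: float,
                        busy_fraction_while_scheduled: float) -> float:
    """Busy % over wall-clock time as the hypervisor sees it (illustrative model only).

    scheduled_fraction: share of wall-clock time the VM actually held the physical CPU.
    busy_fraction_while_scheduled: share of that scheduled time the VM spent doing work.
    """
    return 100.0 * scheduled_fraction * busy_fraction_while_scheduled


# VM1: scheduled for half the time, busy for half of that -> 25%
vm1 = hypervisor_busy_pct(0.5, 0.5)
# VM2: scheduled for half the time, busy for all of it -> 50%
vm2 = hypervisor_busy_pct(0.5, 1.0)
print(f"VM1: {vm1:.0f}%  VM2: {vm2:.0f}%")
```

Multiplying the two fractions is what turns VM1's "busy half the time" into VMware's 25%, while VM2 lands on 50% either way.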
OS vs VMware data
Here is data from a real VM.
The top (dark blue) line is the data captured from the OS, and the bottom (light blue) line is the data from VMware. While there is clearly some correlation between the two, at the start of the chart they differ by about 1.5% CPU. Given that we're only running at about 4.5% CPU, that is an overestimation by the OS of roughly a third. By about 09:00 the difference has narrowed to around 0.5%, so it doesn't remain stable either. This is a small system, but if you scaled it up it would not be unusual to see the OS reporting 70% CPU utilisation and VMware reporting 30%.
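The overestimation figures here are relative differences, worked out against the VMware value as the baseline. The sketch below assumes the start-of-chart readings are about 6.0% from the OS and 4.5% from VMware (that reading of the chart is an assumption), and the helper name `os_overestimation_pct` is hypothetical.

```python
def os_overestimation_pct(os_pct: float, hypervisor_pct: float) -> float:
    """How far the guest OS figure exceeds the hypervisor figure, as a % of the hypervisor figure."""
    if hypervisor_pct <= 0:
        raise ValueError("hypervisor_pct must be positive")
    return 100.0 * (os_pct - hypervisor_pct) / hypervisor_pct


# Start of the chart (assumed readings): OS ~6.0% CPU, VMware ~4.5% CPU.
print(round(os_overestimation_pct(6.0, 4.5)))    # 33
# Scaled-up case from the text: OS 70%, VMware 30%.
print(round(os_overestimation_pct(70.0, 30.0)))  # 133
```

On the scaled-up figures, the OS would be overstating the VM's real CPU use by well over double.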
This large difference between what the OS thinks is happening and what is really happening all comes down to time slicing.
I'll be looking at time slicing on Monday.