Dangers with OS Metrics
Almost every time we discuss data capture for VMware, someone asks whether we can capture the utilization of specific VMs by monitoring the OS. The simple answer is no. The more complex answer is that we can capture the data from the OS, but it may not be reliable. Here’s an example of why.
We have 2 VMs. Within the 1 second interval we are looking at, one of the VMs was only allocated the CPU for ½ a second. In that ½ second the VM used 50% of its possible CPU time, so from the OS perspective it was running at 50% CPU utilization. If we look at the data from VMware, we’ll see that VMware knows the VM only used ½ of the CPU available in ½ a second, which is 25% of the full interval.
The 2nd VM was running on CPU for the entire second, and again it used 50% of its possible CPU time. So, to the OS, it appears to have been running at 50% CPU utilization, and VMware reports the same result.
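The arithmetic for the two VMs can be sketched in a few lines of Python. This is a simplified model rather than how either the guest OS or VMware actually does its accounting, and the function names are made up for illustration: the only assumption is that the OS divides busy time by the time it was scheduled, while the hypervisor divides by the whole interval.

```python
def guest_os_utilization(busy_seconds, scheduled_seconds):
    """CPU utilization as the guest OS would see it.

    The guest only accounts for time it was actually scheduled on a
    physical CPU, so the denominator is the scheduled time.
    """
    return busy_seconds / scheduled_seconds


def hypervisor_utilization(busy_seconds, interval_seconds):
    """CPU utilization measured over the whole wall-clock interval."""
    return busy_seconds / interval_seconds


INTERVAL = 1.0  # seconds

# VM 1: scheduled for 0.5 s, busy for half of that (0.25 s).
vm1_os = guest_os_utilization(busy_seconds=0.25, scheduled_seconds=0.5)
vm1_hv = hypervisor_utilization(busy_seconds=0.25, interval_seconds=INTERVAL)

# VM 2: scheduled for the full second, busy for half of it (0.5 s).
vm2_os = guest_os_utilization(busy_seconds=0.5, scheduled_seconds=1.0)
vm2_hv = hypervisor_utilization(busy_seconds=0.5, interval_seconds=INTERVAL)

print(f"VM 1: OS sees {vm1_os:.0%}, hypervisor sees {vm1_hv:.0%}")
print(f"VM 2: OS sees {vm2_os:.0%}, hypervisor sees {vm2_hv:.0%}")
```

Both VMs look identical from inside the OS (50%), but only the second one agrees with the hypervisor's figure.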
The more contention there is for CPU time, the more time VMs will spend dormant or idle, and the further apart the two values will be. This effect means that any metric with an element of time in its calculation cannot be relied upon to be accurate when it is measured from inside the OS.
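To see how the gap widens with contention, here is a small sketch under the same simplified assumption: the OS keeps reporting 50% of whatever time it was given, while the share of the interval the VM is actually scheduled for shrinks. The function name and the list of scheduled fractions are illustrative only.

```python
def utilization_gap(os_utilization, scheduled_fraction):
    """Difference between the OS-reported and hypervisor-level utilization.

    scheduled_fraction is the share of the interval the VM actually
    spent on a physical CPU (1.0 means no contention at all).
    """
    hypervisor_utilization = os_utilization * scheduled_fraction
    return os_utilization - hypervisor_utilization


# The OS reports a steady 50% while contention increases.
for scheduled_fraction in (1.0, 0.75, 0.5, 0.25):
    gap = utilization_gap(0.5, scheduled_fraction)
    print(f"scheduled {scheduled_fraction:.0%} of the interval: "
          f"OS overstates by {gap * 100:.1f} percentage points")
```

With no contention the two values match; the less time the VM is given, the more the OS figure overstates the real usage.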
Here is data from a real VM.
The (top) dark blue line is the data captured from the OS, and the (bottom) light blue line is the data from VMware. There is clearly some correlation between the two. At the start of the chart there is a difference of about 1.5 percentage points of CPU. Given that we’re only running at about 4.5% CPU, that is an overestimation by the OS of roughly a third. But at about 09:00 the difference is ~0.5 percentage points, so the gap doesn’t remain stable either.
Historically, it has not been unusual to see situations where the OS metric reports 70% CPU utilization while VMware reports 30%.
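For anyone who wants to check the size of these discrepancies, the sketch below expresses the gap as a fraction of the OS reading. It assumes the 4.5% in the chart example is the OS figure (so VMware would be at about 3.0%); the function name is hypothetical.

```python
def relative_gap(os_pct, vmware_pct):
    """Gap between the two readings as a fraction of the OS reading."""
    return (os_pct - vmware_pct) / os_pct


# Start of the chart: about 4.5% from the OS vs. about 3.0% from VMware
# (assumed split of the 1.5 point difference described above).
chart_gap = relative_gap(4.5, 3.0)

# The historical case: 70% from the OS vs. 30% from VMware.
historical_gap = relative_gap(70.0, 30.0)

print(f"Chart example: OS reading is off by {chart_gap:.0%} of its own value")
print(f"70% vs 30%:    OS reading is off by {historical_gap:.0%} of its own value")
```

Measured against the VMware figure instead, the same gaps would look even larger, so it is worth stating which baseline is being used when quoting a percentage.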
More on Wednesday. In the meantime, don't forget to register for our next webinar, 'Top 5 VMware tips for performance and capacity'.
Phil Bell
Consultant