Performance metrics collected from VMware's API, which include data stored in the vCenter database, report minimum and maximum (peak) values for selected counters, for example CPU Usage and Usage MHz.
Peak values are of particular interest to capacity planners, whose job is to account for worst-case scenarios.
Fortunately, VMware provides such metrics. Unfortunately, these values are not true peaks (or maximums); they are averages over the collection interval, which is typically 5 minutes.
The following example of CPU utilization shows a worst-case scenario illustrating why this can be "bad" (or at least problematic for the capacity planner).
This table shows 20-second averages (as reported by the real-time monitor). The entire table represents the 5-minute interval on which the API reports.
Products that use the API or database will show the 5-minute interval as having a maximum value of 62.5% utilization. This isn't the true peak; it's an average over the interval. If you looked only at this number, you'd conclude that the system is healthy - after all, there's plenty of headroom.
This is simply not the case: for three of the five minutes the CPU is maxed out and there is no headroom. Unfortunately, you didn't see it, because the true peak is hidden by the averaging of the values.
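As a minimal sketch of how this averaging hides the peak, the snippet below uses hypothetical 20-second sample values (not real vCenter data) chosen to reproduce the 62.5% figure: nine samples (3 minutes) pegged at 100% and six samples idling at 6.25%.

```python
# Hypothetical 20-second CPU utilization samples (%) over one 5-minute
# interval: 15 samples x 20 s = 5 minutes. Values are illustrative only.
samples = [100.0] * 9 + [6.25] * 6

# What the API reports as the interval's "maximum": the interval average.
interval_avg = sum(samples) / len(samples)

# What the capacity planner actually needed to see: the real peak.
true_peak = max(samples)

print(f"Reported value (interval average): {interval_avg:.1f}%")  # 62.5%
print(f"True peak within the interval:     {true_peak:.1f}%")     # 100.0%
```

The averaged value looks healthy, while the per-sample data shows the CPU saturated for 3 of the 5 minutes.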
That 3-minute period where utilization is at 100% is a long time for an application to be starved of CPU resources. Bad things can happen: queues can form, and applications can slow to a crawl or fall over completely. And no one would be any the wiser as to why.
Capacity planners must account for worst-case scenarios in order to meet application demands and satisfy service level agreements (SLAs). An average in this case simply doesn't fulfill the requirement.
Consider a service provider (internal or external) supporting an application. The client (internal or external) receives reports or a dashboard illustrating application and system performance. The application or service has issues, yet the information provided to the client shows no issues - after all, there is plenty of headroom. So what's the problem? And who is responsible?
Ultimate responsibility falls back on the service provider (and the capacity planner), who must diagnose and correct the issue after the fact - and must (no doubt embarrassingly) report back to the client what happened and why it was not provided for. Not good.
Our athene® Capacity Management software leverages the real-time feed from VMware to report actual peaks. These can be incorporated into a reporting regimen to give capacity planners, and their customers, proper insight into a system's behavior and the true demands of an application. This is all done with no additional load on vCenter.
In the example above, what the capacity planner needed to account for was the 100% utilization, not the 62.5% average.
Our Resources section has a great selection of VMware Capacity Management videos and white papers - go take a look.
Don Fadel
Regional Director