Tuesday, 29 August 2017

VMware Capacity Planning - When a Peak is Not a Peak

Performance metrics collected from VMware’s API, which include data stored in the vCenter database, report Minimum/Maximum (Peak) values for selected counters, for example CPU Usage and Usage MHz.

Peak values are of interest to capacity planners, as it's their job to account for worst-case scenarios.

Fortunately, VMware provides such metrics. Unfortunately, these values are not real peaks (or maximums); they are averages over the collection interval, which is typically 5 minutes.

The following example of CPU utilization shows a worst-case scenario and illustrates why this can be “bad” (well, at least problematic for the capacity planner).

This table shows 20-second averages (as reported by the real-time monitor). The entire table represents one 5-minute interval, the granularity at which the API reports.

Products that use the API or database will show this 5-minute interval as having a maximum value of 62.5% utilization. That isn't the true peak; it's an average over the interval. Looking at this number, you'd conclude that the system is healthy: after all, there's plenty of headroom.

This is simply not the case: for 3 of the 5 minutes the CPU is maxed out and there is no headroom. Unfortunately, you wouldn't see it, because the true peak is hidden by the averaging.
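To make the arithmetic concrete, here is a small Python sketch. The 15 samples are hypothetical, not copied from the table: they were chosen so that 3 of the 5 minutes sit at 100% and the interval still averages 62.5%. It assumes 20-second samples, so 15 of them make up one 5-minute interval.

```python
# Hypothetical 20-second CPU utilization samples (%) for one 5-minute interval.
# Illustrative values only: 9 samples (3 minutes) at 100%, 6 samples at 6.25%.
SAMPLE_SECONDS = 20
samples = [100.0] * 9 + [6.25] * 6


def summarize(values, sample_seconds):
    """Return the interval average, the true peak, and seconds spent at 100%."""
    average = sum(values) / len(values)
    peak = max(values)
    saturated_seconds = sum(sample_seconds for v in values if v >= 100.0)
    return average, peak, saturated_seconds


average, peak, saturated = summarize(samples, SAMPLE_SECONDS)
print(f"Interval length:    {len(samples) * SAMPLE_SECONDS} s")
print(f"Reported 'maximum': {average:.1f}%")  # what a 5-minute average shows
print(f"True peak:          {peak:.1f}%")
print(f"Time at 100% CPU:   {saturated} s")
```

Given only the 62.5% figure, a planner sees 37.5 points of headroom; the per-sample view shows 180 seconds with none at all.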

That 3-minute period at 100% utilization is a long time for an application to be starved of CPU resources. Bad things can happen: queues can form, and applications can slow to a crawl or fall over completely. And no one would be any the wiser as to why.

Capacity planners must account for worst-case scenarios in order to meet application demands and service level agreements (SLAs). An average, in this case, just doesn’t fulfill that requirement.

Consider a service provider (internal or external) supporting an application. The client (internal or external) receives reports or a dashboard showing application and system performance. The application or service has issues, yet the information given to the client shows none: after all, there is plenty of headroom. So what’s the problem? And who is responsible?

Ultimate responsibility falls back on the service provider (and the capacity planner), who must diagnose and correct the issue after the fact, and then report back to the client (no doubt embarrassingly) on what happened and why it was not provided for. Not good.

Our athene® Capacity Management software leverages the real-time feed from VMware to report actual peaks. These can be incorporated into a reporting regimen that gives capacity planners, and their customers, proper insight into a system’s behavior so they can account for the demands of an application. This is all done with no additional impact on vCenter.

What the capacity planner needed to account for was the 100% utilization.
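The general idea of reporting actual peaks can be sketched as follows: keep the 20-second real-time samples and, when rolling them up into 5-minute intervals, record the maximum alongside the average. This is a generic illustration, not athene®'s implementation. The `rollup` function, the `IntervalSummary` type and the `(timestamp, value)` input format are hypothetical, and collecting the samples from vCenter is out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass

INTERVAL_SECONDS = 300  # 5-minute reporting interval


@dataclass
class IntervalSummary:
    start: int      # interval start, in seconds
    average: float  # what an average-only 5-minute rollup would report
    peak: float     # highest 20-second sample in the interval
    samples: int    # number of samples that fell into the interval


def rollup(samples, interval=INTERVAL_SECONDS):
    """Group (timestamp, value) samples into fixed intervals, keeping avg and peak."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the interval it belongs to.
        buckets[timestamp - timestamp % interval].append(value)
    return [
        IntervalSummary(start, sum(vals) / len(vals), max(vals), len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Hypothetical data: a quiet interval, then one with 3 minutes at 100%.
quiet = [(t, 30.0) for t in range(0, 300, 20)]
busy = [(t, 100.0 if t < 480 else 6.25) for t in range(300, 600, 20)]
result = rollup(quiet + busy)
for s in result:
    print(f"start={s.start:>4}s avg={s.average:5.1f}% peak={s.peak:5.1f}% n={s.samples}")
```

Both intervals would look comfortable on averages alone (30% and 62.5%); only the peak column exposes the second one as saturated.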

Our Resources section has a great selection of VMware Capacity Management videos and white papers; go take a look.

Don Fadel
Regional Director
