Wednesday 20 February 2013

Capacity Management: Guided Practitioner Satnav – Medium term improvements (14 of 17)

Still sticking with the status of capacity management within the Wintel virtual infrastructure (VI) farm at a large enterprise, we can now look at medium term improvements.
SPM: A recruitment exercise is currently under way for a role responsible for Capacity Management.  This is an essential step in providing the process structure needed to unify the reporting and data sources and to start moving towards a more proactive approach.
Proactive: The focus is currently on system or component level monitoring.  Whilst this gives an accurate report of how the cluster, host or guest is performing, the next step is to monitor at the process level. Enabling this process level information is the first step in workload characterization and a requirement for the proposed longer term step of a further application consolidation activity.  Also at process level is the required analysis known as ‘process pathology’, which can be done even at the VM level to identify ‘rogue’ VMs or processes and ‘flatline’ VMs or applications.  Rogue processes include memory leaks, program loops and other problems that do not show up in an overall system level analysis but generate apparent (but unnecessary) excessive demands for resources.
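As a rough illustration of ‘process pathology’, the sketch below labels a series of utilisation samples for a single VM or process. It is only a minimal sketch: the function name `classify_series`, the two thresholds and the simple rules (near-zero variation for a flatline, an almost unbroken rise for a leak-like rogue) are assumptions made for this example, not a description of any particular tool.

```python
from statistics import mean, pstdev

def classify_series(samples, flat_cv=0.02, growth_ratio=0.9):
    """Label a utilisation series as 'flatline', 'rogue' or 'normal'.

    flat_cv and growth_ratio are illustrative thresholds (assumptions).
    """
    if len(samples) < 3:
        return "normal"  # too little data to judge
    avg = mean(samples)
    # Flatline: almost no variation relative to the mean.
    if avg == 0 or pstdev(samples) / avg < flat_cv:
        return "flatline"
    # Rogue (leak-like): the series rises in nearly every interval.
    rises = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    if rises / (len(samples) - 1) >= growth_ratio:
        return "rogue"
    return "normal"
```

In practice the samples would come from whatever process level collector is in place, and the thresholds would need tuning against known-good workloads.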
Services: The next stage in the maturity of the capacity management process is to work with the service level management process in establishing response time thresholds within the SLA and, more generally, in linking the performance of a service with the underlying infrastructure. Moving forward, this will provide an additional KPI for the relative performance benefits of the virtualisation project and a valuable feed into the service level management process.
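To make the idea of a response time threshold concrete, here is a minimal sketch that measures what fraction of response time samples fall within a threshold and flags a breach when that fraction drops below a target. The function names, the 2 second threshold and the 95% target used in the usage lines are hypothetical values chosen for illustration; the real figures would come from the agreed SLA.

```python
def sla_compliance(response_times_s, threshold_s):
    """Return the fraction of samples at or under the threshold (seconds)."""
    if not response_times_s:
        raise ValueError("no response time samples supplied")
    within = sum(1 for t in response_times_s if t <= threshold_s)
    return within / len(response_times_s)

def breaches_sla(response_times_s, threshold_s, target=0.95):
    """True when fewer than `target` of the samples meet the threshold."""
    return sla_compliance(response_times_s, threshold_s) < target

# Hypothetical samples: three of four requests complete within 2 seconds.
samples = [0.5, 1.0, 1.5, 2.5]
compliance = sla_compliance(samples, threshold_s=2.0)
breached = breaches_sla(samples, threshold_s=2.0)
```

Tracking this compliance figure alongside host and cluster utilisation is one simple way to start linking service performance to the underlying infrastructure.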
Portal: The proposed vehicle for a reporting regime is the capacity management portal.
The capacity reporting available and how it should be used is discussed throughout this blog series.  The process diagram below should further clarify both the sources of information and the expected output.
Whilst it is acknowledged that the unification of the various independent databases holding performance data is unlikely, at least in the short term, the key part of the requirement is the production of a set of capacity reports that provide a ‘single pane of glass’ view into exactly how the virtual environment (and, longer term, the entire estate) is performing and how much capacity is available.

In order to provide that singular usable view it is important that the following steps be taken:
          Provide a common look and feel to the reports, independent of their source
          Ensure an appropriate educational exercise has taken place, i.e. make sure people know where the reports are available and what they mean
          Provide easy access to the capacity portal, usually via the web, and perhaps via email for any exception reports
A longer term goal would be to integrate with the Configuration Database to incorporate important relationship and topological information.  This provides a valuable service level capacity data feed and will allow the reporting and planning activities to be undertaken at the service level as well as the component level.
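As a sketch of how relationship information could lift component level data to the service level, the example below takes a mapping of services to the hosts they run on and reports each service's headroom at its most constrained host. The mapping format, the host and service names and the "most constrained host" rule are all assumptions made for illustration; a real Configuration Database would expose its relationships in its own way.

```python
def service_headroom(service_map, host_util):
    """Return service -> spare capacity (0-1) at its busiest known host.

    service_map: service name -> list of host names (hypothetical format)
    host_util:   host name -> peak utilisation as a fraction between 0 and 1
    """
    result = {}
    for service, hosts in service_map.items():
        known = [host_util[h] for h in hosts if h in host_util]
        if known:  # skip services with no measured hosts
            result[service] = round(1.0 - max(known), 4)
    return result

# Hypothetical topology and utilisation figures.
topology = {"payroll": ["esx01", "esx02"], "email": ["esx03"]}
utilisation = {"esx01": 0.60, "esx02": 0.85, "esx03": 0.40}
headroom = service_headroom(topology, utilisation)
```

Here the payroll service is limited by its busiest host, so it shows less headroom than the email service even though one of its hosts is lightly loaded.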

I’ll be discussing longer term improvements on Friday………
Adam Grummitt
Distinguished Engineer
