Still sticking with the status of capacity management within the Wintel
virtual infrastructure (VI) farm at a large enterprise, we can now look at
medium-term improvements.
SPM:
A recruitment exercise is currently under way for a role responsible for
Capacity Management. This is an essential step in providing the process
structure needed to unify the reporting and data sources, and to start moving
towards a more proactive approach.
Proactive:
The focus is currently on system or component level monitoring. Whilst this
gives an accurate report of how the cluster, host or guest is performing, the
next step is to monitor at the process level. Enabling this process-level
information is the first step in workload characterization and a requirement
for the proposed longer-term step of a further application consolidation
activity.
Also at the process level is the analysis known as ‘process pathology’, which
can be done even at the VM level to identify ‘rogue’ VMs or processes and
‘flatline’ VMs or applications.
Rogue processes include memory leaks, program loops and other problems
that do not show up in an overall system level analysis but yield apparent (but
unnecessary) excessive demands for resources.
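
As a rough illustration of what such an analysis might look for, here is a minimal sketch in Python. It assumes evenly spaced samples of CPU utilisation and memory use per VM or process. The function names and every threshold are hypothetical and would need tuning against real data. It treats sustained near-100% CPU as a possible program loop, steadily growing memory as a possible leak, and low, unvarying CPU as a flatline.

```python
from statistics import mean, pstdev


def slope(samples):
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def classify(cpu_pct, mem_mb, loop_cpu_pct=95.0, leak_slope_mb=1.0,
             flat_cpu_pct=2.0, flat_spread=0.5):
    """Label one VM or process from its CPU and memory samples.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # CPU pinned near 100% for the whole window: possible program loop.
    if min(cpu_pct) >= loop_cpu_pct:
        return "rogue: possible program loop"
    # Very low CPU with almost no variation: candidate flatline.
    if mean(cpu_pct) <= flat_cpu_pct and pstdev(cpu_pct) <= flat_spread:
        return "flatline"
    # Memory growing steadily across the window: possible leak.
    if slope(mem_mb) >= leak_slope_mb:
        return "rogue: possible memory leak"
    return "normal"


if __name__ == "__main__":
    print(classify([10, 12, 11, 9], [500, 520, 540, 560]))
```

A real implementation would work on much longer windows and would confirm any candidate with the application owner before acting on it.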
Services:
The next stage in the maturity of the capacity management process is to work
with the service level management process in establishing response time
thresholds within the SLA and, more generally, linking the performance of a
service with the underlying infrastructure. Moving forward, this will provide
an additional KPI on the relative performance benefits of the virtualisation
project, and a valuable feed into the service level management process.
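
To make the link between a response time threshold and the underlying infrastructure concrete, the sketch below checks a percentile response time against a threshold and counts how many breaches coincided with a busy host. It is only a sketch under stated assumptions: paired samples of response time and host CPU are assumed to be available, and the function names, the 95th percentile and the 80% "busy" level are all hypothetical choices rather than anything prescribed by an SLA.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_report(response_ms, host_cpu_pct, threshold_ms, pct=95,
               busy_cpu_pct=80):
    """Compare response times with a threshold and relate breaches to host load.

    response_ms and host_cpu_pct are paired samples taken at the same times.
    """
    observed = percentile(response_ms, pct)
    breaches = [(r, c) for r, c in zip(response_ms, host_cpu_pct)
                if r > threshold_ms]
    busy = sum(1 for _, c in breaches if c >= busy_cpu_pct)
    return {
        "observed_ms": observed,
        "compliant": observed <= threshold_ms,
        "breaches": len(breaches),
        "breaches_on_busy_host": busy,
    }


if __name__ == "__main__":
    resp = [120, 130, 150, 900, 140, 135, 125, 950, 145, 138]
    cpu = [40, 45, 50, 92, 48, 44, 41, 95, 47, 46]
    print(sla_report(resp, cpu, threshold_ms=500))
```

If most breaches fall on a busy host, that points towards a capacity issue. Breaches on a quiet host suggest the cause lies elsewhere.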
Portal:
The proposed vehicle for a reporting regime is the capacity management portal.
The capacity reporting available, and how it should be used, is discussed
throughout this blog series. The process diagram below should further clarify
both the sources of information and the expected output.
Whilst it is acknowledged that the unification of the various independent
databases holding performance data is unlikely, at least in the short term,
the key part of the requirement is the production of a set of capacity reports
that provide a ‘single pane of glass’ view into exactly how the virtual
environment (and, longer term, the entire estate) is performing and how much
capacity is available.
In order to provide that single usable view, it is important that the
following steps be taken:
• Provide a common look and feel to the reports, independent of their source
• Ensure an appropriate educational exercise has taken place, i.e. make sure
people know where the reports are available and what they mean
• Provide easy access to the capacity portal, usually via the web and perhaps
email for any exception reports
A longer-term goal would be to integrate with the Configuration Database to
incorporate important relationship and topological information. This would
provide a valuable service-level capacity data feed and allow the reporting and
planning activities to be undertaken at the service level as well as the
component level.
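
As a sketch of what service-level reporting could look like once relationship data is available, the Python below rolls component utilisation up to each service. It assumes the Configuration Database can be exported as a simple mapping of service to components. That mapping, the VM names and the function name are hypothetical, and peak utilisation is used here as a simple stand-in for a fuller capacity measure.

```python
def service_headroom(service_map, utilisation_pct):
    """Return, per service, its busiest component and the headroom left on it.

    service_map:     {service name: [component names]} (hypothetical export)
    utilisation_pct: {component name: peak utilisation in percent}
    """
    result = {}
    for service, components in service_map.items():
        # Ignore components that have no performance data yet.
        known = {c: utilisation_pct[c] for c in components
                 if c in utilisation_pct}
        if not known:
            continue
        busiest = max(known, key=known.get)
        result[service] = {
            "bottleneck": busiest,
            "headroom_pct": 100 - known[busiest],
        }
    return result


if __name__ == "__main__":
    services = {"payroll": ["vm-a", "vm-b"], "intranet": ["vm-b", "vm-c"]}
    peaks = {"vm-a": 90, "vm-b": 82, "vm-c": 20}
    for name, info in service_headroom(services, peaks).items():
        print(name, info)
```

Reporting the bottleneck component alongside the headroom keeps the component-level view available from the service-level report.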
I’ll be discussing longer-term improvements on Friday.
Adam Grummitt
Distinguished Engineer