Monday, 28 February 2011

Why SNMP based real-time monitors alone are not good enough for capacity analysts

It sounds like an obvious statement, but many managers looking to rationalize performance and capacity management activities to cut costs seem to miss the point. The thinking is that a real-time monitor collects performance data, so why not use it for everything relating to performance and capacity?  I think there are some good reasons why not, and I would be keen to hear more from you.

SNMP data capture tends to be based on a flat object model, in which a CPU sits at the same level as a network switch, which in turn sits at the same level as a JVM, and so on.  While this can be a useful approach for real-time support, it does not provide the organized view of data required for capacity reporting.

Limited “user” data also means no meaningful workload components can be derived when building a capacity planning model. SNMP collection tells you ‘what’ happened, but not ‘who’ did it!  It doesn’t help capacity analysis when there are unexplained peaks of usage but no comprehensive user data to indicate who was responsible.  When you poll SNMP you only get data for the processes running at that instant; because commands and processes start and terminate rapidly in every environment, the capture ratio will be poor, i.e. you simply will not see the processes that caused those peaks.
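To make the snapshot problem concrete, here is a rough sketch in Python using the pysnmp library to take a single walk of the process names in the SNMP host resources table (hrSWRunName). The device address and community string are placeholders; the point is simply that each poll only sees what happens to be running at that instant.

    # Point-in-time walk of the SNMP process table (hrSWRunName,
    # OID 1.3.6.1.2.1.25.4.2.1.2). Anything that starts and exits between
    # two polls never appears here, which is why the capture ratio for
    # short-lived commands is poor.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, nextCmd)

    def snapshot_processes(host, community='public'):
        """Return the process names visible via SNMP at this instant."""
        names = []
        for err_ind, err_stat, err_idx, var_binds in nextCmd(
                SnmpEngine(),
                CommunityData(community),
                UdpTransportTarget((host, 161)),
                ContextData(),
                ObjectType(ObjectIdentity('1.3.6.1.2.1.25.4.2.1.2')),
                lexicographicMode=False):
            if err_ind or err_stat:
                break
            names.extend(str(vb[1]) for vb in var_binds)
        return names

    # Two snapshots taken a few minutes apart can look identical even if
    # thousands of short-lived commands ran (and drove a CPU peak) in between.
    print(snapshot_processes('192.0.2.10'))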

SNMP data sources are typically configured by a support group outside the capacity team and are left running on default values. This always means that the capacity planner has to compromise on the depth of data provided.  All too often such sparse data makes the job next to impossible, even when working purely at the “system level” (i.e. with no user, process or command data).  By contrast, the data collectors within full capacity management products such as Metron’s Athene software provide a rich source of data that has been custom built to meet the needs of the capacity analyst.

Any request to change the data provided via SNMP will usually require an external change control request. This request may be declined; it will certainly mean delay. With a specific capacity solution such as Athene, reconfiguration of data collection does not require external change control.

With SNMP capture, each individual polling request adds a small number of bytes of network overhead.  In large environments this constant polling of devices can add up to a significant amount of SNMP traffic.  Each SNMP request also creates overhead (CPU usage, for example) on the system where the data is captured.  While some overhead is unavoidable for any tool, using SNMP to capture data continuously carries a higher likelihood of unwanted resource usage: the price of not having data capture software, engineered for minimal overhead, installed on the device itself.
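As a back-of-envelope illustration (the figures below are assumptions, not measurements), it is easy to see how the per-request bytes add up across a large estate:

    # Rough estimate of the network cost of continuous SNMP polling.
    # Every figure here is an assumption chosen purely for illustration.
    devices            = 2000     # devices being polled
    oids_per_device    = 50       # OIDs collected from each device per poll
    bytes_per_request  = 120      # approximate request + response size on the wire
    poll_interval_secs = 60

    requests_per_hour  = devices * oids_per_device * (3600 / poll_interval_secs)
    megabytes_per_hour = requests_per_hour * bytes_per_request / 1_000_000

    print(f"{requests_per_hour:,.0f} requests per hour, "
          f"roughly {megabytes_per_hour:.0f} MB per hour of polling traffic")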

I think of ‘full’ capacity management as encompassing the following:
  • analysis and root cause identification of problems
  • regular (e.g. daily, weekly, monthly) reporting of performance and capacity issues
  • prediction of likely performance problems as workloads grow or change
  • prediction of likely capacity requirements as the business changes

If you have any thoughts about other reasons why SNMP capture is insufficient to help the capacity analyst achieve these tasks, please comment.

Andrew Smith
Chief Sales & Marketing Officer

Wednesday, 23 February 2011

Cloud Computing gives the Capacity Manager a reason to change practices

Throughout the history of IT, Capacity Management has focused on reporting our favorite component metrics, such as CPU and memory utilization.  Cloud Computing gives the IT industry a reason to change practices and present Capacity Management from more of a Business Perspective.  What do we mean by a “Business Perspective”?  It means showing business and cost figures alongside the component metrics. 
The point of this change in practice is to show business owners, concisely and tangibly, where the money is being spent.  The key to this endeavor is to create reports that tell a story for both technical and business users. 
A major component is the development of a web portal for your reports.  This web portal can show similar types of reports for both component and cost.  Think of the type of story you could tell your business users with forecasting reports that show cost instead of component. 
Better yet, show the value of Capacity Management by displaying cost and component metrics on the same report. 
In order for this to be valuable, the Capacity Management team will need to talk to the financial group and derive costs associated with components.   A little more work on our part, but it could change the way Capacity Management is perceived within the enterprise.  In the upcoming months, Metron will be presenting webinars focused on this exact endeavor. 
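As a simple illustration of the idea (the cost-per-core rate below is a made-up figure; a real exercise would obtain it from the finance group), a cost column can sit right next to the utilization column:

    # Pairing a component metric with a cost figure on the same report line.
    COST_PER_CORE_PER_MONTH = 35.0     # hypothetical monthly cost of one CPU core

    servers = [
        # (name, cores, average CPU utilization)
        ("web01", 8,  0.22),
        ("db01", 16,  0.71),
    ]

    for name, cores, util in servers:
        provisioned = cores * COST_PER_CORE_PER_MONTH
        idle        = provisioned * (1 - util)
        print(f"{name}: {util:.0%} CPU, ${provisioned:.0f}/month provisioned, "
              f"${idle:.0f}/month of that capacity currently idle")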
Join us tomorrow and see how the Capacity Planner can show valuable reports depicting both component and business metrics together. 
Register for this event now by following my link: http://www.metron-athene.com/training/webinars/webinar_summaries.html 

Charles Johnson
Consultant

Monday, 21 February 2011

Adapt your capacity process to support Cloud solutions

Potentially there are a large number of areas within a “standard” capacity process that need to be adapted to support Cloud solutions, but based on experience I believe the following topics represent the key areas.
  • Process interfaces
  • Tooling and monitoring
  • Scope and Maturity
To effectively manage the capacity of “The Cloud”, stronger, or at least redefined, process interfaces will be required.  Closer links with financial processes will be key to understanding the costs associated with the various options (public, private, hybrid, etc.) and to using this information to assess which will best meet the needs of the business.  Determining these costs and sizing the environment correctly will be critical in ensuring that using “The Cloud” actually pays. 
Strong configuration and change processes will also be essential in tracking all elements of a service, with the focus moving away from component-level information and towards the interconnectivity between those components.  Finally, I believe the relationship with Service Level Management will require increased visibility when transitioning to the cloud, both in managing customer expectations and in capturing and documenting key service level performance metrics.  A keen insight into both the service architecture and which Cloud implementation is best suited will be essential in ensuring the required service levels continue to be met during and after any transition.  As the popularity of public clouds expands and providers over-commit resources to drive down costs, demand management and the available capacity will become far more of an issue, and careful stewardship of performance targets will be a valuable asset.
The tooling and monitoring requirements will need to be re-evaluated prior to moving to a Cloud implementation, as the traditional capacity focus at the component level will become less important, with an aggregated service view being key to understanding service performance and usage.  When selecting a tool, try to ensure that it will monitor across the Enterprise and has the flexibility to import a wide variety of data sources.  These information sources can then be used to provide a unified reporting portal to assist in capacity monitoring and planning for the Cloud, the service and the underlying components.  In a cloud implementation, rather than the components being the first bottleneck, it is likely the network will be the focal point for initial performance monitoring and planning; more specifically, the network links between your organisation and the cloud provider.
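As a small illustration of the aggregated service view mentioned above (the service map and figures are invented for the example), component samples from those varied data sources can be rolled up into a single service-level line:

    # Rolling component-level samples up into a service-level view,
    # as a unified reporting portal might. All values are illustrative.
    from statistics import mean

    service_map = {
        "online-banking": ["web01", "web02", "app01", "db01"],
    }
    cpu_util = {"web01": 0.35, "web02": 0.41, "app01": 0.58, "db01": 0.72}

    for service, components in service_map.items():
        busiest = max(components, key=cpu_util.get)
        print(f"{service}: average CPU {mean(cpu_util[c] for c in components):.0%}, "
              f"busiest component {busiest} at {cpu_util[busiest]:.0%}")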
Ultimately I believe the biggest changes for the capacity management process will be the perceived scope of the process and the required maturity.  In my experience as a consultant working for a number of large-scale organisations, the majority have a detailed understanding of the component-level side of the business, i.e. the servers, a degree of knowledge at the service level, but little knowledge of the business and financial aspects.  To successfully manage a Cloud implementation that meets the required service levels and provides actual financial savings, the process will need to cover all aspects of both the service and the underlying infrastructure, including networks and potentially facilities.
A detailed understanding of the business needs and drivers, and of how these relate to services and infrastructure, is essential in a Cloud environment and, to a lesser degree, in any large-scale virtualization project.
Achieving this level of maturity and integration presents a considerable challenge for a capacity management team, but, if achieved, it will benefit the business and raise the profile of capacity management and ITIL immeasurably.
Come along to our free webinar on ‘Cloud Computing and Capacity Management’ this Thursday http://www.metron-athene.com/training/webinars/webinar_summaries.html 

Rob Ford
Consultant

Friday, 18 February 2011

ITIL Capacity Management and the Cloud

Sadly it is no secret that the popularity of ITIL has taken a downturn in the last couple of years.  This has been partly down to the economic climate, with companies unwilling to invest in updating an existing implementation to ITIL v3, or in implementing ITIL at all. Ironically, it could be the economic downturn, and the cost-saving technologies it has encouraged, that breathe new life into ITIL.
Whilst the technologies available are prolific, there are two which I believe are key to this discussion: Cloud-based delivery models and virtualization.  Both technologies promise considerable cost savings and additional green credentials, all while maintaining the required service levels.  The majority of enterprises have already embraced virtualization in all its many flavours, and an increasing number are turning to “The Cloud” to deliver the next level of optimization.
It is in the embrace of these technologies that I believe ITIL will have a critical part to play.  Strong ITIL processes are essential to ensure these technologies are managed correctly and deliver the required business performance.  This is true for all of the ITIL disciplines, but none more so than Capacity Management.  Moving to “The Cloud” and potentially implementing Platform as a Service (PaaS) or Software as a Service (SaaS) can mean that control moves to an external provider whilst the responsibility for performance assurance remains with capacity management. 
How you can manage this change within the capacity management process will be the focus of my article on Monday...
In the meantime you may like to sign up and come along to our free webinar on ‘Cloud Computing and Capacity Management’ http://www.metron-athene.com/training/webinars/webinar_summaries.html 

Rob Ford
Consultant

Monday, 14 February 2011

Are you an unloved capacity manager?

Staring at your screen waiting for a Valentine’s e-card from that someone special – or your manager, today?  Never fear.  Here at Metron we love capacity managers, and if you want to receive our regular love letters let us know at sales@metron-athene.com

Too often the Capacity Manager feels unloved and ignored.  You spend your time producing reports and charts that never get seen.  This is not great for your company, or for your job satisfaction and continued employment prospects!  How do you get that special someone to see you and love you?  If you can, a good Capacity Manager will save the company money. The major reason we see for Capacity Managers being the wallflower at the IT party is reports full of technical explanation but light on business focus.  Such reports lack impact with their target audience, and over time nobody knows who the Capacity Manager is or how much value they can provide.
So what can the Capacity Manager do to find corporate love and appreciation?
1. Write a Capacity Management business plan.   Show how you and Capacity Management will save the company money and avoid crises over the next 2 or 3 years.  Business brains are focussed on their revenue streams, so get the business figures and relate your capacity issues to those.  Consider what it will cost the business if there is a “go slow” in service quality due to capacity issues, and how this will affect the flow of money into the business.

You might have been ignored because the business has a lot of excess capacity.  Consider how much the excess is costing the business, and present how you’re going to identify and use that excess capacity, saving potential future expenditure.

If you are running a home-grown Capacity Planning and Management solution, consider how much it’s costing the business to maintain.  All too often we see Capacity Managers spending the majority of their time maintaining a solution rather than delivering value to the business through capacity information.

There are loads of business plan templates on the internet.  They will not all fit this situation perfectly, but they are fairly easy to adjust to suit your needs (you may, for instance, want to throw away the marketing section!).

2. Here’s a tricky one.  Dress smart.  We know everyone should be appreciated for what they are, but impressions count.  Make sure you’re physically and mentally spruced up and looking your best - you’re going to need it for the next step.

3. Find a Sponsor.  Once you have a business plan with a business benefit, work out who’s going to be most receptive to your message and can influence others, then grab some time with them. Ambush them at the coffee machine, pounce on them at their desk and fix a meeting.  Do it face to face.  Another e-mail from “that Capacity Manager” can be deleted.  An interesting business plan from this thought-provoking and well-presented ideas machine is harder to ignore.

4. Keep the regular reports you produce business-focussed.  Really.  People sitting around the board room table sipping lattes and pondering their stock portfolios care significantly less about CPU MHz and disk occupancy than they do about £££, $$$ or €€€.  So start with the cash and provide the technical reasoning later on, or in a separate report.

5. Don’t be afraid to raise support desk tickets based on capacity trend alerts (once you are happy with your trending and alerting solution).  Automate it if you can.  Don’t just rely on people reading your reports and taking action.  Start firing the problems you identify at your service desk.  Get your sponsor on board first, mind you, so if people complain about the tickets you raise you’ve got some support.  It all helps people understand that capacity is a day-to-day issue that saves money and helps the business achieve its goals.
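If it helps, here is a rough sketch of what that automation might look like. The service desk URL and payload fields are entirely hypothetical; you would substitute whatever API your own ticketing system exposes.

    # Turn a capacity trend alert into a service desk ticket automatically.
    # The endpoint and field names below are placeholders, not a real API.
    import requests

    TICKET_API = "https://servicedesk.example.com/api/tickets"   # placeholder URL

    def raise_capacity_ticket(resource, metric, projected_date, threshold):
        payload = {
            "summary": f"Capacity alert: {metric} on {resource}",
            "description": (f"Trending suggests {metric} on {resource} will reach "
                            f"{threshold:.0%} around {projected_date}. "
                            "Raised automatically by the capacity team."),
            "category": "Capacity",
        }
        return requests.post(TICKET_API, json=payload, timeout=10)

    # Example: flag a filesystem projected to hit 90% in six weeks.
    # raise_capacity_ticket("db01:/data", "filesystem utilisation", "2011-03-28", 0.90)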

So, you look good, you have good support around you, you’re talking the language of love that the business wants to hear.  Who knows,  you might just get that Valentine’s card next year.

Phil Bell
Consultant

Friday, 11 February 2011

The Ups and Downs of Trends - There are Soft and Hard trend limits

A trend of CPU utilization stops at 100%, right?  Well, yes, but it doesn’t stop in the same way that a trend of disk space utilization stops. 

As the average utilization of a server approaches 100%, transactions submitted to that server take longer and longer to execute, but they are unlikely to actually fail (unless specifically forced to do so by a timeout mechanism).  The same applies to a variety of other limits such as I/O rate, network traffic, memory utilization and so on.  Certainly the server cannot run at more than 100% utilization, but the world will not suddenly end when that limit is reached.  Indeed, mainframes supporting large mixed workloads are designed to run at 100% processor utilization for long periods.

On the other hand, if an application needs to write data into a disk or filesystem that doesn’t have enough free space to hold it, then typically the application will fail with a hard error, and some kind of recovery action will be necessary.  Depending on the sophistication of the application and/or the file system, this recovery may take place automatically behind the scenes, but it still may result in a variety of complicated administrative problems.

For this reason, a trend showing that a filesystem is predicted to be filling up may need to be dealt with more urgently than a trend showing that CPU utilization is approaching 100%. 
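A simple single-server queueing approximation (response time = service time / (1 - utilization)) shows why the CPU limit is "soft": response times stretch as utilization climbs, but work still completes. The service time below is an assumed figure, purely for illustration.

    # Soft limit: response time grows smoothly as CPU utilization approaches 100%.
    # Hard limit: a full filesystem fails the next write outright.
    service_time = 0.02    # assumed seconds of CPU per transaction

    for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
        response = service_time / (1 - utilization)
        print(f"CPU at {utilization:.0%}: roughly {response * 1000:.0f} ms per transaction")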

Lesson 3 – not all trends are equal.  Some values of 100% are more painful than others.

Look out for my future blogs where I’ll be tackling some of the more complicated features of trends like statistical measures of confidence in the trend.
I’ll also be looking at cases where trending just isn’t going to work, so that alternative approaches, such as modeling, are required.

Until then take a look at Athene, our capacity management and modeling software http://www.metron-athene.com/


Andy Mardo
Product Manager

Wednesday, 9 February 2011

The Ups and Downs of Trends - Peaks and averages may change at different rates.

Most computer performance analysts will be interested in both the average load that a given server needs to support, and the peak load.  As is widely recognized, any server supporting a customer-facing application needs to be responsive at peak times – the dreaded alternative is loss of business to competitors who are only a click away. 

The following picture shows the measured CPU utilization of a particular server, with points for both the daily average utilization, and the utilization during the peak hour each day.  Separate trend lines have been fitted to the daily average value and to each day’s peak hour. 


At the start of the time period, the trend line for the daily peak is about 19 percentage points higher than the trend line for the daily average.  However, by the end of the time period, the trend line for the peak values is about 25 percentage points higher than the average trend – in other words, the two trend lines are diverging.  The peaks are rising faster than the average value.  Depending on your requirements, you may want to track the average, or the “average peaks”, or the “peak peaks”. 
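For anyone who wants to reproduce this, a minimal sketch of fitting separate straight-line trends to daily average and daily peak-hour values might look like the following. The data here is synthetic; real values would come from your own capacity database.

    # Fit one linear trend to daily averages and another to daily peak hours,
    # then compare the slopes to see whether the two lines are diverging.
    import numpy as np

    days     = np.arange(90)
    avg_cpu  = 30 + 0.10 * days + np.random.normal(0, 2, days.size)   # synthetic averages
    peak_cpu = 49 + 0.17 * days + np.random.normal(0, 4, days.size)   # synthetic peak hours

    avg_slope,  _ = np.polyfit(days, avg_cpu, 1)
    peak_slope, _ = np.polyfit(days, peak_cpu, 1)

    print(f"Average trend: {avg_slope:.2f} points/day, peak trend: {peak_slope:.2f} points/day")
    if peak_slope > avg_slope:
        print("The peaks are rising faster than the average - the trend lines diverge.")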

Lesson 2 – you may well need to use more than one trend line to adequately describe changes in a single variable.

I'll be completing my series with a look at soft and hard trend limits this Friday.


Andy Mardo
Product Manager

Monday, 7 February 2011

The Ups and Downs of Trends


Trending is a widely-used, reliable, and easy method of predicting the future.  Or is it?  Certainly it’s widely used.  Trending facilities are freely available in countless statistical applications and performance analysis tools. I’m going to take a look at some of the issues surrounding the use of trending, with examples from a variety of domains.

“Prediction is very difficult, especially about the future”  (Niels Bohr, Danish physicist).

This first picture shows the number of DVD players that were sold each year in the United States, from 1997 (when DVDs first became widely available) until 2003.     


You can imagine the consumer electronics manufacturers tooling up their production lines to meet demand for the countless millions, then billions, of DVD players that would be sold in the future.  Then, of course, this happened:




Sales of DVD players stopped rising when most people who wanted one had bought one.    

Another example of the principle that most trends terminate somewhere is a famous article in the Times of London in 1894, which predicted that within 50 years every street in the metropolis would be buried under nine feet of horse manure.  (This prediction isn’t an urban myth – the problems caused by horses were a major worry for town planners all over the developed world; see, for example, http://www.fee.org/pdf/the-freeman/547_32.pdf).

Lesson 1 – Most trends don’t extrapolate for ever.  Try to work out what the limiting factors might be.
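One way to build that thinking into a forecast is to compare a straight-line extrapolation with a curve that levels off at a saturation limit, such as a logistic. The parameters below are invented purely to show the difference in shape, not fitted to any real data.

    # Straight-line extrapolation versus a logistic (S-shaped) curve that saturates.
    import math

    saturation = 100.0    # assumed ceiling, e.g. % of households that will ever buy one
    growth     = 0.9      # assumed growth rate
    midpoint   = 5        # assumed year of fastest growth

    for year in range(11):
        linear   = 10.0 * year                                           # naive trend
        logistic = saturation / (1 + math.exp(-growth * (year - midpoint)))
        print(f"year {year:2d}: linear forecast {linear:5.1f}, logistic forecast {logistic:5.1f}")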

If you've got any good examples of predicted trends that didn't happen...let me know.

Keep following as there's more from me on trending this Wednesday.

Andy Mardo
Product Manager

Friday, 4 February 2011

How green is your IT?

‘The global information and communications technology (ICT) industry accounts for approximately 2 percent of global carbon dioxide (CO2) emissions, a figure equivalent to aviation’ according to an estimate by Gartner, Inc.
IT dispensing CO2 at the same rate as airlines et al? Little wonder then that in recent years there‘s been growing demand for IT to meet the needs of the business in a sustainable way.
IT needs to look at lessening its impact on the environment.
I have said before that effective Capacity Management is an integral part of the Green IT process and that it should be central to the plans of companies leveraging new technology, particularly in the areas of Virtualization and Cloud Computing.
Why do I believe this so vehemently? Biased I may be, but the proof that good capacity management lies at the heart of Green IT, indeed lies at the heart of good business strategy, is irrefutable isn’t it?
Many companies are making, or have made, the transition from physical to virtual servers, but over-provisioning merely negates the Green IT and cost benefits. How do you prevent over-provisioning?
In the Cloud Computing arena, the production of inaccurate SLAs when transitioning to the Cloud can have far-ranging ramifications for a company. How do you decide what you need, when you need it and who you need it for?  Without effective capacity management tied to SLAs, rapid provisioning could just be over-provisioning done faster than ever before.
New technologies can go a long way to helping Companies meet their Green initiatives but only if they are effectively managed, otherwise the benefits to both the Company and the environment are squandered.
Join us on February 10 at our free webinar ‘Capacity management: At the heart of Green IT’ and see how capacity management enables you to harness and maximise all the benefits that these technologies promise, and realise your Green IT objectives by doing so.
Register now http://www.metron-athene.com/training/webinars/index.html and if you have any opinions on the role that capacity management plays in this arena feel free to share them with me.

Andrew Smith
Chief Sales & Marketing Officer

Wednesday, 2 February 2011

Too many servers not enough eyes - where did all these servers come from?! (9 of 9)

As analysts cannot look at many detailed performance graphs for hundreds or thousands of servers and still proactively prevent performance or capacity-related problems, data centers must rely more on tasks that can be easily automated.
In order to move from a reactive, ad-hoc way of managing performance and capacity related issues, companies should either purchase vendor tools or develop their own applications that provide:
  • Automation of data capturing and retention
  • Automated exception-based reporting
  • Automated performance alerting
  • Intelligent, automated trending of business and performance metrics
  • Automated exception-based trend reporting
  • Automated trend alerting
  • Automated interpretation of performance data, along with advice and guidance, where appropriate
  • Targeted analytic modeling – based on the systems and applications identified by the exception reporting and alerting
Building such an infrastructure allows the analysts to focus on capacity and performance-constrained applications on a timely and proactive basis.
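A rough sketch of one of those building blocks - automated, exception-based trend alerting - is shown below. The history, threshold and horizon are all illustrative; the shape of the logic is what matters: fit a trend per server, project it forward, and surface only the exceptions.

    # Fit a linear trend to each server's recent history, project it forward,
    # and report only the servers predicted to breach the threshold.
    import numpy as np

    THRESHOLD    = 85.0    # % CPU considered an exception
    HORIZON_DAYS = 90      # how far ahead to project

    history = {   # last 30 daily CPU utilization figures per server (invented)
        "web01": 40 + 0.1 * np.arange(30),
        "db01":  60 + 0.6 * np.arange(30),
    }

    for server, series in history.items():
        days = np.arange(series.size)
        slope, intercept = np.polyfit(days, series, 1)
        projected = slope * (series.size + HORIZON_DAYS) + intercept
        if projected >= THRESHOLD:
            print(f"EXCEPTION: {server} projected at {projected:.0f}% CPU in "
                  f"{HORIZON_DAYS} days - candidate for deeper analysis or modeling")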
Intranet reporting, standard with many vendor tools, should provide management with an easy browser interface, along with many clickable links that lead to performance and trend reports. These reports should be on an exception-only basis, if so desired, and should be easily navigated through a single common structure and/or easily sent to IT staff by email or other communication methods.
Since these reporting tools are checking data values against performance and trend thresholds when building the reports, the web pages containing the reports should show some automated advice and data interpretation that can help close the communication gap between the data center and the business unit. But most of all, reports generated for the business units, whenever possible, should be built in business terms easily understood by non-IT staff.
By using this reporting and alerting mechanism, overlaying a targeted analytic modeling scheme for crucial applications, and using analytic models to determine workload and hardware configuration changes, the data center will be much more manageable – regardless of how many servers there are.
It's the end of my series today so please feel free to share your thoughts with me.

Rich Fronheiser
Consultant