Wednesday 28 March 2012

The Capacity Manager and the changing focus

I've been a Capacity Manager in three different industries.  At each, I was mainly focused on hardware capacity and ensuring that we didn't have hardware bottlenecks for any of our applications.  Less important was making sure that we didn't have excessive headroom or capacity on the floor (thereby wasting money) at any given time.

Times have changed.  The alignment with the business brought about by a focus on ITIL and other process methodologies means that the standard reports generated by Capacity Managers just aren't enough anymore.  
Business managers have no interest in knowing what the swap space utilization is, nor do they care about data stores, resource pools, clusters, network adapters, or any of the other technology that makes the business work.  As I said in my previous blog, they just want to flip the light switch and see light.

So naturally, the focus needs to be more on services and applications and, specifically, the transactions that matter.
It's impossible to set and police viable service level agreements if there's no way to measure how long transactions take.  Furthermore, there's no easy way to improve the performance of transactions without knowing how much time is being spent in each layer of the infrastructure.  Sure, a transaction is supposed to complete in 10 seconds, but if the transaction is spending 15 seconds at the database, that would seem to be a reasonable place to start.
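As a minimal sketch of that reasoning (the per-tier timings and names below are hypothetical, not output from any particular monitoring product), summing the time a transaction spends in each layer quickly shows where to start looking:

```python
# Hypothetical per-tier timings (seconds) for one slow transaction,
# as an end-to-end transaction monitor might report them.
tier_times = {
    "web server": 0.8,
    "application server": 2.1,
    "database": 15.0,
    "network": 0.6,
}

sla_seconds = 10.0
total = sum(tier_times.values())
worst_tier, worst_time = max(tier_times.items(), key=lambda kv: kv[1])

print(f"End-to-end time: {total:.1f}s against an SLA of {sla_seconds:.0f}s")
if total > sla_seconds:
    share = 100.0 * worst_time / total
    print(f"SLA breached: start with the {worst_tier} "
          f"({worst_time:.1f}s, {share:.0f}% of the total)")
```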

Today's Capacity Manager needs to be involved with the hardware, sure, but also needs to be aware of and tracking the performance of business transactions -- the objects that are important to the business managers.  
Remember, those managers don't want to know why performance is poor -- they just want to flip the light switch and see light.

This is why I'm excited about our Correlsense SharePath and RUM (Real User Monitor) products.  With Athene we capture and store the component utilization data that's vital in determining how much headroom is available and how close we are to a hardware bottleneck.  SharePath captures data from all of the meaningful production transactions and gives a starting point for root cause analysis when transactions aren't performing as promised.
If the problem is a capacity one, Athene can be used to drill down and find opportunities to remove bottlenecks.  If the number of transactions is growing over time, or the business anticipates it increasing, Athene's Planner can model these changes to anticipate when the bottleneck will happen.

It's an exciting partnership of products and helps close the gaps, providing the Business, Service, and Component views that effective Capacity Management demands.
http://www.metron-athene.com/documents/factsheets/published_paper/correlsense_and_metron_transaction-based_capacity_planning.pdf




Rich Fronheiser
VP, Strategic Marketing


Friday 23 March 2012

The critical activity in Service Delivery is Capacity Management

Capacity Management is all about ensuring adequate performance to meet service level targets as demands change.

Capacity Management is the critical activity in Service Delivery.  Unique amongst the ITIL processes in its concentration on the future rather than just the past, it ensures adequate performance to meet service level targets as demands change.
All of the ITIL processes need to interact with Capacity Management to ensure the successful provision of IT services, and all depend on it directly in order to be carried out successfully. 
For example, Service Level Management depends on Capacity Management to show that service targets will continue to be met as the workload changes over time.  Change Management depends on Capacity Management to show that changes can be implemented without adversely affecting service levels.  Problem Management needs data from Capacity Management to identify performance problems at an early stage so action can be taken to prevent them, a cheaper option than fixing problems when a crisis occurs.
Capacity Management can be justified in a number of ways - deferred expenditure on hardware; fewer capacity crises causing downtime or slow performance; information to support negotiation with suppliers; less staff time wasted in firefighting problems, among others.
Increasing the effectiveness with which virtualization technology is deployed is a good example.
Take a calculation from VMware’s ROI (Return On Investment) calculator, which predicts that a given population of 500 physical servers can potentially be ‘collapsed’ onto 48 virtual hosts, a collapse ratio of 10.4:1.
Capacity Management offers techniques to assess whether this plan can be improved.  A small increase in the collapse ratio will mean significant real financial savings for an organization.  If, for example, the collapse ratio could be increased to 15:1, this could be worth £1.25m over three years to your organization.
This is likely to be quite realistic as Supplier plans of this type tend to be conservative, aiming to avoid capacity issues by an element of over-provisioning. 
Our extensive experience shows that such plans can usually be improved without any adverse impact on the service delivered to the business.
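As a rough sketch of the arithmetic behind that comparison (the three-year cost per host below is an assumption chosen purely to illustrate the scale of the figures quoted above, not a VMware or Metron number):

```python
import math

physical_servers = 500
baseline_hosts = 48          # the calculator's plan, a collapse ratio of ~10.4:1
improved_ratio = 15.0        # the ratio the capacity manager believes is achievable
cost_per_host_3yr = 90_000   # assumed three-year cost of one host (GBP), illustrative only

improved_hosts = math.ceil(physical_servers / improved_ratio)   # 34 hosts
hosts_saved = baseline_hosts - improved_hosts                   # 14 hosts
saving = hosts_saved * cost_per_host_3yr

print(f"Baseline: {baseline_hosts} hosts "
      f"({physical_servers / baseline_hosts:.1f}:1)")
print(f"At {improved_ratio:.0f}:1 you need {improved_hosts} hosts, "
      f"saving {hosts_saved} hosts, roughly £{saving / 1e6:.2f}m over three years")
```

Small changes to the assumed inputs move the answer around, which is exactly why measured utilization data beats a generic calculator.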
Whatever the driver for doing Capacity Management effectively, analyst groups such as Gartner and Forrester, and good practice guidelines such as ITIL® all recommend that it is done from a business perspective. 
To achieve this, any supporting tools need to cover all of your infrastructure and provide the capability to bring unique business data into the capacity mix.
athene® has that capability... take a look at our website for more information.

Andrew Smith
Chief Sales & Marketing Officer

Wednesday 21 March 2012

Capacity Management for SAN Attached Storage


Storage Area Networks (SAN) have become ubiquitous in the modern datacenter.

Today’s modern storage environment consists of a complex mix of equipment that interconnects systems to centralized storage. A direct access storage device (DASD) I/O request starts at the host bus adapter (HBA) and extends through a network, and possibly other devices, before finally getting completed at a storage array or intervening cache.

Managing capacity for SAN attached storage is a complex task that typically overlaps several areas of responsibility. SAN administrators are often tasked with allocating SAN storage and maintaining hardware, along with managing capacity and performance.

This approach is good in that storage administrators have intimate knowledge of their storage environments. However, storage administrators typically don’t have time to look at all aspects of capacity management in a proactive fashion, and this creates a reactive mode of operation.

Operating reactively means troubleshooting performance impacts after they happen, allocating storage at short notice, and ending up with allocated storage that is severely under-utilized.

I’ll be discussing ways to assist the storage administrator in the complex task of managing SAN attached storage in our April 5th webinar.

So why not register for your free place now http://www.metron-athene.com/training/webinars/index.html

Dale Feiste

Principal Consultant

Monday 19 March 2012

Consistency Man–the understated superhero (5 of 5)

In the previous blogs in this series we’ve identified cost value analysis, P2V migration of more sophisticated applications, Cloud, and managing your virtual environment once live as four of the key trends for the capacity manager to address in 2012.

The fifth and final trend is one that has always been needed, never really gone away and is now perhaps more important than ever because of some of the things discussed earlier in this series. 
The capacity manager has to manage ever-increasing complexity plus an ever more dynamic environment.  Jump back twenty years and it was terminals connected to a mainframe, annual upgrades and an annual capacity plan.  Ten years ago it was buy more kit to avoid any risk to service levels and keep everything on its own server – capacity management became capacity reporting and little more.  Now we have virtual servers sharing resources, rapid redeployment of applications across the estate, and elements of that estate in the Cloud, on top of more solutions for storage, network and security.  Where our users used to sit in an office, now they could be anywhere: in different offices, working from home, or sat at a ballgame checking on their smartphone that everything is OK with their application.
The greater the scope for confusion and the greater the complexity, the greater the need for the capacity manager to be Consistency Man, the understated superhero. 

Users want a single pane of glass through which to assess their applications, no matter how complex the underlying infrastructure.  The capacity manager needs to be able to pull together a picture of capacity across that environment and present a concise, cohesive picture.  Service needs to be expressed in user experience and business impact, not technology terms.

Consistency comes through good process implementation.  Using a guiding light such as ITIL enables time to be saved by implementing common, repeatable processes across any environment.  Starting at a business level to understand what is required from the capacity manager, you can then put together a picture of what services need reporting on.  Only then do you need to look at the underlying technology to gather the metrics needed to work your way back to reporting what the business requires.  As the technology changes again, as it surely will, having mapped down through the three ITIL capacity management layers (business, service, component), you will have established processes that readily accommodate whatever change is needed to adequately report on the new regime.

Consistency Man will never be as glamorous as the other superheroes, as they fly in to handle the crisis.  He will however save you having to depend on their services so much and save your management considerably more money in the long term.  A wise boss will appreciate the difference between prevention and cure.
Andrew Smith
Chief Sales & Marketing Officer

Friday 16 March 2012

Sensory overload – the joy of capacity management in a virtual world (4 of 5)

‘It’s all too much!’ the IT man screams.  

Certainly our world is getting more and more complicated with virtualization. On the server side we have web servers, database servers, application servers, identity servers and more.  We have UNIX, Linux, Windows, and more, J2EE, .NET and so on.  A given application or service touches multiple storage systems, network devices, LANs, WANs and VLANs and might be spread across a mix of public and private Clouds. Terminal and host seems like a distant memory for the older among us.

Virtualized workloads might be, indeed should be, variable across time.  P2V probably meant you looked at workload profiles to ensure that the virtual machines on a given host did not have peak processing times that coincided.
All of this variety needs to be managed from a capacity perspective.  As ever, we are not managing a fixed entity; indeed, entities are much less fixed than in our past.  Workloads can be dynamically switched across hosts for performance gains, new hosts are quickly and easily configured, and workloads moved around to ensure optimum location.  The business keeps changing: old services replaced with new, mergers and acquisitions, organic business growth or shrinkage.  Mobile computing and portable devices add new management challenges and make our users and access points even harder to pin down.

To cope with all this diversity and change, a capacity manager needs to be both more specific and more general.  Being specific means trying to at least have a view on all critical transactions.  To know which are critical you need to know about them all, so measuring everything automatically from end to end is vital.  Old-fashioned 80/20 rules still apply – 80% of the work is probably accounted for by 20% of the transactions, so identifying that 20% is critical.
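A minimal sketch of that 80/20 idea, using invented transaction volumes rather than data from any particular monitor: rank transactions by the work they account for and keep the smallest set that covers 80% of it.

```python
# Invented workload figures: transaction name -> total work
# (e.g. CPU-seconds consumed, or simply call volume) over a period.
workload = {
    "browse": 15200, "search": 9500, "api-sync": 7100, "checkout": 4300,
    "login": 1200, "report": 600, "export": 450, "update-profile": 300,
}

total = sum(workload.values())
ranked = sorted(workload.items(), key=lambda kv: kv[1], reverse=True)

critical, covered = [], 0
for name, work in ranked:
    critical.append(name)
    covered += work
    if covered / total >= 0.80:   # stop once 80% of the work is accounted for
        break

print(f"{len(critical)} of {len(workload)} transactions account for "
      f"{100 * covered / total:.0f}% of the work: {critical}")
```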
Greater generality perhaps comes from your tool selection.  Most organizations have come to the virtualized world with a selection of point tools for different environments and then added further point tools on top for their virtual worlds.  Deep dive performance analysis will still require quality point tools in many or all areas.  Reporting and spotting trends that affect capacity decisions will be much easier, however, with a solution that integrates all the other data sources.  Resourcing decisions are moving back into the user domain with Cloud options.  Having a reporting solution that covers every aspect of your environment from a capacity perspective is a significant way in which you can help users make their resourcing decisions.  It also gives you the means of providing good advice and guidance as input to their decision making.

Remember in 2012 the key for capacity managers will be to provide value, not just optimize costs.  This value can and will need to be across the entirety of the complex environment in which virtual applications exist.  Put together the right combination of end to end perspective, deep dive and ‘across the board’ integrated reporting to ensure you can provide the most value for your business.
We’ll finish on Monday by considering the importance of having the right capacity process in place to underpin your 2012 and onwards capacity management activities.

Andrew Smith
Chief Sales & Marketing Officer

Wednesday 14 March 2012

It’s the capacity manager’s job to stop the budget drifting away on the Clouds (3 of 5)

Talk to anyone and they will tell you that use of Cloud computing will continue to grow through 2012 and beyond.  Private and hybrid Cloud now seem to be getting more attention after the initial excitement about public Cloud.  As logical extensions of what IT does well – day to day maintenance and support of applications critical to the business – they are understandable.  Public Cloud is still gaining in use as well.  For example there are now some large businesses trusting to public Cloud for applications such as email.

Whatever the Cloud implementation, there are challenges for the capacity manager.  Shared resources are back, although the mainframe guys will tell you they never went away.  Let’s call it multi-tenanting so it sounds newer and more exciting.  All sorts of capacity will increasingly be shared across multiple groups of users or processes.  This brings capacity management activity back to the fore in ensuring that what one person does won’t have an adverse impact on another.  Share and share alike: workload profiling, understanding business behaviour patterns and a full view of all resources become vital.  Sure, you can ‘cloud burst’ and add in some extra resource in a hurry.  Buy anything in a hurry, of course, and the seller rubs his hands with glee, knowing he can charge you an inflated price.
All this means there’s lots of scope for the capacity manager to step up and play a much more important, more strategic role in provisioning resources to support the business.  The downside is increased complexity to handle.  Dynamic allocation of resources means tracking and predicting usage is more complex.  Hybrid and public Clouds mean that some of the processing for a given service could be outside your control.  Can you measure what is happening?

To meet the new challenges capacity managers will have to enhance their skills and make sure they have the right tools at their disposal.  You can only manage what you can measure, so de-stress by making sure your audience knows the constraints on you and do a good job managing what you can reach.  Good end to end monitoring that sees every transaction will enable you to at least see where in the processing chain time is being spent.  If that is in a public Cloud ‘black hole’ that you can’t reach, at least you will have the evidence that points your Cloud provider to where they need to look.  If the time is being spent on parts of the processing chain where you can have a positive impact on provisioning, make sure you have the right drill-down tools to assess issues.  Implementing a tool or approach that enables you to bring in whatever performance and capacity data is available, irrespective of the source, maximizes the chances that you can plan effectively and manage any capacity problems that arise.
The main thing is not to be put off by having elements of your world outside your direct control.  Dr W Edwards Deming is often misquoted as saying ‘you can’t manage what you can’t measure’.  What he actually said was that many important things that you need to manage cannot be measured – they must still be managed however.  Cloud computing has many important entities that the capacity manager needs to manage.  Just because you can’t directly measure some of them doesn’t mean you can’t and shouldn’t include them in your capacity management thinking as best you can.

On Friday we’ll look at applications that are already hosted in virtual environments and how to manage their capacity.
Andrew Smith
Chief Sales & Marketing Officer

Friday 9 March 2012

P2V migration: Completely virtualized? Virtually complete? The fun is yet to start? (2 of 5)

My previous blog talked of the need to look at ‘value’ of systems in the future, not just their ‘cost’.  Today we’ll consider the 2012 capacity challenge in P2V, physical to virtual migration.

You probably think this isn't an issue for capacity managers in 2012.  Haven't we already done P2V?  Well, take a look around.  There's still a long way to go.  Advances from the likes of VMware mean that more can be virtualized than we might have previously thought.  The easy stuff was done first; now we have the difficult stuff to migrate.  For the enthusiastic capacity manager, this means a harder but more rewarding task lies ahead.
The problem of course is that the more difficult the systems are to migrate, the greater the risk and cost to the business if there are any errors.  More reason, then, to ensure good capacity management throughout the process.  ‘Quick calculators’ that use basic math to work out that 50 servers running at 1% will fit onto a virtual server aren't anywhere near enough to help you assess what can be virtualized and where, when those physical servers are busier, running more complex mixed workloads, and have significant interdependencies across an application or service.  Better measurement of how those servers are currently performing is needed, combined with a view across the multiple servers providing a complete service.
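As a hedged sketch of why that basic math falls short (the hourly CPU profiles below are invented for illustration): summing average utilizations suggests the two servers consolidate comfortably, but checking whether their peaks coincide tells a different story.

```python
# Invented hourly CPU profiles (%) for two physical servers being considered for P2V.
server_a = [5, 5, 10, 20, 60, 85, 80, 30, 10, 5, 5, 5]
server_b = [5, 5, 10, 25, 55, 80, 75, 35, 10, 5, 5, 5]

# The 'quick calculator' view: add the averages together.
avg_based = sum(server_a) / len(server_a) + sum(server_b) / len(server_b)

# The capacity management view: what happens when the hourly profiles are combined?
peak_combined = max(a + b for a, b in zip(server_a, server_b))

print(f"Sum of averages: {avg_based:.0f}%  (looks like an easy consolidation)")
print(f"Peak of combined profile: {peak_combined}%  (the peaks coincide and the host saturates)")
```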

Having quantified and sized requirements for the virtual environment based on a comprehensive view of the physical applications being virtualized, you need to manage the capacity of those applications well once migrated.  A later blog in this series will consider this in more detail, but again the core modern capacity management elements are relevant: predict to avoid problems, ensure you have a 360-degree view of resources, and make sure you know what the business needs.
User access to systems grows ever more varied, especially with the increasing use of mobile computing and portable devices, so make sure you understand your end user service perspective wherever those users are.  The rapid deployment offered by virtual systems makes over-provisioning or poor provisioning so much easier to achieve than before!  Good ‘deep dive’ software is needed to ensure that critical virtual servers are well managed, to avoid repeating the mistakes of our physical past and wasting money over-deploying resources.  As I said in the first blog in this series, capacity management in 2012 is about ‘value’, not ‘cost’. 

The bigger the capacity management challenge, the greater the rewards for the capacity manager and the business.  Enjoy the 2012 challenge of virtualizing the valuable stuff!
On Monday we’ll look at our third capacity management trend for 2012, capacity management for cloud computing.

Andrew Smith
Chief Sales & Marketing Officer



Wednesday 7 March 2012

Capacity Management: Top 5 Trends for 2012 - Who Killed Capacity? (1 of 5)

Keep an eye on the Top Ten Capacity Killers

Hi and welcome to the first of my series of five blogs about the top 5 capacity management trends for 2012.  Thanks to Frank Days and the guys at Correlsense, our application performance management partner, for the original thinking on this.
We start with the good old Service Level Agreement, or SLA.  Over-provisioning, green IT initiatives and the economy have put more emphasis for capacity managers on cost and value together – ‘cost value analysis’ – not just on budget.  It's not going to be about what something costs any more; it's going to be about what value it gives to the business.  To understand the ‘cost’ part of that equation fully, a good capacity manager will increasingly need to look outside traditional capacity areas and hold a more holistic view of capacity to keep his seniors happy.  Costs and service levels across traditional processing areas such as network, processor and disk need to be considered together with environmental factors such as space and power to get a true understanding of the value of an asset. 

Adding in SLA considerations enables you to understand the ‘value’ side of the equation.  Buying resource to provide a quality of service far in excess of what the business really needs could mean you are providing little value.  Such judgements can't be made unless the capacity manager has gone out, found out and understood what service level is acceptable to the business.  This means an end to end approach to capacity management across an application or service is needed so that nothing is missed.
Of course the ever-increasing complexities of new application environments make this more and more challenging.  The ability to reconfigure resources in real time means that traditional sizing approaches such as load testing before an application goes live are no longer sufficient on their own.  Once an application is live, an end to end perspective is needed so that you learn what is good and bad from a user perspective, not an infrastructure perspective.  Wherever the problem is, you then need quality tools and techniques to analyse and plan in those areas.  Some old fashioned rules still apply.  The majority of any short term performance issues are probably caused by a very small percentage of transactions.  Being able to keep an eye on the ‘Top 10’ bad boy transactions is therefore critical – manage those and chances are your users will be smiling.
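A minimal sketch of such a ‘Top 10’ watchlist, assuming per-transaction volumes and response times are already available from end-to-end monitoring (the transaction names, figures and SLA targets here are illustrative):

```python
# Illustrative per-transaction statistics:
# (name, calls per hour, average response time in seconds, SLA target in seconds).
stats = [
    ("search",   12000, 1.8,  3.0),
    ("checkout",  1500, 6.2,  5.0),
    ("login",     8000, 0.9,  2.0),
    ("report",     200, 14.5, 10.0),
    ("api-sync",  4000, 2.6,  4.0),
]

# Rank by total time consumed (calls x response time), so the transactions
# hurting the most users by the largest amount float to the top.
ranked = sorted(stats, key=lambda t: t[1] * t[2], reverse=True)

print("Top transactions by time consumed per hour:")
for name, calls, resp, sla in ranked[:10]:
    flag = "  <-- breaching SLA" if resp > sla else ""
    print(f"  {name:<16s} {calls * resp:>8.0f} s/hour "
          f"(avg {resp:.1f}s vs SLA {sla:.1f}s){flag}")
```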

 On Friday we’ll look at virtualization and capacity management.  Thought you had migration of physical to virtual sorted?  Think again!


Andrew Smith
Chief Sales & Marketing Officer

Monday 5 March 2012

Using Systems Capacity Data for Business Intelligence

In the modern economy, important business decisions are normally made after analyzing data of some sort.
In large companies, the importance of analyzing data for decision-making has created a whole field in IT called BI (business intelligence). Many vendors provide very sophisticated BI software specifically to address this area. Complex data analysis for BI can incorporate many components, including large data warehouses, ETL (extract, transform, and load) tools, OLAP cubes, dashboards, scorecards, and more.

Most of these systems focus on finance, inventory, assets, and other business performance KPIs (key performance indicators). Using system capacity data in a capacity-planning role has long been a part of making intelligent business decisions for expensive hardware purchases, like mainframe hardware.

As low-cost distributed systems became an option, many applications moved onto these platforms without much regard for capacity planning. Capacity planning was seen as costing more than the equipment itself, so it was not given priority. After many years of operating this way, many distributed environments have grown unwieldy. Looked at as a whole, these environments now appear more costly and less efficient.

Using capacity data for making intelligent business decisions has not changed. What has changed is the realization that it still needs to be done.

Looking at the distributed environment as a whole and determining which areas are under-utilized is now a key area of discussion. Of course, ensuring that key resources do not run out of capacity is still the most important goal. However, achieving this dual mandate is not easy with so many moving targets. Having a sound plan of attack, and the proper tools, is critical for success.
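A minimal sketch of that dual check, using invented utilization figures rather than output from any particular tool: flag servers whose peak utilization is low enough to make them consolidation candidates, and those running close enough to capacity to need attention first.

```python
# Invented peak CPU utilization (%) per server over the reporting period.
peak_cpu = {
    "app01": 12, "app02": 9, "db01": 78, "db02": 95,
    "web01": 25, "web02": 7, "batch01": 88,
}

UNDER_UTILIZED = 20   # peaks below this suggest consolidation candidates
NEAR_CAPACITY = 85    # peaks above this suggest a looming bottleneck

under = sorted(name for name, peak in peak_cpu.items() if peak < UNDER_UTILIZED)
hot = sorted(name for name, peak in peak_cpu.items() if peak > NEAR_CAPACITY)

print("Under-utilized (consolidation candidates):", under)
print("Near capacity (act on these first):", hot)
```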

We're running a free-to-attend webinar on March 8th which will cover high-level topics related to constructing a plan of attack, and then examine specific examples for implementation. The end goal is to bring back the visibility that was once a precise science on the mainframe, and create usable business intelligence from the mountain of data now being created. In some shops, even the mainframe has fallen through the cracks of poor capacity management brought on by the distributed deluge. In this case, the mainframe may need to be included in the plan as well.

Join in as we take a top-down approach to creating business intelligence using systems capacity data.
  • Business intelligence
  • Evolution of capacity data in the systems environment
  • KPIs
  • Costs
  • The plan
  • Implementation
Register for your free place now http://www.metron-athene.com/training/webinars/index.html

Rich Fronheiser
SVP, Strategic Marketing