Friday 22 June 2012

Virtualization changes the Storage Capacity Planning game


An excellent article on storage capacity planning from Manek Dubash via TechTarget (www.techtarget.com) crossed my desk recently. It triggered a few thoughts about Metron and our approach to capacity planning, whether that be for servers, storage, networks or whatever.

The gist of the message was that storage demands are increasing rapidly and that virtualization complicates the picture.  Where a physical server has physical limits to things such as storage, in theory a virtual server has none of these limitations.  What’s more, the ability to scale up so quickly with virtual technology makes planning harder.

The need for speed in planning pushes people towards as simple a route as possible.  Often that is just not appropriate though.  It will suit a virtualization sys admin to take a view of the average requirements (e.g. memory, storage, CPU) per additional VM, but that average can hide a multitude of very different workloads.  The average approach could lead to the same over-provisioning that so many organizations achieved when they used ‘standard’ server builds in the early days of distributed systems.  Ah, the joy of virtualization: this time we will be able to over-provision so much more quickly.

To counter this, the storage admin or capacity planner needs to put a greater degree of definition into future storage needs.  This means profiling different workloads and using planning techniques to ‘mix and match’ what might happen with the business; 1,000 more VMs might actually be 200 email, 100 database, 100 application, and 600 web server, each type having very different storage requirements.  Being able to do this profiling means capturing and analyzing relevant data over longer periods of time – there is simply no way around this.  Without it, you run the risk of under- or over-provisioning and the attendant performance crises or out of control hardware spend.
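
As a simple illustration of the ‘mix and match’ idea, here is a minimal Python sketch comparing a flat per-VM average with a profiled workload mix. The per-profile storage figures are hypothetical, not taken from the article; only the 200/100/100/600 split comes from the example above.

```python
# Profiled vs. averaged storage forecast for 1,000 new VMs (hypothetical figures).

profiles_gb = {        # assumed typical storage per VM by workload type
    "email":       60,
    "database":   500,
    "application": 80,
    "web":         20,
}
planned_mix = {        # the business plan: 1,000 VMs split by workload
    "email":       200,
    "database":    100,
    "application": 100,
    "web":         600,
}

profiled_tb = sum(profiles_gb[p] * n for p, n in planned_mix.items()) / 1024

flat_average_gb = sum(profiles_gb.values()) / len(profiles_gb)   # naive per-VM average
averaged_tb = flat_average_gb * sum(planned_mix.values()) / 1024

print(f"Profiled forecast:     {profiled_tb:.1f} TB")
print(f"Flat-average forecast: {averaged_tb:.1f} TB")
# The gap between the two numbers is the over- (or under-) provisioning risk.
```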

There are lots of steps you can take to implement more efficient storage, for example tiered storage, de-duplication or thin provisioning.  The same dangers still exist though; those technologies only mitigate the problem or delay the day of reckoning.  Thin provisioning, for example, might avoid over-provisioning storage upfront, but it does not remove the planning problem: it moves it from the point of allocation to the ongoing management of the physical pool, which makes capacity planning of real storage more important than ever.
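
To make that concrete, here is a minimal sketch of the thin-provisioning arithmetic, with hypothetical figures: the logical capacity promised to hosts can exceed the physical pool several times over, so the real planning question becomes when the pool itself fills up.

```python
# Thin-provisioning pool arithmetic (hypothetical figures).

physical_pool_tb    = 100   # real disk behind the thin pool
allocated_tb        = 300   # logical capacity promised to hosts (oversubscribed)
written_tb          = 60    # data actually written so far
growth_tb_per_month = 4     # observed growth trend

oversubscription = allocated_tb / physical_pool_tb
months_to_full   = (physical_pool_tb - written_tb) / growth_tb_per_month

print(f"Oversubscription ratio: {oversubscription:.1f}:1")
print(f"Physical pool full in roughly {months_to_full:.0f} months at the current trend")
# The hosts still believe they have 240 TB of headroom; the pool does not.
```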

All of this reinforces Metron’s 360 view of Capacity Management, supported by good practice guidelines such as those provided by ITIL®.
You need to split the ‘capacity planning’ tasks of sys admin and capacity management.  Each can feed and support the other, but they have different perspectives and address different questions.  Longer-term capacity management of storage, CPU, memory or network needs a good database of quality data and a depth of analysis that short-term, day-to-day capacity decisions cannot draw on.
Just doing capacity planning within one group, whichever you choose, will cost you money sooner or later.

Andrew Smith
Chief Sales & Marketing Officer

Manek’s full article is available at http://searchstorage.techtarget.co.uk/feature/Storage-capacity-planning-tools-Why-virtual-machines-change-everything.

Monday 18 June 2012

Top 5 Do’s for VMware

I’ve put together a quick list of the Top 5 Do’s and Don’ts for VMware which I hope you’ll find useful.

Today I’m starting with the Top 5 Do’s

DO


1)     Select the correct operating system when creating your Virtual Machine.  Why?  The operating system type determines the optimal monitor mode and the optimal virtual devices to use, such as the SCSI controller and the network adapter.  It also determines the correct version of VMware Tools to install.
 

2)     Install VMware Tools on your Virtual Machine.  Why?  VMware Tools installs the Balloon Driver (vmmemctl.sys), which is used to reclaim guest memory when an ESX host comes under memory pressure, along with optimized device drivers.  It can also enable Guest to Host Clock Synchronization to prevent guest clock drift (Windows only).


3)     Keep vSwp files in their default location (with the VM disk files).  Why?  vSwp files are used to support overcommitted guest virtual memory on an ESX host.  When a virtual machine is powered on, its vSwp file is created and sized according to the memory granted to that virtual machine.  Within a clustered environment, the files should be located on the shared VMFS datastore (FC or iSCSI SAN), because vMotion migrates VM worlds between hosts: if the vSwp files were stored on a local ESX datastore, the file would have to be copied to the destination host every time the associated guest is vMotioned, which can impact performance.  (A rough datastore sizing sketch follows this list.)

4)     Disable any unused Guest CD or USB devices.  Why?  Because CPU cycles are being used to maintain these connections and you are effectively wasting these resources.

5)     Select a guest operating system that uses fewer “ticks”.  Why?  To keep time, most operating systems count periodic timer interrupts or “ticks”.  In a virtual machine these ticks may not always be delivered on time, and if ticks are lost, time falls behind; the ticks are then backlogged and the system delivers them faster to catch up.  You can mitigate these issues by using guest operating systems that use fewer ticks, for example Windows (66Hz to 100Hz) or Linux (250Hz).  It is also recommended to use NTP for Guest to Host Clock Synchronization, as described in VMware KB1006427.
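
As a rough illustration of point 3, the sketch below estimates how much space the vSwp files alone will demand on a shared datastore. It is a minimal Python sketch with hypothetical VM figures; it assumes each swap file is sized at granted memory minus any memory reservation.

```python
# Rough vSwp sizing sketch (hypothetical figures).
# Assumption: each swap file = granted memory - memory reservation.

vms = [
    # (name, granted_memory_GB, memory_reservation_GB)
    ("email-01", 8, 2),
    ("db-01",   16, 8),
    ("web-01",   4, 0),
    ("app-01",   8, 0),
]

total_vswp_gb = 0
for name, granted, reserved in vms:
    vswp_gb = max(granted - reserved, 0)   # swap file shrinks as the reservation grows
    total_vswp_gb += vswp_gb
    print(f"{name}: {vswp_gb} GB of vSwp on the shared datastore")

print(f"Total swap-file footprint: {total_vswp_gb} GB")
print("Add this to the VM disk files when sizing the shared VMFS datastore.")
```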

On Wednesday I’ll go through the Top 5 Don’ts.
If you want more detailed information on performance and capacity management of VMware why not visit our website and sign up to be part of our community? Being a community member provides you with free access to our library of white papers and podcasts. http://www.metron-athene.com/_downloads/index.html

Jamie Baker
Principal Consultant

Thursday 14 June 2012

Capacity Management – How do we tell the story?

As Capacity Planners, we are tasked with providing the information needed to allow the business to make decisions on necessary IT changes for the enterprise. There are times where this information is provided in ad-hoc ways and other times where it needs to be a formal document and presentation.

There are different components to providing these two styles of advice but the challenge is the same: how do we tell the Capacity Management story?
Capacity Planners have to present to different audiences and, whilst the information provided can usually be found in the same reports, the way of relaying it to each audience varies. The way a capacity planner presents to individuals at the “C” level is different from the way they present at director and management level. The “C” level audience want you to get to the bottom line as quickly as possible, whereas directors and managers are more interested in detail.  For the ‘C’ level, it’s about ‘when’ it will be a problem and how you plan to avoid it; other levels need to know ‘why’ it will hurt, to assess whether ‘how’ you plan to avoid it will work.

So what kind of things should go in to the production of a regular capacity report? How do we go about determining which Key Performance Indicators and business metrics should be in there?
What kind of reports can be shown to different levels within an enterprise and how can we make these appropriate for the different audiences?

These are all questions facing every Capacity Planner. Aside from the focus of exactly what the story is that we are attempting to tell, there are a myriad of possible options for presenting our advice. So how do we decide which is the best for our audience?
Our years of experience have taught us a few tips and techniques for presenting the data in an appropriate manner and shown us how we can automate the process as much as possible.

We’re happy to share this knowledge with you but it’s way too large a subject to deal with in just one blog. By popular demand therefore I’m running a free to attend webinar today which provides guidance and answers to these questions and more.
Register now and come along – I’m looking forward to it.
http://www.metron-athene.com/training/webinars/index.html

Charles Johnson
Principal Consultant


Tuesday 12 June 2012

The Top 5 reasons for looking at server and storage capacity together


1. Poor disk I/O performance is a leading cause of application response problems.
Many times application response problems go undetected when I/O performance issues are intermittent. This scenario is made common by the highly shared, mixed back-end storage environments that are pervasive these days. Applying basic capacity management methodologies to the storage environment can help identify and prevent these types of problems in advance.

2. Having historical data available for key storage metrics is critical to avoiding incidents.
Effective capacity management of storage requires having access to historical data. Capturing, collecting, and managing the historical data is a prerequisite for analyzing it. Tools like athene®, designed for this purpose, make the job easy.

3. Knowing when to take action before an incident occurs can eliminate firefighting.
Automatic trending with alarms gives administrators advance notice of impending problems. Scorecards can be used to make identifying alarms more manageable. Having software available to help configure this functionality makes the job easier.
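
As a minimal sketch of the idea (not of any particular product’s implementation), the Python below fits a straight-line trend to historical utilisation and raises an alert when the trend is projected to cross a threshold within the warning window. The figures and threshold are hypothetical.

```python
# Minimal trend-and-alarm sketch (hypothetical data).

weeks = list(range(8))                          # last 8 weeks
utilisation = [61, 63, 64, 67, 69, 70, 73, 75]  # weekly peak utilisation (%) of a storage pool

# Ordinary least-squares slope of utilisation against time.
n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(utilisation) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, utilisation)) \
        / sum((x - mean_x) ** 2 for x in weeks)

THRESHOLD = 85   # alarm level (%)
HORIZON   = 12   # how many weeks of warning we want

if slope > 0:
    weeks_to_threshold = (THRESHOLD - utilisation[-1]) / slope
    if weeks_to_threshold <= HORIZON:
        print(f"ALERT: pool projected to reach {THRESHOLD}% in about {weeks_to_threshold:.0f} weeks")
    else:
        print(f"OK: roughly {weeks_to_threshold:.0f} weeks of headroom")
else:
    print("OK: no upward trend detected")
```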

4. Managing capacity in a large storage environment can be very difficult and having the right tools is critical.
The amount of data that can be captured for storage-related components can be overwhelming. Determining what data is most important and how frequently it should be collected is a good first step. Configuring automated analysis and alarming of the data is the next step. As your capacity management operation for storage scales up, having the right software tools to help becomes critical.

5. Knowing how efficiently your storage environment is being used provides transparency.
Having visibility into a complex storage environment, from a capacity management perspective, is critical to knowing how efficiently storage is being used. If management asks where the high storage costs are coming from, the question can be answered accurately using capacity management data.

If you’ve got any questions on the benefits of capacity managing servers and storage together feel free to ask.

Dale Feiste
Consultant

Friday 8 June 2012

Capacity Management? We’re already doing it

Most organizations will rightly claim they are performing capacity management already.  Decisions about capacity are being made, whether by guesswork, on advice from outside, or based on specialist products such as athene®. Keeping pace with all the considerations gets harder and harder however, as IT infrastructure becomes ever more complex.
Where in the past we had one mainframe to plan for, and that only changed in capacity once a year, IT infrastructure is now far more diverse.  From the proliferation of distributed servers to the ever-growing mass of load balancers, proxy and firewall servers, our environment gets more complex and difficult to manage.
New architectures bring new threats as well as new opportunities.  Ease of implementation means that virtual servers can be rolled out even more easily than distributed servers, and Cloud brings greater flexibility in adding or removing resource.
All this increase in complexity means it gets ever more challenging to keep up with aligning capacity plans with an organization’s strategic business direction.  The risk is that we keep making technical infrastructure decisions because we ‘can’ do it this way rather than because we ‘should’ do it this way. 
One answer to growing diversity in the environment you have to manage is to reduce the diversity of tools with which you manage it.  For some time now Gartner have been recommending that large corporations adopt a single tool for managing capacity, rather than the mix and match selection of point tools often collected over time.
Project driven development has often meant point tools have been selected for managing capacity for each new technology as implemented.  Cloud and the need for a more strategic view of capacity across all areas of infrastructure provide the ideal time to re-consider a fragmented product policy and introduce a streamlined solution across the enterprise.
With its broad data capture capability plus market-leading functionality to incorporate data from any source, athene® provides this.  Immediate savings on cancelled software licenses and on-going savings through reduced time to achieve capacity objectives provide you with a swift return on investment.
Take a look at our athene® software and services.

Andrew Smith
Chief Sales & Marketing Officer

Wednesday 6 June 2012

What should we do within IT Service Management?

Implementing practical, effective capacity management is a vital step.

The first step in capacity management is confined to resource management at the component level, such as CPU, memory and storage. It maintains a constant service level by providing suitable resources, but it is a passive job in that we simply provide enough capacity for all IT services to achieve the same performance.

In the second step, service levels are prioritized according to their importance to the business, and the capacity of the IT resources is then optimized accordingly.
However, all capacity management effort is still focused only on keeping to the Service Level Agreement. No one looks at how the IT services being provided help their users or contribute to line-of-business (LOB) results.

The third step addresses exactly that: we have to measure how much the IT services being provided contribute to LOB results, and report it not only to the LOB but also to the employer making management decisions.

In this way, capacity management can contribute to the business itself as well as the IT infrastructure.

The heretical view of ‘Knot-ITIL’ can be repeated in brief here. It focuses on the six core processes and tries to draw attention to the key one. Do it.

Deliver IT (do it)

Address Bugs as they arise.

Make Changes to correct as needed.

Identify Assets to be used.

Exploit Finance to control it all.

Do it Efficiently to do it well (including availability, capacity…).

Short, sweet and easy to remember.

Employers may well have lost faith in IT…it can be restored by capacity management.

Adam Grummitt
Distinguished Engineer and Author ‘Capacity Management – A Practitioner Guide’

http://www.itgovernance.co.uk/products/3077

A selection of white papers is available for free download at http://www.metron-athene.com/_downloads/index.html


Monday 4 June 2012

What is Capacity?

That should be an easy enough question to answer.  Yet I suspect no matter how I define it, you or one of my colleagues will define it another way.
To me capacity is the ability to do “x” amount of work.  That’s fine but…what is work?  There is no standard “work” unit that means anything to every business.  This leads to capacity being measured in MHz or MB, but that tells the business (who don’t know whether a MHz is a new games console or what you get when you use a nail gun on your knee) next to nothing.  Somehow we need to get from the resources we can measure to a description that the business can understand.
To be frank, that’s not always made easy by either side.
ITIL describes three layers of Capacity Management:  Resource, Service and Business.  That’s not a bad place to start.  We might even think of that as MHz, Transactions, Customers.
Measuring resource utilisation and trending it into the future can be done with the most basic of tool sets, so resource measurement can be purchased off the shelf.  Now, how do we turn that into Services/Transactions?  This is where we step away from the resources and start to monitor the applications.  And boy, do some application vendors make that hard work!
At this point no doubt someone reading this is thinking that their database reports transactions in a nice, clear, easy format.  Yes it does.  But those are database transactions.  What we are looking for here are “Service” transactions, e.g. I transfer money between my accounts while logged in to my bank’s website.  To do this I’ll have interfaced with several services and generated tens of database transactions before I even get to transferring the money.  Yet as a customer I feel I’ve only done two things: logged on and transferred cash.
It can be hard identifying these transactions in a complex environment where one transaction may be spread across multiple servers and databases.

It’s not impossible, and with a little effort these types of transactions can be recorded and plotted against resource utilisations of the relevant servers.  However, it relies on accurate service maps and someone maintaining those relationships.
This is where our new partner Correlsense and their Sharepath software come into play.  Sharepath identifies each user interaction (without any code changes to the applications) and automatically records where the business transaction generates work in the environment.  Metron’s athene® software then provides the resource utilisations and automatically generates reports combining both sets of data - reports that contain information the business can understand.  Now we are talking about the maximum number of logons, money transfers, movie downloads or shopping carts that can be processed, and if the trend says they’ll hit that number, it’ll tell them when.
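
As an illustration of the kind of calculation that combination makes possible, here is a minimal Python sketch with hypothetical numbers rather than real Sharepath or athene® output: it relates measured service transactions to CPU utilisation to estimate the maximum transaction rate, and then how long a growth trend will take to reach it.

```python
# Transaction-based capacity sketch (hypothetical figures, not Sharepath/athene output).

peak_transactions_per_hour = 12_000   # e.g. logons + money transfers at the busiest hour
cpu_utilisation_at_peak    = 0.48     # 48% CPU on the relevant servers at that hour
cpu_ceiling                = 0.80     # utilisation we are prepared to run up to

# Assume throughput scales roughly linearly with CPU up to the ceiling.
max_transactions_per_hour = peak_transactions_per_hour * cpu_ceiling / cpu_utilisation_at_peak

# Business trend: transactions growing about 4% per month.
monthly_growth = 0.04
months = 0
projected = peak_transactions_per_hour
while projected < max_transactions_per_hour:
    projected *= 1 + monthly_growth
    months += 1

print(f"Estimated ceiling: {max_transactions_per_hour:,.0f} transactions/hour")
print(f"At 4% monthly growth, that ceiling is reached in about {months} months")
```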

What’s your definition of capacity?
http://www.metron-athene.com/documents/factsheets/published_paper/correlsense_and_metron_transaction-based_capacity_planning.pdf

Phil Bell
Pre-sales Consultant