Monday 25 March 2013

Who is your Capacity Manager?


The Unsung Hero of IT: The Capacity Manager.  Know them, support them and use them.

Dear CIO,
As you know, there are people in our IT department who make sure the company is running smoothly every day.  You’ll be familiar with the people managing the Service Desk team, and the software that they use to make the services you are responsible for efficient.  An Unsung Hero in most IT departments is the Capacity Manager.  If they get their job wrong, none of us knows until it’s too late, and it affects every customer, every employee and ultimately, the bottom line.
Capacity Management and Planning affects every system in the company.  A good Capacity Manager knows the dull and detailed stuff so others don’t have to: how many transactions every service in your company is currently handling; what the transaction times are for each type.  They also know how many transactions the business can handle before performance degrades.  They can tell you exactly what to do to make services as efficient as they can be, to handle whatever the business throws at us in the future.

I call the Capacity Manager an ‘Unsung Hero’ rather than a ‘Superhero’ for good reason.   
The Superhero flies in when chaos already reigns, disaster has already struck, and then stops things getting worse, allowing life to return to normal.  When it goes wrong in IT terms, it’s painful.  Ask these guys: http://www.pcmag.com/article2/0,2817,2373494,00.asp
The Unsung Heroes in capacity management avoid the disaster in the first place.  It might worry the Superhero as it risks putting them out of a job, but it saves a lot of time and money not to have the world collapsing around you. 
It ought to be reassuring to know there is someone out there making sure that your plans will not be in the news due to “insufficient capacity”, “unexpected demand” and “taking months to get up to speed.”  If the Capacity Manager doesn’t do their job in their usual quiet, understated fashion when the new service goes live, we’re usually looking at headlines we don’t like to see, along with emergency spend 6 to 12 months down the line and some uncomfortable meetings with our stakeholders.
To be the Unsung Hero, the Capacity Manager needs the ‘goods’: good processes in the IT department, good tools and good data. 

Perhaps the most heroic aspect of what many of these Unsung Heroes accomplish is that they have none of these.  This person, central to the success of your entire department, is often working with a copy of Excel, a few “cast offs” of tools purchased for other teams to do other jobs, and a collection of home grown scripts and ad-hoc processes.  Think how much more they could achieve with the ‘goods’.  Quality tools and established processes might also allow a few more Unsung Heroes to emerge from within your ranks, lessening your dependency on that one individual.
It can be hard enough gathering together the inputs to enable successful capacity planning.  Hopefully data about the business’s plans for the year ahead comes from you or the application owners.  Some of it will be collected from testing and existing live environments.  Its value is minimal if the Capacity Manager doesn’t have the tools to process it and save the day, before the day has even started to go bad.

It can be hard to spot the Unsung Hero. 
Putting on a mask and wearing underwear outside their pants isn’t a good look for most Capacity Managers. 
This Unsung Hero should get as much recognition as the Superhero, if not more, when it comes to IT performance and capacity.  Let’s find them before the next disaster (http://blogs.thisismoney.co.uk/2011/06/tesco-banks-big-apology-wont-stop-saver-exodus.html) and make sure they have the processes and tools to ensure our world looks good all the time. 

Phil Bell
Consultant

Friday 8 March 2013

Idle VMs - Why should we care? (3 of 3)


Earlier in the week I looked at the impact idle VMs can have on CPU utilization and memory overhead.  Today I’m going to look at the amount of disk, or Datastore, space used per idle VM. 

Each idle VM will have associated VMDK (disk) files.  The files are stored within a Datastore, which in most cases is hosted on SAN or NAS storage and shared between the cluster host members.  If VMDKs are provisioned as "Thick Disks" then the provisioned space is locked out within the Datastore for those disks.

To illustrate this, a “least worst” case scenario would be: 100 idle Windows VMs have been identified across the Virtual Infrastructure and each VM has a single "Thick" VMDK of 20GB used to house the operating system.  This equates to 2TB (100 x 20GB) of Datastore space being locked for use by VMs that are idle.  

You can expand this further by assuming that some, if not all, VMs are likely to have more disks of differing sizes.

The simple math will show you how much Datastore space is being wasted.
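
As a minimal sketch of that math, the snippet below totals the thick-provisioned space for a set of idle VMs.  The function name and the list-of-disk-sizes layout are assumptions made for illustration; the figures are the 100 VM, 20GB example from above.

```python
def locked_datastore_gb(vmdk_sizes_gb_per_vm):
    """Total thick-provisioned space (GB) locked by a set of idle VMs.

    vmdk_sizes_gb_per_vm: one list of VMDK sizes (GB) per VM (assumed layout).
    """
    return sum(sum(disks) for disks in vmdk_sizes_gb_per_vm)


# The scenario from the text: 100 idle VMs, each with one 20GB thick VMDK.
idle_vms = [[20] for _ in range(100)]
total_gb = locked_datastore_gb(idle_vms)
print(f"{total_gb} GB locked (about {total_gb / 1000:.0f} TB)")
```

To expand the example, add further disk sizes to any VM's list, for instance `[20, 50]` for a VM with a second 50GB disk.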



There is a counter to this, known as Thin Provisioning.  With Thin disks, the provisioned disk size is reserved but not locked, so Datastore space is only consumed as data is actually written, and you would not waste the same amount of space as you would by using Thick Disks.  

Using Thin Provisioning also has the added benefit of allowing you to over-allocate disk space, reducing the amount of up-front storage capacity required while incurring only minimal overhead.
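
To make over-allocation concrete, here is a rough sketch that compares what has been provisioned with what is actually used on a Datastore.  The function names, the 1000GB Datastore and the 8GB used per VM are all hypothetical figures chosen for illustration, not measurements.

```python
def overallocation_ratio(provisioned_gb, datastore_capacity_gb):
    """Provisioned thin-disk space divided by physical Datastore capacity."""
    return sum(provisioned_gb) / datastore_capacity_gb


def free_space_gb(used_gb, datastore_capacity_gb):
    """Physical space still free, based on what the thin disks actually use."""
    return datastore_capacity_gb - sum(used_gb)


# Hypothetical figures: 100 thin VMs provisioned at 20GB each, each
# actually using 8GB, on a Datastore with 1000GB of physical capacity.
capacity_gb = 1000
provisioned = [20] * 100
used = [8] * 100

ratio = overallocation_ratio(provisioned, capacity_gb)
free_gb = free_space_gb(used, capacity_gb)
print(f"Over-allocation ratio: {ratio:.1f}, free space: {free_gb} GB")
```

A ratio above 1.0 means more space has been promised than physically exists, so the free space figure is the one to keep an eye on as the thin disks grow.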

Idle VMs -  why you should care.

Identifying Idle VMs, questioning whether they are required, finding out who owns them and  removing them completely will reduce or help eliminate VM sprawl and help to improve the performance and capacity of the Virtual Infrastructure by:

  •         reducing unnecessary timer interrupts
  •         reducing allocated vCPUs
  •         reducing unnecessary CPU and Memory overhead
  •         reducing used Datastore space

 leading to more efficient use of your Virtual Infrastructure, including an improved VM to Host ratio and a reduction in additional hardware. 
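
The first step, identifying candidates, could be sketched along these lines.  The 2% CPU threshold, the VM names and the sample values are assumptions for illustration only; in practice you would pick a threshold and sampling period that suit your environment, and look at memory, disk and network activity as well.

```python
from statistics import mean

# Assumed cut-off for "idle"; not a vendor recommendation.
CPU_IDLE_THRESHOLD_PCT = 2.0


def find_idle_vms(cpu_samples_by_vm, threshold_pct=CPU_IDLE_THRESHOLD_PCT):
    """Return the names of VMs whose average CPU utilization is below the threshold."""
    return sorted(
        name
        for name, samples in cpu_samples_by_vm.items()
        if samples and mean(samples) < threshold_pct
    )


# Hypothetical CPU utilization samples (%) per VM.
samples = {
    "test-web-01": [0.5, 0.7, 0.4],
    "prod-db-01": [35.0, 42.5, 38.1],
    "old-proj-07": [1.0, 0.9, 1.2],
}
idle = find_idle_vms(samples)
print(idle)
```

The resulting list is only a starting point for the questions above: whether each VM is still required and who owns it.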

I'll be hosting a two-part webinar series beginning on March 14 with VMware vSphere Performance Management Challenges and Best Practices.  Why not register and come along?  http://www.metron-athene.com/services/training/webinars/index.html

I hope to see you there.


Jamie Baker
Principal Consultant

Wednesday 6 March 2013

Idle VMs - Why should we care? (2 of 3)


In my previous blog I mentioned the term VM Sprawl, and this is where Idle VMs are likely to be a factor. 

Often VMs are provisioned to support short term projects, for development/test processes, or for applications which have since been decommissioned.  Now idle, they’re left alone, not bothering anyone and therefore not on the Capacity and Performance team’s radar.

Which brings us back to the question: Idle VMs - Why should we care? 

We should care for a number of reasons, but let's start with the impact on CPU utilization.

 When VMs are powered on and running, timer interrupts have to be delivered from the host CPU to the VM.  The total number of timer interrupts being delivered depends on the following factors:

·         Whether the VM runs a symmetric multiprocessing (SMP) or a Uniprocessor hardware abstraction layer (HAL)/kernel.  SMP HALs/kernels require more timer interrupts than Uniprocessor HALs/kernels.

·         How many virtual CPUs (vCPUs) the VM has.

Delivering many virtual timer interrupts can negatively impact the performance of the VM and can also increase host CPU consumption.  This can be mitigated, however, by reducing the number of vCPUs, which reduces both the timer interrupts and the amount of co-scheduling overhead (check CPU Ready Time). 

Then there's the Memory management of Idle VMs.  Each powered on VM incurs Memory Overhead.   The Memory Overhead includes space reserved for the VM frame buffer and various virtualization data structures, such as Shadow Page Tables (using Software Virtualization) or Nested Page Tables (using Hardware Virtualization).  This also depends on the number of vCPUs and the configured memory granted to the VM.
Imagine if you have a large number of idle VMs, as the report below picked up for a client:
 

We’ll have a look at a few more reasons to care on Friday.
Jamie Baker
Principal Consultant

Monday 4 March 2013

Idle VMs - Why should we care? (1 of 3)


The re-emergence of Virtualization technologies, such as VMware, Microsoft's Hyper-V, Xen and Linux KVM has provided organizations with the tools to create new operating system platforms ready to support the services required by the business, in minutes rather than days. 
Indeed IT itself is a service to the business.
In more recent times, Cloud computing, which is itself underpinned by Virtualization, makes use of the functionality provided to satisfy:
·         on-demand resources
·         the ability to provision faster
·         rapid elasticity (refer to NIST's description of Cloud Computing)
Cloud computing makes full use of the underlying clustered hardware. Constant strides are being made by Virtualization vendors to improve the Virtual Machine (VM) to Host ratio, without affecting the underlying performance. 
But, you may ask "What's this got to do with Idle VMs?"
Well, as I described earlier, Virtualization provides the means to easily and quickly provision virtual systems. Your CTO/CIO is going to demand a significant ROI once an investment in both the hardware and virtualization software has been made, possibly switching the focus to an increase in the VM to Host ratio. 
“What's wrong with that?” I hear you say.  Nothing at all, as long as you keep track of:
·         what VMs you are provisioning
·         what resources you have granted
·         what they are for
 
Failure to do so will mean that your quest for a good ROI and a satisfied Chief will be in jeopardy, as you’ll encounter a term most commonly known as VM Sprawl.
More about this on Wednesday.
 
Jamie Baker
Principal Consultant