Friday, 23 December 2016

The Art and Science of Capacity Management

Just looking at the two parts of the world where I spend most of my time, the USA and Europe, it has been a funny old year.   
In the US, the presidential election has returned a leader who has, to date, sat outside the established political system.  Next year the USA will be led by someone who is totally new to his role, rather than by someone who has come through the traditional route of minor political roles leading to national leadership. 

In Europe, the British referendum concluded in a vote to leave the European Union (EU), or ‘Brexit’ as this process has now become known.  Although a British decision, the ramifications of Brexit will be felt across Europe, something that has already started.  The impact will be potentially more far-reaching once the withdrawal of the UK has been negotiated and eventually happens. 

Where does all this fit with Capacity Management?  Well, both geographies have taken a huge step into the unknown.  America has its first businessman with little conventional political background as leader.  The EU will see the first fully-fledged member state depart from its ranks, which have been steadily expanding to date.  Any step into the unknown like this has risks.  What will happen?  Will people be happy with the decisions they have taken?  Will the changes that come bring more prosperity, more equality or less? How will the all-important financial markets react, with their dislike of uncertainty? How will we cope if what unfolds differs from what we anticipate? 

One of the hardest aspects of Capacity Management is handling questions like these: questions for which there is no precedent, no experience on which judgements can be based or from which measurements can be taken.  For Capacity Managers this manifests itself in being expected to plan for acceptable service levels through major changes.  That might be a new business venture that needs an application unlike any currently supported, or an external change in user behaviour that means knowledge of past system usage has no value in anticipating future needs. 

This is where the good Capacity Manager really earns his corn.  If we have things we can measure and use as a basis for prediction, if we have similar situations in the past on which we can base judgements, the job is always easier.  It’s more of a science at such times.  

‘Guesstimating’ the totally new is more of an art.  Having a Capacity Manager who has, or has access to, a broad range of experience, perhaps both within his business and in the world outside, will help.  This suggests a level of maturity and experience that needs to be built up over time, a mind that is willing to challenge obvious perceptions and test boundaries others might feel unlikely to be hit.  An open mind, a lateral thinker, someone willing to test assumptions others feel are not worth the time.  If you’re recruiting a Capacity Manager, you might want to think about how you test for these capabilities, rather than focus on traditional areas such as technical know-how.  For Capacity Managers, you might want to get examples of these skills on your CV – an employer who appreciates the need for them might be a better employer for a Capacity Manager than one who does not. 

As for next year – well, no predictions from me.  Ask a good Capacity Manager instead….  

Thank you to everyone who has interacted with Metron throughout 2016. 
My compliments of the season to you all.

Andrew Smith

Thursday, 22 December 2016

Virtualization Oversubscription - What’s so scary? (17 of 20) Expandable Reservation

When a VM starts, the Reservation set for that VM is taken from the Reservation available within the Resource Pool.  The total reservations of the child VMs may not be more than the Reservation for the Resource Pool.

However, if Expandable Reservation is turned on, a Resource Pool may satisfy its Reservation requirements by using the Reservation of another Resource Pool.  This may in turn stop the 2nd Resource Pool from starting VMs, as it can no longer satisfy the Reservation requirements of the VM that wants to start.
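To make the knock-on effect concrete, here is a minimal sketch of the behaviour as described above.  It is a toy model, not VMware's implementation: the `Pool` class, the `start_vm` function, the "Finance" pool and all the MHz figures are made up for illustration, and lending is modelled directly between two pools, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A resource pool with a CPU reservation (hypothetical model)."""
    name: str
    reservation_mhz: int
    expandable: bool = False
    claimed_mhz: int = 0  # reservation already handed out to started VMs

    def free_mhz(self) -> int:
        return self.reservation_mhz - self.claimed_mhz


def start_vm(pool, vm_reservation_mhz, lender=None):
    """Try to start a VM; return True if its reservation can be satisfied."""
    free = pool.free_mhz()
    if vm_reservation_mhz <= free:
        pool.claimed_mhz += vm_reservation_mhz
        return True
    # Assumption: an expandable pool may cover the shortfall from a lender.
    if pool.expandable and lender is not None:
        shortfall = vm_reservation_mhz - free
        if shortfall <= lender.free_mhz():
            pool.claimed_mhz += free
            lender.claimed_mhz += shortfall
            return True
    return False


engineering = Pool("Engineering", 1000, expandable=True)
finance = Pool("Finance", 1000)

first = start_vm(engineering, 800)                    # fits in Engineering
second = start_vm(engineering, 600, lender=finance)   # borrows 400MHz
third = start_vm(finance, 700)                        # only 600MHz left
print(first, second, third)
```

In this run the third VM is refused even though Finance has not started anything itself, which is exactly the risk described above.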


I'm off on vacation now and will be back in the New Year to deal with 'What's the worst that can happen?'
Wishing you all Happy Holidays and see you in 2017!
Phil Bell

Wednesday, 21 December 2016

What’s VITAL from your Capacity Management provider?

It’s not been uncommon over the years for software vendors in the capacity management sphere to evolve.  They might get bought out, sometimes twice in 2 years, get merged into the product line of a larger business, go out of business altogether or change direction and move away from capacity management.  This blog outlines some of the dangers to users of software products from companies who sell out, go out of business or change direction.

Change can be both good and bad for you as a user of their software.  On the positive side, it might mean the range of functionality in your product gets quickly expanded.  You might benefit from becoming part of a larger user community.  The change might mean your future needs are being anticipated by the provider, seeing what you will need from capacity management before you have that need.

The change is not always in your best interests though.  If the software you have come to rely on isn’t strategic for its new owners, it might be ‘sunsetted’, receiving little or no further development.  Existing products can come to be seen as ‘cash cows’, mature products from which maintenance revenues to the supplier need to be maximised to fund new initiatives – possibly unrelated to your on-going product needs.  Without development and commitment from their supplier, products slowly wither and die, often while maintenance bills increase.  The human side of change should never be underestimated either.  The support staff whose experience you rely on are often let go.  New management might lack the background of those who originally created the solution you use, and might want to move whatever elements of your software they retain in a different direction. 

So, if a supplier of a product you are dependent on sells out or parachutes new management in, you might want to consider a few key questions.  I’m biased, I know. 
Metron has retained a consistent focus throughout our 30-year existence.  The points below contrast our approach with the changes in businesses I have seen in my 30 years in capacity management.

Do you really want:

A supplier with…

…executives who don’t know your job?

…a marketing budget pushing the latest craze?

…the products you use being ‘sunsetted’?

…sudden large increases in your maintenance bill?

…the people you rely on to support you being let go?

Probably not.  What you might prefer is:

A solution with a future

If the product you’re using isn’t part of the big new marketing push and re-branding, then it’s not going to get the development you need to keep your Capacity Management process successful.

athene® from Metron remains the company’s core solution and primary focus.

A solution delivered by experts

If your solution comes from a company whose executive staff has less than two years’ experience in Capacity Management, chances are they won’t understand your needs.

The least experienced of Metron’s executive team has over 10 years of knowledge and experience specifically in capacity management.

An expert team to support you

If new people come in and the experienced and skilled staff that you have come to rely on to support you are released, the pool of knowledge available to help you is lessened.

Metron blends a young and creative software development team with design, consulting, support and management staff who each bring between 10 and 30 years of specific capacity management and planning knowledge to the business.

A financial solution you can trust

In an economy that is only growing slowly, it’s painful if your provider tries to use you as a cash cow with up to 60% increases in the maintenance you pay, to help them develop their new strategy.

Metron rewards the loyalty of its clients with offers of guaranteed future maintenance commitment and reduced maintenance bills for loyal clients.

Something in return for the maintenance you pay

If the products you are licensed for are ‘sunsetted’ by your supplier, although they often won’t state this explicitly, they won’t get the on-going investment to evolve with your needs.  Maybe you’re just paying for their change in strategy or their new offices.

Metron continues to reinvest profits in the on-going development of athene®, as we have done for 30 years.

A product strategy, not just the latest fad

So many companies jump on the latest bandwagon chasing what they see as the easy money.  That often leads to a solution that doesn’t meet the day to day problems you face.  You want a capacity management solution focussed on capacity management, not the latest buzzwords.

Metron works closely with its client base to develop athene® to meet their needs as they change, ensuring continuity of product applicability, helping you always retain your focus on the questions you need to answer now.

Innovation that anticipates what you will need

As your needs evolve, you need a capacity management solution ready to help you meet them.

Ever since helping define the ITIL Capacity Management good practice guidelines, Metron has continuously reinvested profits in ensuring athene® stays ahead of the game.  For example, import of business information and analytics to correlate business and technical data for capacity planning has been a feature of athene® for more than 10 years.

Find out why now is a good time to consider a provider who values Capacity Management more than the latest marketing trend.

Metron and our athene® software offer the company focus, strategy, development and product functionality that other capacity management solutions have, plus a few VITAL things that they don’t.

Andrew Smith
Chief Executive Officer

Tuesday, 20 December 2016

Virtualization Oversubscription - What’s so scary? Reservations (16 of 20)

As promised on Monday here’s a quick demonstration of what a reservation does.

When both VMs want the same amount of resource (and have the same shares) they will get an even share of the CPU.  Assuming they both want all of the 4000MHz available, they will each get 50% of what they want.
As the Production workload reduces, Test will take more and more of the CPU; however, Production will always have the right to use 250MHz of CPU.
At the point where Production is using 250MHz of CPU, Production is in effect getting 100% of the CPU it wants while Test is getting 93.75% of the CPU it wants, despite the two VMs having the same shares values.
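The numbers in this scenario can be reproduced with a small sketch.  It is a simplified model rather than the real scheduler: I am assuming each VM gets CPU in proportion to its shares, never more than it demands and never less than its reservation (or its demand, if that is lower).  The function name `allocate_cpu` and the dictionary keys are my own.

```python
def allocate_cpu(capacity_mhz, vms):
    """Split capacity by shares, capped by demand, floored by reservation.

    vms maps a VM name to a dict with 'demand', 'shares' and 'reservation'
    (hypothetical field names, all in MHz except shares).
    """
    floors = {n: min(v["demand"], v["reservation"]) for n, v in vms.items()}
    if sum(floors.values()) > capacity_mhz:
        raise ValueError("reservations exceed capacity")
    target = min(capacity_mhz, sum(v["demand"] for v in vms.values()))

    def grant(rate):
        # rate is "MHz per share"; clamp each VM between its floor and demand
        return {n: min(v["demand"], max(floors[n], v["shares"] * rate))
                for n, v in vms.items()}

    # Bisect on the rate until the grants add up to the target.
    lo, hi = 0.0, max(v["demand"] / v["shares"] for v in vms.values())
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(grant(mid).values()) < target:
            lo = mid
        else:
            hi = mid
    return grant(hi)


busy = allocate_cpu(4000, {
    "Production": {"demand": 4000, "shares": 1000, "reservation": 250},
    "Test": {"demand": 4000, "shares": 1000, "reservation": 0},
})
quiet = allocate_cpu(4000, {
    "Production": {"demand": 250, "shares": 1000, "reservation": 250},
    "Test": {"demand": 4000, "shares": 1000, "reservation": 0},
})
print({n: round(mhz) for n, mhz in busy.items()})
print({n: round(mhz) for n, mhz in quiet.items()})
print(round(100 * quiet["Test"] / 4000, 2))
```

With both VMs busy the split is even; once Production only wants its 250MHz, Test picks up the remaining 3750MHz, which is 93.75% of the 4000MHz it is asking for.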
Reservations and Shares
If we run the scenario again but this time include the Shares values for the VMs the situation is different.

When they are both trying to use all of the CPU the effect of the shares will come into play.  With only 1000 shares Test will get 1333MHz of the 4000MHz available while Production will get 2667MHz.  In other words, Test gets 33% of what it wants to use and Production gets 67% of what it wants to use. 
As the Production workload decreases this ratio should be maintained until Production gets to its reservation, at which point Production is in effect getting 100% of the CPU it wants while Test is getting 93.75%. I'll talk more about Reservations on Friday when we take a look at an Expandable Reservation.
In the meantime do sign up to some of our webinars, we're covering some great topics in 2017.

Phil Bell

Monday, 19 December 2016

Virtualization Oversubscription - What’s so scary? Reservations, Shares and Limits (15 of 20)

Reservations, Shares and Limits apply to the amount of CPU and Memory a VM or Resource pool can use.
In the example below we have an Engineering Resource Pool containing 2 Virtual Machines.

Test has 1000  CPU shares and Production has 2000 CPU shares, giving a total of 3000 shares between them.  If there is contention for CPU resource then Production will be given twice as much CPU time as Test.
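As a quick sanity check of that 2:1 ratio, here is a minimal sketch of a proportional split under full contention.  The function name `split_by_shares` and the 3000MHz of contended capacity are assumptions chosen to keep the arithmetic obvious; the real scheduler takes more into account than this.

```python
def split_by_shares(capacity_mhz, shares):
    """Divide contended CPU between VMs in proportion to their shares."""
    total_shares = sum(shares.values())
    return {vm: capacity_mhz * s / total_shares for vm, s in shares.items()}


split = split_by_shares(3000, {"Test": 1000, "Production": 2000})
print(split)
```

Whatever capacity you plug in, Production ends up with twice what Test receives, because only the ratio of the shares matters and not their absolute values.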
Also notice that the reservation on the Resource Pool is set to Expandable.  This means that if there is another resource pool not using its reservation, Engineering could claim and use that reservation if required.  This could cause problems if the 2nd resource pool then wishes to use its reservation, as it will not be able to push Engineering out.  So while this may provide flexibility, its use should be closely monitored.
On Wednesday I'll give a quick demonstration of what a reservation does.
Phil Bell

Friday, 16 December 2016

Virtualization Oversubscription - What’s so scary? VMware vCPU Co-Scheduling & Ready Time (14 of 20)

Today I’ll explain the effect of what is happening inside the host to schedule the physical CPUs/cores to the vCPUs of the VMs.  Clearly most hosts can process more than 4 concurrent threads, but let’s keep this simple to follow.

·        VMs that are “ready” are moved onto the Threads.
·        There is not enough space for all the vCPUs in all the VMs so some are left behind.  (CPU Utilization = 75%, capacity used = 100%)
·        If a single vCPU VM finishes processing, the spare Threads can now be used to process a 2 vCPU VM. (CPU Utilization = 100%)
·        A 4 vCPU VM needs to process.
·        Even if the 2 single vCPU VMs finish processing, the 4 vCPU VM cannot use the CPU available, and while it’s accumulating Ready Time other single vCPU VMs are able to take advantage of the available Threads.
·        Even if we end up in a situation where only a single vCPU is being used, the 4 vCPU VM cannot do any processing. (CPU Utilization = 25%)
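The steps above can be mimicked with a short sketch.  It models only the strictest form of co-scheduling, where a VM runs only if every one of its vCPUs can get a Thread at the same moment; the function name `place_ready_vms` and the VM names are invented for the example.

```python
def place_ready_vms(total_threads, running, ready):
    """Start ready VMs only if all of their vCPUs fit on free threads.

    running maps VM name -> vCPUs already on threads.
    ready is a list of (VM name, vCPUs) waiting to run, in queue order.
    Returns (started, still_waiting, cpu_utilization).
    """
    free = total_threads - sum(running.values())
    started, waiting = [], []
    for name, vcpus in ready:
        if vcpus <= free:
            started.append(name)
            free -= vcpus
        else:
            waiting.append(name)  # this VM accumulates Ready Time
    utilization = (total_threads - free) / total_threads
    return started, waiting, utilization


# One single vCPU VM is running; the 4 vCPU VM cannot fit on 3 free threads.
alone = place_ready_vms(4, {"vm_a": 1}, [("vm_big", 4)])
# Another single vCPU VM arrives and jumps ahead of the waiting 4 vCPU VM.
overtaken = place_ready_vms(4, {"vm_a": 1}, [("vm_big", 4), ("vm_b", 1)])
print(alone)
print(overtaken)
```

The first call leaves the host at 25% utilization with the 4 vCPU VM still waiting, matching the last step in the list.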
As mentioned when we discussed time slicing, improvements have been made in the area of co-scheduling with each release of VMware.  Among other things, the permitted time between individual vCPUs being scheduled onto the physical CPUs has increased, allowing for greater flexibility in scheduling VMs with large numbers of vCPUs.  As a result, acceptable performance is seen from larger VMs.

Along with Ready Time, there is also a Co-Stop metric.  Ready Time can be accumulated against any VM.  Co-Stop is specific to VMs with 2 or more vCPUs and relates to the time “stopped” due to Co-Scheduling contention.  For example, one or more vCPUs have been allocated a physical CPU, but the VM is stopped waiting on its other vCPUs to be scheduled.
Imagine the bottom of a “ready” VM sliding across to a thread first, and the top sliding across later as other VMs move off the Threads.  The VM is no longer rigid; it’s more of an elastic band.  
VMs and Resource Pools can be allocated Reservations, Shares and Limits and I'll be taking a look at these on Monday.
If you haven't already done so don't forget to sign up to get free access to our Resources, there are some great VMware white papers and on-demand webinars on there.
Phil Bell

Wednesday, 14 December 2016

Virtualization Oversubscription - What’s so scary? CPU Oversubscription (13 of 20)

CPU Oversubscription

Memory is fairly easy to describe but there are a lot of things going on.  CPU Oversubscription and the technologies involved can be a little more complex to visualize, but there are fewer tools that the hypervisor has to work with.

·        Time slicing

·        Co-Scheduling

·        Reservations

·        Shares

·        Limits

For a start, time is no longer a constant.  The hypervisor has the ability to run time at whatever speed it likes, just so long as it averages out in the end.

Co-Scheduling is where all the vCPUs for a single VM have to be mapped to logical CPUs on the hardware at the same time.
Reservations and Shares apply here also and we’ll have more of a look at how they work later.

Limits also exist for memory, but here they can be applied to restrict some VMs to a smaller amount of CPU than their vCPU allocation would otherwise allow them to have.
Let’s start with Time Slicing.

Time Slicing

In a typical VMware host we have more vCPUs assigned to VMs than we do physical cores. The processing time of the physical cores (or logical CPUs if hyper threading is in play), has to be shared among the vCPUs in the VMs.  The more vCPUs we have, the less time each can be on the core, and therefore the slower time passes for that VM.  To keep the VM in time, extra time interrupts are sent in quick succession when the VM is processing, so time passes slowly and then very fast.

Significant improvements have been made in this area over the releases of VMware. vCPUs can be scheduled onto the hardware a few milliseconds apart but the basic concept remains in place.
Join me again on Friday when I'll look at VMware vCPU Co-Scheduling & Ready Time.
Phil Bell

Monday, 12 December 2016

Virtualization Oversubscription - What’s so scary? (12 of 20) Memory Idle Tax

One of VMware's memory oversubscription technologies is Memory Idle Tax.

        Memory has Shares - Like reservations, a VM also has an associated number of shares.  The more shares, the more priority it has over the resource if there is contention.

If a virtual machine is not actively using its currently allocated memory, ESX Server charges a memory tax — more for idle memory than for memory that is in use. That is, the idle memory counts more towards the share allocation than memory in use 

        Memory Tax associates a value to each page used

        Default Idle Tax rate is 75%

        This makes idle memory cost 4 times as many shares as active memory

The end result is that VMs holding onto a lot of idle memory will be more likely to have the balloon driver inflate inside them to try and release some of that idle memory for use by other VMs.
So put simply it's a mechanism to take idle or unused memory from guest VMs that are hogging it in order to give that memory to another VM where it’s more badly needed.
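Where the "4 times" comes from is easier to see in a few lines of code.  This is a sketch based on my understanding of the published description of the idle tax: an idle page is charged 1 / (1 - tax rate) times as much as an active one, and a VM's shares are spread over its pages at that adjusted cost.  The function names and the example figures are my own.

```python
def idle_cost_factor(tax_rate):
    """How many times more an idle page costs than an active page."""
    if not 0 <= tax_rate < 1:
        raise ValueError("tax_rate must be in the range [0, 1)")
    return 1 / (1 - tax_rate)


def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Shares-per-page ratio after charging extra for idle pages.

    A lower ratio makes the VM a more likely candidate to give up memory.
    """
    k = idle_cost_factor(tax_rate)
    return shares / (pages * (active_fraction + k * (1 - active_fraction)))


factor = idle_cost_factor(0.75)
busy_vm = shares_per_page(shares=1000, pages=1000, active_fraction=1.0)
idle_vm = shares_per_page(shares=1000, pages=1000, active_fraction=0.0)
print(factor, busy_vm, idle_vm)
```

Two VMs with identical shares and identical memory end up with very different ratios once one of them stops touching its pages, so the idle one is where memory is reclaimed first.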
On Wednesday I'll be looking at CPU oversubscription, in the meantime register for our next webinar 'Hardware's a Commodity - Why bother managing capacity?'
Phil Bell

Friday, 9 December 2016

Virtualization Oversubscription - What’s so scary? Reservations (11 of 20)

Reservations are associated with Resource Pools or individual VMs.  Essentially you are setting a value for CPU or Memory that the VM is guaranteed to get.  If the VM doesn’t use all its reservation other VMs can make use of the Memory and CPU.

The fairly obvious caveat is that the total of your reservations cannot be bigger than what the hardware provides; this is illustrated below.

VM7 cannot start as the total of VM1, VM2 and VM7 exceeds the 1000 MHz available.
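The check being applied here amounts to a simple sum.  In the sketch below the 1000MHz comes from the example above, but the individual reservations for VM1, VM2 and VM7 are hypothetical values I have picked to illustrate the point, and `can_power_on` is an invented name.

```python
def can_power_on(capacity_mhz, running_reservations, new_reservation_mhz):
    """Allow a VM to start only if all reservations still fit the hardware."""
    already_reserved = sum(running_reservations.values())
    return already_reserved + new_reservation_mhz <= capacity_mhz


running = {"VM1": 400, "VM2": 400}  # assumed reservations in MHz
vm7_starts = can_power_on(1000, running, 300)      # 1100MHz needed
smaller_starts = can_power_on(1000, running, 200)  # exactly 1000MHz
print(vm7_starts, smaller_starts)
```

Note that the check uses reservations, not actual usage: VM7 is refused even if VM1 and VM2 are sitting idle at the time.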

You can use reservations to ensure that important VMs get the resources they want, so you don’t have to worry about avoiding oversubscription for everything.  Pick the VMs you want to perform their best and give them a reservation that ensures that, then your background VMs can be pushed out of the way if required.
Phil Bell

Wednesday, 7 December 2016

Virtualization Oversubscription - What’s so scary? Memory stats (10 of 20)

Let’s look at some stats for a single Virtual Machine.

This is a 4GB VM, but it’s only accessing about 400 MB on a regular basis.  It’s got 2.6GB of memory that’s unique to itself, and 1.4GB that’s shared with other VMs.

So at least one other VM is likely to be sharing about 1.4GB of memory as well.  Given there are a lot of Windows VMs in that cluster, it’s likely a lot of them have similar amounts of shared memory.  If there are 10 VMs on that host then that’s about 15GB of RAM that you don’t have to have installed or, rather, a few more VMs that will fit on the host.

There are also a couple of hours where the balloon driver steals some memory from the VM.  It’s only about 50MB, and given the VM is only accessing 400 to 500MB of RAM out of the 2.6GB that it’s using, the OS probably just released some cache to satisfy that request.
On Friday we'll be looking at Reservations and how they work.
Don't forget that registration is open for our next webinar 'Hardware's a commodity - Why bother managing capacity?'
Phil Bell

Monday, 5 December 2016

Virtualization Oversubscription - What’s so scary? VMkernel Swap (9 of 20)

A reservation is typically set against a resource pool and filters down to give a VM rights against memory.  Essentially, if a reservation has been set and applies to this VM, then the VM is guaranteed that amount of memory will be made available in RAM on the ESX host.  You can never reserve more memory than exists.  So reservations can ensure good performance for the VMs you care about.  You put the VMs in a resource pool and allocate a reservation that’s appropriate.  That might be a 1:1 ratio of allocated to reserved memory.  Then let other, less important VMs worry about oversubscription.

When an ESX host is very short of memory it may have to resort to using .vswp swap files for the VM memory.  At this point performance will be affected as data that the OS believes is in memory is, in reality, now on disk.
By default, a VM can have up to 65% of its memory used by the balloon driver.  It may also have a memory reservation.  The reservation cannot be swapped or taken up by the balloon driver.  Any memory outside the 65% used by the balloon driver, and outside the reservation, can be placed into a .vswp file.

In reality you never want this to happen.
On Wednesday I'll be discussing Memory.
Phil Bell

Friday, 2 December 2016

Virtualization Oversubscription - What’s so scary? Memory test (8 of 20)

If memory must be copied to or from disk because there is more requested than can be satisfied, what’s the penalty for doing this?

        Memory vs. disk speed is…?

A) Memory is 100x faster than disk

B) Memory is 1,000x faster than disk

C) Memory is 10,000x faster than disk

D) Memory is 100,000x faster than disk

E) Memory is 1,000,000x faster than disk

F) I have no memory of the event, your honour

A modern disk will respond to an I/O in about 5 milliseconds (5 * 1/1000 of a second).  Access to memory is usually in the order of 50 nanoseconds (50 *1/1,000,000,000 of a second).

That makes disk access a hundred thousand (100,000) times SLOWER than memory access. Tiny numbers like this are difficult to comprehend, so imagine that the memory access time was 1 second.  To write something to disk would then take about 27 ¾ hours to complete.
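If you want to check the arithmetic yourself, it only takes a few lines.  The two timings are the ones quoted above; the variable names are mine.

```python
DISK_IO_NS = 5 * 1_000_000   # 5 milliseconds expressed in nanoseconds
MEMORY_ACCESS_NS = 50        # 50 nanoseconds

# How many memory accesses fit into the time of one disk I/O.
slowdown = DISK_IO_NS // MEMORY_ACCESS_NS

# If a memory access took 1 second, a disk I/O would take this many hours.
scaled_disk_hours = slowdown * 1 / 3600

print(slowdown)
print(round(scaled_disk_hours, 2))
```

That works out at a factor of 100,000, or a little under 28 hours on the stretched timescale, which is where the 27 ¾ hours comes from.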

That’s one good reason for avoiding swapping if at all possible! On Monday I'll be looking at VMkernel Swap.
In the meantime why not take a look at some of our VMware on-demand webinars and white papers. Join our Community via our Resources section and get free access.
Phil Bell