Metron - Capacity Management: November 2016

Wednesday, 30 November 2016

Virtualization Oversubscription - What’s so scary? Balloon Driver (7 of 20)

Today we'll take a look at the Balloon Driver (vmmemctl).

The Problem

A process in VM1 is shut down and its memory is freed in the OS.

The “hardware” does not know. The data is still there but only the OS inside the VM knows it can overwrite it.

The VMware Solution

When memory gets tight on an ESX host, the VMkernel will pick a VM (based on shares) and tell that VM's balloon driver to request some memory.

The balloon driver requests memory from the guest OS and “pins” it so it cannot be paged. The memory on the ESX host is then freed up and can be allocated to another system.
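To make the mechanism concrete, here is a toy Python model of ballooning. It is a sketch only and not how ESX is implemented: the Guest class, its field names and the MB figures are invented for illustration, and asking the VM with the fewest shares first is a simplifying assumption about what "based on shares" means.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    shares: int          # relative priority (hypothetical units)
    host_backed_mb: int  # host RAM currently backing this guest
    guest_free_mb: int   # memory the guest OS knows is free; the host cannot see this
    ballooned_mb: int = 0

    def inflate_balloon(self, request_mb):
        """The balloon driver asks its own guest OS for memory and pins it."""
        granted = min(request_mb, self.guest_free_mb)
        self.guest_free_mb -= granted
        self.ballooned_mb += granted
        self.host_backed_mb -= granted  # the host can now reuse this memory
        return granted


def reclaim(guests, needed_mb):
    """Toy policy (an assumption): ask the guest with the fewest shares first."""
    reclaimed = 0
    for guest in sorted(guests, key=lambda g: g.shares):
        if reclaimed >= needed_mb:
            break
        reclaimed += guest.inflate_balloon(needed_mb - reclaimed)
    return reclaimed
```

With two guests holding 1024 MB and 512 MB of guest-free memory, calling reclaim for 1200 MB would take all 1024 MB from the lower-share guest and the remaining 176 MB from the other. The point to notice is that only memory the guest already considers free is handed back, which is exactly the information the "hardware" lacked.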

On Friday I'll be talking about Memory Test.

There's less than a week to go until our 'VMware Capacity and Performance Essentials' online workshop starts, so there's still time to book your place.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell
Consultant

Monday, 28 November 2016

Virtualization Oversubscription - What’s so scary? Memory Oversubscription (6 of 20)

As stated in my previous blog, having set out a few ground rules, we can now look at memory oversubscription.

So how can we over-subscribe memory?

Free Space - Just as disks have a lot of free space, servers typically run with free space in memory.

Then there are a number of tricks that the hypervisor can do to find even more savings.

· Page Sharing (Deduplication)

· Balloon Driver (VMware)

· Reservations

· Shares

We'll start with a look at:

Transparent Page Sharing

When two or more Virtual Machines have the same pages of data in memory, VMware can store a single copy and present it to all the VMs. Should a VM alter a shared memory page, a copy will be created by VMware and presented to that VM.

An example, shown in the original graphic, runs as follows:

VM1 starts and allocates some unique memory.

VM2 starts and allocates some unique memory.

VM1 allocates memory for a standard Windows DLL.

VM2 also allocates memory for the same standard Windows DLL.

VMware maps both systems' memory to the same page in RAM.
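The sequence above, plus the copy that is made when a VM alters a shared page, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the SharedRam class and its methods are invented for illustration, pages are matched purely by a SHA-256 hash of their contents, and none of this represents VMware's actual implementation.

```python
import hashlib


class SharedRam:
    """Toy host RAM that keeps one physical copy of identical pages."""

    def __init__(self):
        self.pages = {}     # content hash -> page contents (one physical copy)
        self.refcount = {}  # content hash -> number of guest pages mapped to it
        self.mappings = {}  # (vm name, guest page number) -> content hash

    def write(self, vm, page_no, data):
        """Store a guest page; a write to a shared page remaps only the writer."""
        old = self.mappings.get((vm, page_no))
        if old is not None:
            self.refcount[old] -= 1
            if self.refcount[old] == 0:  # nobody else maps the old copy
                del self.refcount[old]
                del self.pages[old]
        digest = hashlib.sha256(data).hexdigest()
        self.pages.setdefault(digest, data)
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.mappings[(vm, page_no)] = digest

    def read(self, vm, page_no):
        """Return the contents the given VM sees for its page."""
        return self.pages[self.mappings[(vm, page_no)]]

    def physical_pages(self):
        """Number of distinct pages actually held in host RAM."""
        return len(self.pages)
```

If VM1 and VM2 each write one unique page and then both write the same DLL contents, the model holds three physical pages for four guest pages. If VM2 then alters its DLL page, it gets its own copy and VM1 still reads the original.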

On Wednesday I'll explain more about the VMware Balloon Driver.

In the meantime register for access to some of our great Resources, white papers, on-demand webinars and more.

http://www.metron-athene.com/_resources/login.asp

Phil Bell

Consultant

Friday, 25 November 2016

Virtualization Oversubscription - What’s so scary? VMware CPU and Memory Maximums (5 of 20)

CPU and Memory are the main items people consider for Virtualized systems, so let’s lay down the maximums.

• Virtual Machine Maximum

– 128 vCPUs per VM

• Host CPU maximums

– Logical CPUs per host 480 (Logical CPUs being simultaneous threads so that might be 240 hyper-threaded cores.)

– Virtual machines per host 1024

– Virtual CPUs per host 4096

– Virtual CPUs per core 32

• Maximum with a caveat: The achievable number of vCPUs per core depends on the workload and specifics of the hardware. For more information, see the latest version of Performance Best Practices for VMware vSphere. The maximums themselves are listed in the configuration maximums document linked below.

https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf

This raises two points.

· Clearly it’s OK to oversubscribe CPUs.

· There is no set number to tell you how much oversubscription is OK.
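As a rough illustration of those two points, the Python sketch below works out a host's vCPU-to-core ratio and checks it against the two per-host limits listed above. The function name and return shape are made up for this example, and staying within the maximums says nothing about whether performance will be acceptable for a given workload.

```python
# Limits quoted in the list above (vSphere 6.0 configuration maximums).
MAX_VCPUS_PER_CORE = 32
MAX_VCPUS_PER_HOST = 4096


def cpu_oversubscription(vcpus, physical_cores):
    """Return the vCPU:core ratio and whether it stays within the maximums."""
    ratio = vcpus / physical_cores
    return {
        "ratio": ratio,
        "oversubscribed": ratio > 1,
        "within_maximums": (
            vcpus <= MAX_VCPUS_PER_HOST and ratio <= MAX_VCPUS_PER_CORE
        ),
    }
```

For instance, 96 vCPUs on a 24-core host is a 4:1 ratio: oversubscribed, but comfortably inside the maximums. Whether 4:1 is sensible is the question the maximums cannot answer.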

Memory VMware Maximums

Memory is a lot simpler.

• 6TB per Host

– Well 12TB on specific hardware

• 4TB per VM

Having set out those few ground rules we can now look at memory oversubscription and I'll be doing just that on Monday.

There are still a few places left on our VMware vSphere Capacity & Performance Essentials online workshop so don't forget to book your place.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell

Consultant

Wednesday, 23 November 2016

Virtualization Oversubscription (What’s so scary?) - What can be oversubscribed? (4 of 20)

Today I'll deal with what can be oversubscribed.

What can be oversubscribed?

• CPUs

• Memory

• Disk

• NICs

In our virtual world, now that we have broken the link between the OS and the hardware, we can over-provision all sorts of things.

CPU, Memory, Disk (as we mentioned) and NICs are all “Oversubscribed”.

We have already looked at disk, and I’ll go into more detail on memory and CPU later, but I thought it was worth mentioning NICs here.

Typically people seem to be running with 10 - 15 VMs on a single host which will have significantly fewer NICs installed. A Server typically wouldn’t use all the bandwidth of its NIC so that unused bandwidth is like the unused space on disk.

When the VMs talk to other VMs on the same Host, that’s not generating traffic through the physical NICs, so we might consider that as the equivalent of de-duplication.

In the next few blogs I'll be looking at CPU and Memory.

Phil Bell

Consultant

Monday, 21 November 2016

Virtualization Oversubscription - What’s so scary? Oversubscription (3 of 20)

So what is oversubscription? In its simplest terms it’s allocating more than you have.

Thin Provisioning

The most obvious example of thin provisioning happens in storage. It’s been around for a long time and is the same thing as oversubscription, simply known by another name.

With storage you have the LUNs that are allocated. Traditionally these would have been a physical allocation on disk that was available for use, but with Thin Provisioning you can allocate more space to LUNs than you actually have. The reason is that most disks on servers are not full. So if the average disk is 30% full, you could get away with only having 50% of your allocated storage as real usable space, and you’d still have plenty of space to grow into.
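The arithmetic behind that claim can be sketched in Python. The function and parameter names are invented for this example, and the 10,000 GB allocation used in the note below is an arbitrary illustrative figure; only the 30% and 50% values come from the paragraph above.

```python
def thin_provisioning(allocated_gb, avg_fill_pct, physical_pct):
    """Compare what is allocated, what is used and what physically exists."""
    used_gb = allocated_gb * avg_fill_pct // 100
    physical_gb = allocated_gb * physical_pct // 100
    return {
        "used_gb": used_gb,
        "physical_gb": physical_gb,
        "headroom_gb": physical_gb - used_gb,               # room left to grow into
        "oversubscription_ratio": allocated_gb / physical_gb,
    }
```

Calling thin_provisioning(10000, 30, 50) gives 3,000 GB used against 5,000 GB of real disk, so 2,000 GB of headroom at a 2:1 oversubscription ratio.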

On top of that, some storage systems will do their own deduplication.

Deduplication & Compression

Imagine you have 200 Windows 2012 servers, all with a C drive that just has the OS on it. That’s about 12 GB per server storing the same base OS files, or 2.4TB of space. Now those OS disks need space for things like memory dumps, updates and log files, which is the unused space, but do you want to spend 2.4TB of storage storing the same 12GB of files 200 times? Probably not.

You’d prefer to store a single copy of all the identical files and let them all access that single copy. So you’re not just ignoring some of the unused space, you’re able to store less as well.

So 200 drives of 32 GB each (the minimum Windows 2012 requirement) would be 6.4TB. With thin provisioning and deduplication, that has theoretically come down to something like 20GB of used space.
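Here is the same sum as a small Python sketch, using the figures from this post. The function name is made up, and the model assumes every server holds nothing but identical OS files, which is why it lands on 12 GB rather than the "something like 20GB" above; the difference would be whatever data is unique to each server.

```python
def os_disk_footprint(servers, drive_gb, os_gb):
    """Return GB needed for identical OS drives under three storage schemes."""
    return {
        "thick_gb": servers * drive_gb,  # every drive fully allocated
        "thin_gb": servers * os_gb,      # only used space stored, OS files repeated
        "thin_dedup_gb": os_gb,          # one shared copy of the identical OS files
    }
```

Calling os_disk_footprint(200, 32, 12) gives 6,400 GB thick, 2,400 GB thin and 12 GB with deduplication on top.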

On Wednesday I'll deal with what can be over-subscribed.

In the meantime our VMware online workshop is coming round fast, so make sure you book your place.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell
Consultant

Friday, 18 November 2016

Virtualization Oversubscription - What’s so scary? (2 of 20) Fear and Misunderstanding

On Wednesday I mentioned that ultimately there is a fear in these departments that oversubscription = poor performance. It’s considered to be a 1:1 relationship. The reason for this is, to some extent, a misunderstanding of what oversubscription is: if it’s got the word ‘over’ in it then it must be bad, and nothing in our department is ‘over’. We’re all looking at the same word, but they see something bad and I see an opportunity to save some money. Correct me if you think I’m wrong, but saving money is typically thought to be a good thing.

Flying Navigation by Dead Reckoning

Avoiding oversubscription is a bit like navigating by dead reckoning.

You know what you started with
You know what you provisioned
You know how much is left

In WW2 a bomb was considered to be on target if it was within 5 miles of the actual target and we only managed that with 1 in 5. Dead reckoning isn’t very accurate on its own. The situation is much more complex than that and the same remains true for people avoiding oversubscription.

Virtualization Used Capacity by Dead Reckoning

• You know what you started with

• You know what you provisioned

• You know how much is left

This is not especially efficient. The essence of what these sites seem to be doing is this:

We start with a 5 Host cluster that has 120 Logical CPUs, and 180GB RAM. We’re then going to issue no more than 96 vCPUs and 144GB RAM across the VMs. This allows for a host to fail and we can still run everything. We’ll also have great performance because VMs will get a CPU whenever they want it, because it’s theirs, and the same with memory. All the memory a VM wants is real RAM.

I’m not going to deny that performance will be about as good as it can be, but it’s not going to be terribly efficient. Chances are you could turn off 2 hosts and still see no impact in performance. Who wouldn't like to reduce their ESX licence and related power costs by 20%, while still having a spare host?
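The "no oversubscription, one host spare" rule these sites follow can be written down as a short Python sketch. The function and key names are invented for illustration, and it assumes all hosts in the cluster are identical.

```python
def n_minus_one_budget(hosts, logical_cpus, ram_gb, spare_hosts=1):
    """Resources you can issue 1:1 while still tolerating host failures."""
    usable_hosts = hosts - spare_hosts
    return {
        "vcpu_budget": logical_cpus * usable_hosts // hosts,
        "ram_budget_gb": ram_gb * usable_hosts // hosts,
        "cost_share_per_host_pct": 100 / hosts,  # what removing one host saves
    }
```

For the 5 host cluster above, n_minus_one_budget(5, 120, 180) returns the 96 vCPUs and 144GB RAM quoted, and shows that each host represents 20% of the cluster's cost.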

So what is oversubscription? I'll deal with this on Monday.

Phil Bell

Consultant

Wednesday, 16 November 2016

Virtualization Oversubscription - What’s so scary? (1 of 20)

Overcommit vs Oversubscribe

When I wrote the Title and Synopsis for this blog series I happened to choose the word oversubscription. It’s been pointed out to me that many people refer to “overcommit” rather than “oversubscribe”.

Both words appear to be in use to describe this and Google does some nice work to return results using either option.

What led me here?

In my role with Metron I get to visit lots of different people working in lots of different industries. They are mostly capacity managers working in IT departments though, so there is some common ground.

Clients

– “Oh, we don’t oversubscribe”

Over the past year I’ve had a number of mildly frustrating conversations with organisations. These tend to be newer organisations and/or ones that don’t have a good history of Capacity Management. The frustration has been around ‘Oversubscription’ in virtualised environments.

I’ll start talking about how we can monitor the environment to ensure good performance in the future, and they’ll stop me and say, quite proudly, “Oh, we don’t oversubscribe, we don’t want to impact performance”. The pride is almost the worst part. They’ll beam a smile at the senior staff as if to say “We’ve got this, don’t worry”. They aren’t concerned about the overspend that’s required to have that attitude.

Ultimately there is a fear in these departments that oversubscription = poor performance. I'll be looking at the fear and misunderstanding around this on Friday.

In the meantime there are still a few places available at our VMware Capacity and Performance Essentials online workshop, so don't forget to book your place now.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell

Consultant

Monday, 14 November 2016

Idle VMs - Why should we care? (3 of 3)

At the end of last week I looked at the impact idle VMs can have on CPU utilization and memory overhead. Today I’m going to look at the amount of Disk or Datastore space used per Idle VM.

Each one will have associated VMDK (disk) files. The files are stored within a Datastore, which in most cases is hosted on SAN or NAS storage and shared between the cluster host members. If VMDKs are provisioned as "Thick Disks" then the provisioned space is locked out within the Datastore for those disks.

To illustrate this, a "least worst" case scenario would be: 100 idle Windows VMs have been identified across the Virtual Infrastructure, and each VM has a single "Thick" VMDK of 20GB used to house the operating system. This would then equate to 2TB of Datastore space being locked for use by VMs that are idle. You can expand this further by assuming that some, if not all, VMs are likely to have more disks, of differing sizes.

The simple math will show you how much Datastore space is being wasted.
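That simple math, as a minimal Python sketch. The function name and the list-of-lists input shape are assumptions made for this example: each idle VM is represented by a list of its Thick VMDK sizes in GB.

```python
def locked_datastore_gb(idle_vm_disks):
    """Sum the provisioned size of every Thick VMDK belonging to idle VMs."""
    return sum(sum(disks) for disks in idle_vm_disks)
```

Calling locked_datastore_gb([[20]] * 100) returns 2000, the 2TB from the example above. Adding extra disks to any VM's list shows how quickly the total grows once VMs have more than an OS disk.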

There is a counter to this, known as Thin Provisioning. With Thin disks, the provisioned disk size is reserved but not locked, so you would not waste the same amount of space as you would with Thick Disks. Thin Provisioning also has the added benefit of letting you over-allocate disk space, reducing the amount of up-front storage capacity required while incurring only minimal overhead.

Idle VMs - why you should care.

Identifying Idle VMs, questioning whether they are required, finding out who owns them and removing them completely will reduce or help eliminate VM sprawl and help to improve the performance and capacity of the Virtual Infrastructure by:

· reducing unnecessary timer interrupts

· reducing allocated vCPUs

· reducing unnecessary CPU and Memory overhead

· reducing used Datastore space

· leading to more efficient use of your Virtual Infrastructure, including improved VM to Host ratios and a reduction in additional hardware.

Don't forget to sign up for our VMware vSphere Capacity and Performance Essentials online workshop commencing on Dec 6th.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Jamie Baker

Principal Consultant

Friday, 11 November 2016

Idle VMs - Why should we care? (2 of 3)

In my previous blog I mentioned the term VM Sprawl and this is where Idle VMs are likely to factor.

Often VMs are provisioned to support short term projects, for development/test processes or for applications which have now been decommissioned. Now idle, they’re left alone, not bothering anyone and therefore not on the Capacity and Performance teams' radar.

Which brings us back to the question. Idle VMs - Why should we care?

We should care for a number of reasons, but let's start with the impact on CPU utilization.

When VMs are powered on and running, timer interrupts have to be delivered from the host CPU to the VM. The total number of timer interrupts being delivered depends on the following factors:

· VMs running symmetric multiprocessing (SMP) hardware abstraction layers (HALs)/kernels require more timer interrupts than those running uniprocessor HALs/kernels.

· How many virtual CPUs (vCPUs) the VM has.

Delivering many virtual timer interrupts can negatively impact the performance of the VM and can also increase host CPU consumption. This can be mitigated, however, by reducing the number of vCPUs, which reduces the timer interrupts and also the amount of co-scheduling overhead (check CPU Ready Time).

Then there's the Memory management of Idle VMs. Each powered on VM incurs Memory Overhead. The Memory Overhead includes space reserved for the VM frame buffer and various virtualization data structures, such as Shadow Page Tables (using Software Virtualization) or Nested Page Tables (using Hardware Virtualization). This also depends on the number of vCPUs and the configured memory granted to the VM.

We’ll have a look at a few more reasons to care on Monday. In the meantime, why not complete our Capacity Management Maturity Survey and find out where you fall on the maturity scale? http://www.metron-athene.com/_capacity-management-maturity-survey/survey.asp

Jamie Baker

Principal Consultant

Wednesday, 9 November 2016

Idle VMs - Why should we care? (1 of 3)

The re-emergence of Virtualization technologies, such as VMware, Microsoft's Hyper-V, Xen and Linux KVM, has provided organizations with the tools to create new operating system platforms ready to support the services required by the business, in minutes rather than days.

Indeed IT itself is a service to the business.

In more recent times, Cloud computing, which in itself is underpinned by Virtualization, makes use of the functionality provided to satisfy:

on-demand resources
the ability to provision faster
rapid elasticity (refer to NIST's description of Cloud Computing)

Cloud computing makes full use of the underlying clustered hardware. Constant strides are being made by Virtualization vendors to improve the Virtual Machine (VM) to Host ratio, without affecting the underlying performance.

But, you may ask "What's this got to do with Idle VMs?"

Well, as I described earlier, Virtualization provides the means to easily and quickly provision virtual systems. Your CTO/CIO is going to demand a significant ROI once an investment in both the hardware and virtualization software has been made, possibly switching the focus to an increase in the VM to Host ratio.

“What's wrong with that?” I hear you say. Nothing at all, as long as you keep track of what VMs you are provisioning and:

what resources you have granted
what they are for

Failure to do so will mean that your quest for a good ROI and a satisfied Chief will be in jeopardy, as you’ll encounter a term most commonly known as VM Sprawl.

More about this on Friday.

In the meantime why not register for our webinar 'VMware Cluster Planning'?

http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Jamie Baker

Principal Consultant