Metron - Capacity Management: VMware

Wednesday, 5 July 2017

Understanding VMware Capacity - Why OS monitoring can be misleading, Time Slicing (2 of 10)

Following on from Monday's blog, the effect we saw between the OS and VMware is caused by time slicing. In a typical VMware host we have more vCPUs assigned to VMs than we do physical cores. This situation is known as over-provisioning and is, to some extent, the original purpose of virtualization.

The processing time of the physical cores has to be shared among the vCPUs in the VMs. The more vCPUs we have, the less time each can be on the core, and therefore the slower time passes for that VM. To keep the VM in time, extra time interrupts are sent in quick succession. So time passes slowly and then very fast.

Time is no longer a constant, but the OS doesn’t know that. So the safest approach is to avoid using anything from the OS that involves an element of time.

Significant improvements have been made in this area over the releases of VMware. VMware Tools has a number of tricks to try and make the OS metrics as close as possible, and co-scheduling of CPUs has improved. But the basic concept remains in place. Later I will discuss how it can be OK to use averages and estimates for reporting on the future. But when we have the choice of accurate data from VMware or less accurate data from the OS, I would suggest that taking accuracy where we can easily do so has to be the better option.

On Friday I'll be looking at the 5 key VMware metrics to monitor, in the meantime take a look at the great selection of white papers and on-demand webinars on VMware in our Resources section.

https://www.metron-athene.com/resources/index.asp

Phil Bell

Consultant

Monday, 3 July 2017

Understanding VMware Capacity - Why OS monitoring can be misleading (1 of 10)

Dangers with OS Metrics

Almost every time we discuss data capture for VMware, we’ll be asked by someone if we can capture the utilization of specific VMs, by monitoring the OS. The simple answer is no.

The more complex answer is that we can capture the data from the OS, but it may not be reliable. So here’s an example of why.

We have 2 VMs. Within the 1 second interval we are looking at, one of the VMs was only allocated the CPU for ½ a second. In that ½ second the VM used 50% of its possible CPU time. So from the OS perspective it was running at 50% CPU utilization. If we look at data from VMware, we’ll see that VMware knows the VM only used ½ the CPU available in ½ a second, or 25% of the full second.

The 2nd VM was running on CPU for the entire second, and again it used 50% of its possible CPU. So, to the OS, it appears it was running at 50% CPU utilization, and VMware has the same result.

The more contention there is for CPU time, the more time VMs will spend Dormant/Idle, and the further apart the values will be. This effect means that any metrics which have an element of time in their calculation cannot be relied upon to be accurate.
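The arithmetic in the two-VM example can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and real hypervisor accounting is more involved than a single scheduled-time fraction.

```python
def os_view_utilization(busy_seconds, scheduled_seconds):
    """Utilization as the guest OS sees it: busy time over the time the VM
    was actually on the CPU (the OS cannot see the time it was descheduled)."""
    return busy_seconds / scheduled_seconds


def hypervisor_view_utilization(busy_seconds, interval_seconds):
    """Utilization as the hypervisor sees it: busy time over wall-clock time."""
    return busy_seconds / interval_seconds


interval = 1.0  # the 1 second interval from the example

# (name, busy seconds, scheduled seconds)
vms = [
    ("VM1", 0.25, 0.5),  # on the CPU for half a second, busy for half of that
    ("VM2", 0.5, 1.0),   # on the CPU for the whole second, busy for half of it
]

for name, busy, scheduled in vms:
    os_pct = 100 * os_view_utilization(busy, scheduled)
    hv_pct = 100 * hypervisor_view_utilization(busy, interval)
    print(f"{name}: OS reports {os_pct:.0f}%, hypervisor reports {hv_pct:.0f}%")
```

The first VM shows 50% from the OS and 25% from the hypervisor, while the second VM shows 50% from both, which is the gap described above.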

Here is data from a real VM

The (top) dark blue line is the data captured from the OS, and the (bottom) light blue line is the data from VMware. There is clearly some correlation between the two. At the start of the chart there is about a 1.5% CPU difference. Given we’re only running at about 4.5% CPU, that is an overestimation by the OS of about 35%. But at about 09:00 the difference is ~0.5%, so the difference doesn’t remain stable either.

Historically it’s not been unusual to see situations where the OS metric is reporting 70% CPU utilization and VMware is reporting 30%.

More on Wednesday, in the meantime don't forget to register for our next webinar 'Top 5 VMware tips for performance and capacity'

https://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Phil Bell

Consultant

Friday, 13 January 2017

Virtualization Capacity Management - making sure server sprawl doesn't become virtual server sprawl

Surely there’s no need for virtualization capacity management? After all, didn't your move to virtualization correct all that?

Possibly not - Gartner has been warning of 'virtual server sprawl' for some time.
When planning for the future, the business should be the starting point.

Virtualization capacity management brings together the technology (what resources we need and when we need them) with the business (what we need to achieve and what it should cost).

Moving resources to where they can support requirements, and similar activities, will all be needed on a day-to-day basis.
(My video below talking about Virtualization Capacity Management)

Expected savings not as good as anticipated when virtualizing?

For some organizations there is already some disillusionment with virtualization. Capacity management offers the tools and the processes to get the maximum benefit from virtualization.

Virtualization has rapidly become too important to business success to leave to chance. As a minimum any organization needs a clear window onto what is happening on a day by day basis. Where the business is today governs day to day decisions you are taking. Capacity management is the link between the technical data you have and the knowledge of what direction the business wishes to take in the future.

Capacity decisions need to be made at a strategic level and must be made from a 'real' business perspective. What the business needs to achieve needs to be mapped onto the best technological option available to achieve it.

Ask how we can help your Organization to realize savings

http://www.metron-athene.com/contact/index.html#contactme

Jamie Baker

Principal Consultant

Thursday, 22 December 2016

Virtualization Oversubscription - What’s so scary? (17 of 20) Expandable Reservation

When a VM starts, the Reservation set for that VM is taken from the Reservation available within the Resource Pool. The total reservations of the child VMs may not be more than the Reservation for the Resource Pool.

However, if Expandable Reservation is turned on, then a Resource Pool may satisfy its Reservation requirements by using the Reservation of another Resource Pool. This may in turn stop the second Resource Pool from starting VMs, as it can no longer satisfy the Reservation requirements of the VM which wants to start.

I'm off on vacation now and will be back in the New Year to deal with 'What's the worst that can happen?'

Wishing you all a Happy Holidays and see you in 2017!

Phil Bell

Consultant

Wednesday, 14 December 2016

Virtualization Oversubscription - What’s so scary? CPU Oversubscription (13 of 20)

CPU Oversubscription

Memory is fairly easy to describe, but there are a lot of things going on. CPU Oversubscription and the technologies involved can be a little more complex to visualize, but there are fewer tools that the hypervisor has to work with.

· Time slicing

· Co-Scheduling

· Reservations

· Shares

· Limits

For a start, time is no longer a constant. The hypervisor has the ability to run time at whatever speed it likes, just so long as it averages out in the end.

Co-Scheduling is where all the vCPUs for a single VM have to be mapped to logical CPUs on the hardware at the same time.

Reservations and Shares apply here also and we’ll have more of a look at how they work later.

Limits also exist for memory, but here they can be applied to restrict some VMs to a smaller amount of CPU than their vCPU allocation would otherwise allow them to have.

Let’s start with Time Slicing.

Time Slicing

In a typical VMware host we have more vCPUs assigned to VMs than we do physical cores. The processing time of the physical cores (or logical CPUs if hyper threading is in play), has to be shared among the vCPUs in the VMs. The more vCPUs we have, the less time each can be on the core, and therefore the slower time passes for that VM. To keep the VM in time, extra time interrupts are sent in quick succession when the VM is processing, so time passes slowly and then very fast.
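To put rough numbers on that sharing, here is a deliberately simplified sketch. It assumes every vCPU wants to run all the time and that the scheduler shares core time evenly, neither of which is true of a real host, and the function name is my own.

```python
def average_core_share(physical_cores, busy_vcpus):
    """Fraction of wall-clock time each vCPU can be on a core, assuming all
    vCPUs are always runnable and time is shared evenly (a simplification)."""
    if busy_vcpus <= 0:
        raise ValueError("busy_vcpus must be positive")
    return min(1.0, physical_cores / busy_vcpus)


# A hypothetical 8-core host with a growing number of busy vCPUs.
for vcpus in (8, 16, 32):
    share = average_core_share(physical_cores=8, busy_vcpus=vcpus)
    print(f"{vcpus} busy vCPUs on 8 cores: each gets {share:.0%} of a core")
```

With twice as many busy vCPUs as cores, each vCPU is on a core for half of the wall-clock time, which is the point at which guest time has to be caught up with bursts of interrupts.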

Significant improvements have been made in this area over the releases of VMware. vCPUs can be scheduled onto the hardware a few milliseconds apart but the basic concept remains in place.

Join me again on Friday when I'll look at VMware vCPU Co-Scheduling & Ready Time.

Phil Bell

Consultant

Monday, 12 December 2016

Virtualization Oversubscription - What’s so scary? (12 of 20) Memory Idle Tax

One of VMware's memory oversubscription technologies is Memory Idle Tax.

• Memory has Shares - Like reservations, a VM also has an associated number of shares. The more shares, the more priority it has over the resource if there is contention.

If a virtual machine is not actively using its currently allocated memory, ESX Server charges a memory tax, with more charged for idle memory than for memory that is in use. That is, idle memory counts more towards the share allocation than memory in use.

• Memory Tax associates a value to each page used

• Default Idle Tax rate is 75%

• This makes idle memory cost 4 times as many shares as active memory
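The "4 times" figure follows from the tax rate: an idle page is charged 1 / (1 - tax rate) times as much as an active page. The sketch below shows that arithmetic, along with the adjusted shares-per-page ratio as described in VMware's published paper on ESX Server memory management. The function names and the two example VMs are made up for illustration.

```python
def idle_page_cost(tax_rate):
    """Cost of an idle page relative to an active page (active page = 1)."""
    if not 0 <= tax_rate < 1:
        raise ValueError("tax_rate must be in the range [0, 1)")
    return 1.0 / (1.0 - tax_rate)


def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Shares-per-page ratio with idle pages charged at the taxed rate.
    The VM with the lowest ratio is the first candidate to give up memory."""
    k = idle_page_cost(tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


print(idle_page_cost(0.75))  # 4.0

# Two hypothetical VMs with equal shares and equal allocations.
busy = adjusted_shares_per_page(shares=1000, pages=1000, active_fraction=0.9)
idle = adjusted_shares_per_page(shares=1000, pages=1000, active_fraction=0.1)
print(busy > idle)  # True: the mostly idle VM ranks lower
```

With equal shares, the VM that is actively using only 10% of its memory ends up with the lower ratio, so it is the one more likely to be asked to release memory.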

The end result is that VMs holding onto a lot of idle memory will be more likely to have the balloon driver inflate inside them to try and release some of that idle memory for use by other VMs.

So put simply, it's a mechanism to take idle or unused memory from guest VMs that are hogging it in order to give that memory to another VM where it’s needed more.

On Wednesday I'll be looking at CPU oversubscription, in the meantime register for our next webinar 'Hardware's a Commodity - Why bother managing capacity?'

http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Phil Bell

Consultant

Friday, 9 December 2016

Virtualization Oversubscription - What’s so scary? Reservations (11 of 20)

Reservations are associated with Resource Pools or individual VMs. Essentially you are setting a value for CPU or Memory that the VM is guaranteed to get. If the VM doesn’t use all its reservation other VMs can make use of the Memory and CPU.

The fairly obvious caveat is that the total of your reservations cannot be bigger than the hardware. This is illustrated below.

VM7 cannot start as the total of VM1, VM2 and VM7 exceeds the 1000 MHz available.
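The check being made here can be sketched as a simple sum. The 1000 MHz capacity comes from the example, but the individual reservation values below are hypothetical, as is the function name.

```python
def can_power_on(capacity_mhz, running_reservations_mhz, new_reservation_mhz):
    """Return True if the new VM's reservation still fits within capacity."""
    return sum(running_reservations_mhz) + new_reservation_mhz <= capacity_mhz


capacity = 1000                     # MHz available, as in the example
running = {"VM1": 400, "VM2": 400}  # hypothetical reservations in MHz
vm7_reservation = 300               # hypothetical reservation in MHz

print(can_power_on(capacity, running.values(), vm7_reservation))  # False
```

VMs without a reservation are not part of this sum, which is why they can still start and then compete for whatever is left.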

You can use reservations to ensure that important VMs get the resources they want, so you don’t have to worry about avoiding oversubscription for everything. Pick the VMs you want to perform their best and give them a reservation that ensures that, then your background VMs can be pushed out of the way if required.

Phil Bell

Consultant

Wednesday, 7 December 2016

Virtualization Oversubscription - What’s so scary? Memory stats (10 of 20)

Let’s look at some stats for a single Virtual Machine

This is a 4GB VM, but it’s only accessing about 400 MB on a regular basis. It’s got 2.6GB of memory that’s unique to itself, and 1.4GB that’s shared with other VMs.

So at least one other VM is likely to be sharing about 1.4GB of memory as well. Given there are a lot of Windows VMs in that cluster, it’s likely a lot of them have similar amounts of shared memory. If there are 10 VMs on that host then that’s about 15GB of RAM that you don’t have to have installed or, rather, a few more VMs that will fit on the host.

There are also a couple of hours where the balloon driver steals some memory from the VM. It’s only about 50MB, and given the VM is only accessing 400 to 500MB of RAM out of the 2.6GB that it’s using, the OS probably just released some cache to satisfy that request.

On Friday we'll be looking at Reservations and how they work.

Don't forget that registration is open for our next webinar 'Hardware's a commodity - Why bother managing capacity?' http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Phil Bell

Consultant

Monday, 5 December 2016

Virtualization Oversubscription - What’s so scary? VMkernel Swap (9 of 20)

A reservation is typically set against a resource pool and filters down to give a VM rights against memory. Essentially if a reservation has been set and applies to this VM, then the VM is guaranteed that amount of memory will be made available in RAM on the ESX host. You can never reserve more memory than exists. So reservations can ensure good performance for the VMs you care about. You put the VMs in a resource pool and allocate a reservation that’s appropriate. That might be a 1:1 ratio of allocated memory to reservation. Then let other, less important VMs worry about oversubscription.

When an ESX host is very short of memory it may have to resort to using .vswp swap files for the VM memory. At this point performance will be affected as data that the OS believes is in memory is, in reality, now on disk.

By default a VM can have up to 65% of its memory used by the balloon driver. It may also have a memory reservation. The reservation cannot be swapped or taken up by the balloon driver. Any memory outside the 65% used by the balloon driver, and outside the reservation, can be placed into a .vswp file.

In reality you never want this to happen.

On Wednesday I'll be discussing Memory.

Phil Bell

Consultant

Friday, 2 December 2016

Virtualization Oversubscription - What’s so scary? Memory test (8 of 20)

If memory must be copied to or from disk because there is more requested than can be satisfied, what’s the penalty for doing this?

• Memory vs. disk speed is…?

A) Memory is 100x faster than disk

B) Memory is 1,000x faster than disk

C) Memory is 10,000x faster than disk

D) Memory is 100,000x faster than disk

E) Memory is 1,000,000x faster than disk

F) I have no memory of the event, your honour

A modern disk will respond to an I/O in about 5 milliseconds (5 * 1/1000 of a second). Access to memory is usually in the order of 50 nanoseconds (50 * 1/1,000,000,000 of a second).

That makes disk access a hundred thousand (100,000) times SLOWER than memory access. Tiny numbers like this are difficult to comprehend, so imagine that the memory access time was 1 second. To write something to disk would then take about 27 ¾ hours to complete.
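For anyone who wants to check the sums, here they are as a short Python sketch using the same round figures (5 milliseconds and 50 nanoseconds), which are ballpark values rather than measurements.

```python
DISK_ACCESS_SECONDS = 5e-3     # about 5 milliseconds per disk I/O
MEMORY_ACCESS_SECONDS = 50e-9  # about 50 nanoseconds per memory access

# How many times slower a disk access is than a memory access.
slowdown = DISK_ACCESS_SECONDS / MEMORY_ACCESS_SECONDS

# If a memory access took 1 second, a disk access would take this many hours.
scaled_hours = slowdown * 1.0 / 3600

print(f"Disk is about {slowdown:,.0f} times slower than memory")
print(f"Scaled up, one disk access takes about {scaled_hours:.2f} hours")
```

That works out at a little under 27.8 hours, which is where the 27 ¾ hours comes from.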

That’s one good reason for avoiding swapping if at all possible! On Monday I'll be looking at VMkernel Swap.

In the meantime why not take a look at some of our VMware on-demand webinars and white papers. Join our Community via our Resources section and get free access.

http://www.metron-athene.com/resources/login.asp

Phil Bell

Consultant

Wednesday, 30 November 2016

Virtualization Oversubscription - What’s so scary? Balloon Driver (7 of 20)

Today we'll take a look at the Balloon Driver (vmmemctl)

The Problem

A process in VM1 is shut down and its memory is freed in the OS.

The “hardware” does not know. The data is still there but only the OS inside the VM knows it can overwrite it.

The VMware Solution

When memory gets tight on an ESX host, the VMkernel will pick a VM (based on shares) and tell the balloon driver to request some memory.

The balloon driver requests memory and “pins” it so it cannot be paged. The memory on the ESX host is then freed up and can be allocated to another system.

On Friday I'll be talking about Memory Test.

There's less than a week to go until our 'VMware Capacity and Performance Essentials' online workshop starts, still time to book your place.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell
Consultant

Monday, 28 November 2016

Virtualization Oversubscription - What’s so scary? Memory Oversubscription (6 of 20)

As stated in my previous blog, having set out a few ground rules we can now look at memory oversubscription.

So how can we over-subscribe memory?

Free Space - Just like disks have a lot of free space, servers typically run with free space in memory.

Then there are a number of tricks that the hypervisor can do to find even more savings.

· Page Sharing (Deduplication)

· Balloon Driver (VMware)

· Reservations

· Shares

We'll start with a look at:

Transparent Page Sharing

When two or more Virtual Machines have the same pages of data in memory, VMware can store a single copy and present it to all the VMs. Should a VM alter a shared memory page, a copy will be created by VMware and presented to that VM.

An Example is shown in the graphic below:

VM1 starts and allocates some unique memory.

VM2 starts and allocates some unique memory.

VM1 allocates memory for a standard Windows DLL.

VM2 also allocates memory for the same standard Windows DLL.

VMware maps both systems' memory to the same page in RAM.
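The steps above, plus the copy made when a VM alters a shared page, can be modelled with a toy Python class. This is a sketch of the idea only and not how ESX implements it: the class and method names are hypothetical, and matching pages purely by a SHA-256 hash at write time is an assumption made to keep the example short.

```python
import hashlib


class SharedPageStore:
    """Toy model of content-based page sharing with copy-on-write."""

    def __init__(self):
        self.pages = {}      # content hash -> page bytes (one physical copy)
        self.refcount = {}   # content hash -> number of guest pages mapped to it
        self.mappings = {}   # (vm, guest page number) -> content hash

    def write(self, vm, page_number, data):
        """Store data for a guest page, sharing it with identical pages.
        Writing to a shared page remaps only this VM's page, so the other
        VMs keep seeing the original contents (copy-on-write)."""
        self._release(vm, page_number)
        key = hashlib.sha256(data).hexdigest()
        if key not in self.pages:
            self.pages[key] = data
            self.refcount[key] = 0
        self.refcount[key] += 1
        self.mappings[(vm, page_number)] = key

    def read(self, vm, page_number):
        """Return the contents of a guest page."""
        return self.pages[self.mappings[(vm, page_number)]]

    def _release(self, vm, page_number):
        """Drop a guest page's old mapping and free the physical page
        when no guest page refers to it any more."""
        key = self.mappings.pop((vm, page_number), None)
        if key is None:
            return
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.pages[key]
            del self.refcount[key]

    def physical_pages(self):
        """Number of distinct pages actually held in 'RAM'."""
        return len(self.pages)


store = SharedPageStore()
store.write("VM1", 0, b"unique to VM1")
store.write("VM2", 0, b"unique to VM2")
store.write("VM1", 1, b"standard Windows DLL")
store.write("VM2", 1, b"standard Windows DLL")
print(store.physical_pages())  # 3 physical pages back 4 guest pages

store.write("VM2", 1, b"patched DLL")  # VM2 alters the shared page
print(store.physical_pages())  # 4
print(store.read("VM1", 1))    # b'standard Windows DLL'
```

After VM2 changes its copy, VM1 still reads the original DLL page, and the saving from sharing that page is gone.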

On Wednesday I'll explain more about the VMware Balloon Driver.

In the meantime register for access to some of our great Resources, white papers, on-demand webinars and more.

http://www.metron-athene.com/_resources/login.asp

Phil Bell

Consultant

Friday, 25 November 2016

Virtualization Oversubscription - What’s so scary? VMware CPU and Memory Maximums (5 of 20)

CPU and Memory are the main items people consider for Virtualized systems, so let’s lay down the maximums.

• Virtual Machine Maximum

– 128 vCPUs per VM

• Host CPU maximums

– Logical CPUs per host 480 (Logical CPUs being simultaneous threads so that might be 240 hyper-threaded cores.)

– Virtual machines per host 1024

– Virtual CPUs per host 4096

– Virtual CPUs per core 32

• Maximum with a caveat: The achievable number of vCPUs per core depends on the workload and specifics of the hardware. For more information, see the link below for the latest version of Performance Best Practices for VMware vSphere

https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf

This raises 2 points.

· Clearly it’s OK to oversubscribe CPUs

· There is no set number to tell you how much oversubscription is OK.

Memory VMware Maximums

Memory is a lot simpler.

• 6TB per Host

– Well 12TB on specific hardware

• 4TB per VM

Having set out those few ground rules we can now look at memory oversubscription and I'll be doing just that on Monday.

There are still a few places left on our VMware vSphere Capacity & Performance Essentials online workshop so don't forget to book your place.

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell

Consultant

Wednesday, 23 November 2016

Virtualization Oversubscription (What’s so scary?) - What can be oversubscribed? (4 of 20)

Today I'll deal with what can be oversubscribed.

What can be oversubscribed?

• CPUs

• Memory

• Disk

• NICs

In our virtual world, now that we have broken the link between the OS and the hardware, we can over provision all sorts of things.

CPU, Memory, Disk (as we mentioned) and NICs are all “Oversubscribed”.

Disk we have already looked at, and Memory and CPU I’ll go into in more detail later, but I thought it was worth mentioning NICs here.

Typically people seem to be running with 10 - 15 VMs on a single host which will have significantly fewer NICs installed. A Server typically wouldn’t use all the bandwidth of its NIC so that unused bandwidth is like the unused space on disk.

When the VMs talk to other VMs on the same Host, that’s not generating traffic through the physical NICs, so we might consider that as the equivalent of de-duplication.

In the next few blogs I'll be looking at CPU and Memory.

Phil Bell

Consultant

Monday, 21 November 2016

Virtualization Oversubscription - What’s so scary? Oversubscription (3 of 20)

So what is oversubscription? In its simplest terms it’s allocating more than you have.

Thin Provisioning

The most obvious example is Thin Provisioning in storage. It’s been around for a long time and is the same thing as oversubscription, simply known by another name.

With storage you have the LUNs that are allocated. Traditionally these would have been a physical allocation on disk that was available for use, but with Thin Provisioning you can allocate more space to LUNs than you actually have. The reason is that most disks on servers are not full. So if the average disk is 30% full, you could get away with only having 50% of your allocated storage as real usable space that exists, and you’d still have plenty of space to grow into.
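As a quick worked example of those percentages, here is a small sketch. The 100 TB of allocated storage is a made-up figure and the function name is my own; the 30% and 50% are the values used above.

```python
def thin_provisioning_plan(allocated_tb, average_fill, physical_fraction):
    """Return (used_tb, physical_tb, headroom_tb) for a thin-provisioned pool."""
    used_tb = allocated_tb * average_fill
    physical_tb = allocated_tb * physical_fraction
    return used_tb, physical_tb, physical_tb - used_tb


# Hypothetical pool: 100 TB allocated, disks 30% full, 50% physically installed.
used, physical, headroom = thin_provisioning_plan(100, 0.30, 0.50)
print(f"Used {used:.0f} TB, installed {physical:.0f} TB, room to grow {headroom:.0f} TB")
```

So half of the allocated space never has to be bought, and there is still room for the data in use to grow by two thirds before the pool fills up.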

On top of that, some storage systems will do their own deduplication.

Deduplication & Compression

Imagine you have 200 Windows 2012 servers, all with a C drive that just has the OS on it. That’s about 12 GB per server storing the same base OS files, or 2.4TB of space. Those OS disks need space for things like memory dumps, updates and log files, which is the unused space. But do you want to spend 2.4TB of storage storing the same 12GB of files 200 times? Probably not.

You’d prefer to store a single copy of all the identical files and let them all access that single copy. So you’re not just ignoring some of the unused space, you’re able to store less as well.

So 200 32 GB drives (the minimum Windows 2012 requirement) would be 6.4TB. That has now theoretically come down to something like 20GB of used space with thin provisioning and deduplication.
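The figures in this example can be reproduced with the short sketch below. It uses decimal units (1000 GB to the TB), and the variable names are my own. The gap between the single 12GB copy of the OS files and the "something like 20GB" figure would be whatever data is unique to each server, which this sketch does not try to estimate.

```python
SERVERS = 200
ALLOCATED_GB_PER_SERVER = 32  # minimum Windows 2012 drive size quoted above
OS_FILES_GB_PER_SERVER = 12   # identical base OS files on every server

# Space allocated if every drive is fully provisioned.
allocated_tb = SERVERS * ALLOCATED_GB_PER_SERVER / 1000

# Space taken by storing the same OS files once per server.
duplicated_os_tb = SERVERS * OS_FILES_GB_PER_SERVER / 1000

# Space taken by the OS files if a single shared copy is stored.
deduplicated_os_gb = OS_FILES_GB_PER_SERVER

print(f"Fully provisioned: {allocated_tb:.1f} TB")
print(f"OS files stored {SERVERS} times: {duplicated_os_tb:.1f} TB")
print(f"OS files stored once: {deduplicated_os_gb} GB")
```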

On Wednesday I'll deal with what can be over-subscribed.

In the meantime our VMware online workshop is coming round fast so make sure you book your place

http://www.metron-athene.com/services/online-workshops/capacity-management-workshops.html

Phil Bell
Consultant