Metron - Capacity Management: virtualization


Wednesday, 4 January 2017

Virtualization Oversubscription - What’s so scary? 18 of 20

Happy New Year!

If you’ve been following my series then you’ll know that just before the Holidays I said that I’d deal with what’s the worst that can happen when you oversubscribe.

So what’s the worst that can happen?

Well, if you push things too far, all the things the hypervisor can do to keep things running will eventually be overwhelmed.

If you try to use too much memory you’ll start to see ballooning on a consistent basis, then swapping. At that point performance will degrade rapidly. Watch active memory values and treat increasing ballooning as the indication that things are getting tight.
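As a rough sketch of that rule of thumb, the snippet below classifies a run of recent host samples. The function name, the sample lists and the "any non-zero value counts" cut-off are assumptions made for illustration, not thresholds taken from any VMware documentation.

```python
def memory_pressure(balloon_kb, swapped_kb):
    """Classify memory pressure from recent samples (oldest first).

    Hypothetical rule: any swapping is the worst state; ballooning in
    every sample counts as 'consistent' ballooning; otherwise all is well.
    """
    if any(s > 0 for s in swapped_kb):
        return "swapping"
    if balloon_kb and all(b > 0 for b in balloon_kb):
        return "ballooning"
    return "ok"


# Three consecutive samples with a growing balloon and no swapping yet
print(memory_pressure([1024, 2048, 4096], [0, 0, 0]))
```

In practice you would tune what counts as "consistent" to your own sampling interval.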

CPU, as always, shows a more gentle decay in performance. CPU also has its indicators that the limits are being approached: CPU Ready and Co-Stop show that VMs are finding it tricky to find CPUs when they want to do some processing.

The reason CPU degrades differently to memory is that it’s used differently. A process is in memory all the time, but only uses a CPU when it needs to, so CPU busy is dictated by how frequently the CPU is required and for how long. The performance of a transaction will be dictated by the ‘chance’ that a CPU will not be available when the transaction arrives. If all the CPUs are busy it’ll enter a queue, and this is where queueing theory comes in.
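To put a rough number on that ‘chance’, here is a minimal sketch using the textbook Erlang C formula. It assumes a simple M/M/c queue (random arrivals, exponential service times, identical CPUs), which is a simplification and not how a hypervisor scheduler really behaves. The function name and the example utilizations are hypothetical.

```python
from math import factorial


def prob_wait(cpus, utilization):
    """Erlang C: probability that arriving work finds every CPU busy.

    Assumes an M/M/c queue; utilization is the average busy fraction
    per CPU and must be below 1 for the queue to be stable.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    offered = cpus * utilization  # offered load in Erlangs
    top = offered ** cpus / factorial(cpus) / (1 - utilization)
    bottom = sum(offered ** k / factorial(k) for k in range(cpus)) + top
    return top / bottom


for u in (0.5, 0.7, 0.9):
    print(f"{u:.0%} busy on 4 CPUs: {prob_wait(4, u):.1%} chance of queueing")
```

The point to take away is the shape of the curve: the chance of queueing climbs slowly at first and then steeply as utilization approaches 100%.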

Contention and Queuing

Any system has a finite set of resources. If you only have a single user trying to use one workstation then there is no contention for the use of that workstation. As soon as you have more than one user then there is a chance that they will want to use the workstation at the same time. That’s contention. It’s perfectly normal and happens inside every OS all the time. There are lots more process threads than there are CPUs, and when there is contention, then the processes queue. Poor performance only occurs when queueing becomes excessive.

On Friday I'll go into more detail about the basic ideas of queuing. In the meantime, register for our first webinar of 2017, 'Performance Management made easy'.

http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Phil Bell

Consultant

Wednesday, 9 November 2016

Idle VM's - Why should we care? (1 of 3)

The re-emergence of Virtualization technologies such as VMware, Microsoft's Hyper-V, Xen and Linux KVM has provided organizations with the tools to create new operating system platforms, ready to support the services required by the business, in minutes rather than days.

Indeed IT itself is a service to the business.

In more recent times, Cloud computing, which is itself underpinned by Virtualization, makes use of the functionality provided to satisfy:

on-demand resources
the ability to provision faster
rapid elasticity (refer to NIST's description of Cloud Computing)

Cloud computing makes full use of the underlying clustered hardware. Constant strides are being made by Virtualization vendors to improve the Virtual Machine (VM) to Host ratio, without affecting the underlying performance.

But, you may ask "What's this got to do with Idle VMs?"

Well, as I described earlier, Virtualization provides the means to provision virtual systems easily and quickly. Your CTO/CIO is going to demand a significant ROI once an investment in both the hardware and virtualization software has been made, possibly switching the focus to an increase in the VM to Host ratio.

“What's wrong with that?” I hear you say. Nothing at all, as long as you keep track of what VMs you are provisioning, and:

what resources you have granted
what they are for

Failure to do so will mean that your quest for a good ROI and a satisfied Chief will be in jeopardy, as you’ll encounter what is commonly known as VM Sprawl.

More about this on Friday.

In the meantime, why not register for our webinar 'VMware Cluster Planning'?

http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Jamie Baker

Principal Consultant

Friday, 14 October 2016

5 Top Performance and Capacity Concerns for VMware - Ready Time

As I mentioned on Wednesday there are 3 states which the VM can be in:

Threads – being processed and allocated to a thread.

Ready – in a ready state where they wish to process but aren’t able to.

Idle – where they exist but don’t need to be doing anything at this time.

In the diagram below you can see that work has moved onto the threads to be processed and there is some available headroom. Work that is waiting to be processed requires 2 CPUs so is unable to fit, which creates wasted space that we are unable to use at this time.

We need to remove a VM before we can put a 2 CPU VM on to the threads and remain 100% busy.

In the meantime other VMs are coming along and we now have a 4 vCPU VM accumulating Ready Time.

Two VMs move off, but the waiting 4 vCPU VM cannot move on as there are not enough CPUs available.

It has to wait and other work moves ahead of it to process.

Even when 3 CPUs are available it is still unable to process and will be ‘queue jumped’ by other VMs that require fewer vCPUs.

Hopefully that is a clear illustration of why it makes sense to reduce contention by having as few vCPUs as possible in each VM.
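The queue jumping described above can be sketched in a few lines. This assumes strict co-scheduling exactly as in the illustration (a VM only runs if all of its vCPUs fit at once), which is a simplification of what a real hypervisor scheduler does. The function name and the example queue are hypothetical.

```python
def dispatch(free_cpus, waiting):
    """Place waiting VMs (vCPU counts, in arrival order) onto free CPUs.

    Strict co-scheduling assumption: a VM starts only if ALL of its
    vCPUs fit at the same time. Returns (started, still_waiting).
    """
    started, still_waiting = [], []
    for vcpus in waiting:
        if vcpus <= free_cpus:
            free_cpus -= vcpus
            started.append(vcpus)
        else:
            still_waiting.append(vcpus)  # keeps accumulating Ready Time
    return started, still_waiting


# 3 CPUs free; a 4 vCPU VM arrived first, then a 1 vCPU and a 2 vCPU VM
started, still_waiting = dispatch(free_cpus=3, waiting=[4, 1, 2])
print(started, still_waiting)  # [1, 2] [4]
```

The 4 vCPU VM was first in line but is the only one left waiting, because the smaller VMs fit into the space it could not use.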

Ready Time impacts on performance and needs to be monitored. On Monday I'll be dealing with Monitoring Memory.

Phil Bell

Consultant

Friday, 7 October 2016

5 Top Performance and Capacity Concerns for VMware

I'll be hosting our webinar 'VMware and Hyper-V Virtualization Over-Subscription (What's so scary?)' on October 12 (http://www.metron-athene.com/services/webinars/index.html), so I thought it would be pertinent to take a look at the Top 5 Performance and Capacity Concerns for VMware in my blog series.

I’ll begin with Dangers with OS Metrics.

Almost every time we discuss data capture for VMware, we’ll be asked by someone if we can capture the utilization of specific VMs, by monitoring the OS. The simple answer is no.

In the example below the operating system sees that VM1 is busy 50% of the time, but VMware sees that VM1 was only scheduled for half the time, so it was busy for half of half the time, and accordingly reports that it is 25% busy.

Looking at the second VM running, VM2, both the operating system and VMware are in accordance that it is in full use and report that it is 50% busy.

This is a good example of the disparity that can sometimes occur.
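The arithmetic behind the VM1 example is simply one fraction scaled by another. Here is a minimal sketch; the function and parameter names are made up for illustration, and it ignores everything else a real hypervisor accounts for.

```python
def host_view_busy(guest_busy, scheduled_fraction):
    """CPU busy as seen from the host.

    guest_busy: busy fraction reported inside the guest OS.
    scheduled_fraction: share of wall-clock time the VM actually
    held a physical CPU (hypothetical input for this sketch).
    """
    return guest_busy * scheduled_fraction


# VM1: the OS thinks 50% busy, but the VM was only scheduled half the time
print(f"{host_view_busy(0.50, 0.50):.0%}")  # 25%
```

A VM that is never descheduled (a scheduled_fraction of 1.0) gives the same figure from both views, which is why the two sources sometimes agree.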

OS vs VMware data

Here is data from a real VM.

The (top) dark blue line is the data captured from the OS, and the (bottom) light blue line is the data from VMware. While there clearly is some correlation between the two, at the start of the chart there is about 1.5% CPU difference. Given we’re only running at about 4.5% CPU, that is an overestimation by the OS of about 35%. At about 09:00 the difference is ~0.5%, so the difference doesn’t remain stable either. This is a small system, but if you scaled this up it would not be unusual to see the OS reporting 70% CPU utilisation and VMware reporting 30%.

This large difference between what the OS thinks is happening and what is really happening all comes down to time slicing.

I'll be looking at time slicing on Monday.

Phil Bell

Consultant

Monday, 8 August 2016

Windows Server Capacity Management 101 - Has the move to Virtualization corrected the issue of under utilized resources?

As I mentioned last week, people have got used to the idea of having their own Windows systems for their own section of work and have been slow in adopting a more resource sharing attitude. This means that each server is doing very little.

Has the move to virtualization technology corrected this issue?

Virtualization allows multiple Windows systems to run on one physical machine as ‘guests’ under a hypervisor. In theory this should make the physical system utilize more of its resources, but the problem lies in the mentality of the people running the system, not the technology. Packing guests together means physical machine utilization should be higher, but a number of problems still persist:

“MINE!” is still prevalent – being used to a one-server-per-service environment, staff create the same situation with virtual machines, but now it is even easier to have multiple machines, and often they’re not even very busy!
It is easy to suffer virtualization sprawl – this is where the number of virtual machines (VMs) on a network reaches a point where the administrator can no longer manage them effectively.
Virtualization sprawl is often for “good reasons” – such as building redundancy into the system, so that if one VM is taken down another can be brought up with virtually no downtime.
Some organizations rely on high-availability / dynamic resource sharing – careful planning is needed to make sure that components of a service do not end up together on the same physical machine, because if it fails it takes down the whole service.
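The last point in the list lends itself to a simple automated check. The sketch below assumes you can export a mapping of VM name to (service, host); that data shape, the function name and the example names are all hypothetical.

```python
from collections import defaultdict


def single_host_services(placement):
    """Return services whose VMs (two or more) all sit on one host.

    placement: dict of VM name -> (service, host). Hypothetical shape;
    adapt it to whatever your inventory export looks like.
    """
    hosts = defaultdict(set)
    counts = defaultdict(int)
    for service, host in placement.values():
        hosts[service].add(host)
        counts[service] += 1
    return sorted(s for s in hosts if counts[s] > 1 and len(hosts[s]) == 1)


placement = {
    "web1": ("shop", "host-a"),
    "web2": ("shop", "host-a"),
    "db1": ("billing", "host-a"),
    "db2": ("billing", "host-b"),
}
print(single_host_services(placement))  # ['shop']
```

Here both ‘shop’ VMs share host-a, so a single host failure would take the whole service down.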

So how do we properly capacity manage a Windows environment? I'll be talking about this on Wednesday.

In the meantime, sign up for our next webinar, 'Capacity Management from the ground-up - a case study'.

http://www.metron-athene.com/services/webinars/index.html

Josh Worth

Consultant

Monday, 20 June 2016

Display of different presentation types (7 of 17) Capacity Management, Telling the Story

As discussed, today I'll be looking at the types of presentations that you can use.

Below is a selection of them:

Humans like visual representation so using these charts in the right way and gauging which are right to represent the information to your audience is crucial.

Dashboard - More aligned to presenting real time information. The key thing to remember is that any dashboard you use should auto update.

Analysis - Presents the drill down of a problem. This is the root cause analysis, where we know there is a problem and we want to drill down and show what is causing the issue. Where’s the bottleneck? Was there a change?

Advice - Provide some automatic advice, automatic interpretation of the data that you are reporting on.

Virtualization - Report on virtualization data, make it easy to understand what is happening in your virtual environment.

Business - We have discussed bringing business data into the CMIS. Why do we want to do that? We can look for correlations by measuring component data against business metrics and show these in our business reports.

Trending - We can show ‘what-if’ trending which can give you a ‘time to live’ value.

Modeling - Modeling reports give more accurate predictions. You can show things like future system response times or identify where any future bottlenecks are likely to occur.

Breakdown - Shows further analysis on the data in an easy to understand way.
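To make the ‘time to live’ idea under Trending concrete, here is a minimal sketch that fits a straight line through equally spaced utilization samples and works out how many more periods pass before it crosses a threshold. A plain least-squares line is an assumption made for illustration; the function name and sample values are hypothetical.

```python
from typing import List, Optional


def time_to_live(samples: List[float], threshold: float) -> Optional[float]:
    """Periods from the last sample until a linear trend hits threshold.

    Returns None when the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Where the fitted line meets the threshold, counted from the last sample
    return max(0.0, (threshold - intercept) / slope - (n - 1))


# Weekly CPU % growing by 2 points a week, with an 80% threshold
print(time_to_live([50, 52, 54, 56], threshold=80))  # 12.0
```

A straight line is the simplest possible trend, so treat the answer as a conversation starter for your audience and not as a forecast.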

Don't forget: the key is to tailor your presentations to suit your audience.

On Wednesday I'll be dealing with what we will see at the management level.

There's a great webinar coming soon 'Capacity Planning & Forecasting using analytical modeling', don't miss it!
http://www.metron-athene.com/services/webinars/index.html

Charles Johnson

Principal Consultant

Monday, 23 May 2016

VMware Capacity Management

VMware is the go-to option for virtualization for many organizations, and has been for some time.
The longer it's been around, the more focus there is on making efficiency savings for the organization. This is where the Capacity Manager really needs to understand the technology, how to monitor it, and how to decide what headroom exists.

I'm running a VMware Capacity Management webinar this Wednesday May 25 (8am PDT, 9am MDT, 10am CDT, 11am EDT, 4pm UK, 5pm CEST) where I'll be taking a look at some of the key topics in understanding VMware Capacity.

Topics will include:

Why OS monitoring can be misleading
5 Key Metrics
Measuring Processor Capacity
Measuring Memory Capacity
Calculating Headroom in VMs

Look forward to seeing you there.

Dale Feiste

Principal Consultant

Friday, 20 May 2016

Top 5 Don'ts for VMware

As promised today I’ll be dealing with the TOP 5 Don’ts for VMware.

DON’T

1) Overcommit CPU (unless ESX Host usage is less than 50%)

I’m sure that most of you have heard of CPU Ready Time. CPU Ready Time is the time (in milliseconds) that a guest’s vCPUs spend waiting to run on the ESX host’s physical CPUs. This wait time can occur due to the co-scheduling constraints of operating systems and a higher CPU scheduling demand due to an overcommitted number of guest vCPUs against pCPUs. The likelihood is that if all the ESX hosts within your environment have, on average, low CPU usage demand, then overcommitting vCPUs to pCPUs is unlikely to cause any significant rise in CPU Ready Time or impact on guest performance.

2) Overcommit virtual memory to the point of heavy memory reclamation on the ESX host.

Memory over-commitment is supported within your vSphere environment by a combination of Transparent Page Sharing, memory reclamation (Ballooning & Memory Compression) and vSwp files (Swapping). When memory reclamation takes place it incurs some memory management overhead and, if DRS is set to automatic, an increase in the number of vMotion migrations. Performance at this point can degrade due to the increase in overhead required to manage these operations.

3) Set CPU or Memory limits (unless absolutely necessary).

Do you really need to apply a restriction on usage to a guest or set of guests in a Resource Pool? By limiting usage, you may unwittingly restrict the performance of a guest. In addition, maintaining these limits incurs overhead, especially for memory, where the limits are enforced by Memory Reclamation. A better approach is to perform some proactive monitoring to identify usage patterns and peaks, then adjust the amount of CPU (MHz) and Memory (MB) allocated to your guest virtual machine. Where necessary guarantee resources by applying reservations.

4) Use vSMP virtual machines when running single-threaded workloads.

vSMP virtual machines have more than one vCPU assigned. A single-threaded workload running on your guest will not take advantage of those “extra” executable threads. Therefore extra CPU cycles used to schedule those vCPUs will be wasted.

5) Use 64-bit operating systems unless you are running 64-bit applications.

Whilst 64-bit operating systems are near enough the norm these days, do check that you need to use 64-bit, as these require more memory overhead than 32-bit. Compare the benchmarks of 32/64-bit applications to determine whether it is necessary to use the 64-bit version.
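Going back to the first Don't: because CPU Ready is reported as a number of milliseconds per sample interval, it is easier to judge once converted to a percentage. The sketch below uses the commonly quoted conversion and assumes the 20 second interval of the vSphere real-time charts; pass a different interval for other rollups. The function names and example figures are hypothetical.

```python
def ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU Ready summation (ms) into a % of the sample interval.

    interval_s defaults to 20 s (real-time chart interval); historical
    rollups use longer intervals, so pass the one that matches your data.
    """
    return ready_ms / (interval_s * 1000.0) * 100.0


def overcommit_ratio(total_vcpus, physical_cpus):
    """vCPU to pCPU ratio for a host or cluster."""
    return total_vcpus / physical_cpus


print(f"{ready_percent(1000):.1f}% ready")       # 5.0% ready
print(f"{overcommit_ratio(48, 16):.1f}:1 vCPU:pCPU")  # 3.0:1 vCPU:pCPU
```

Bear in mind that a figure summed across all of a VM's vCPUs needs dividing by the vCPU count before you compare it with a per-vCPU figure.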

We're running a webinar on VMware Capacity Management on May 25th. Visit our website and sign up to come along.

http://www.metron-athene.com/services/webinars/index.html

Jamie Baker

Principal Consultant

Wednesday, 6 April 2016

Top 5 Key Capacity Management Concerns Unix/Linux - VM Principles (4 of 12)

Continuing on today with a look at VM principles. The following principles of virtual machines hold true across all virtualized environments provided within the IT industry.

• Isolation

– Own root (/), processes, files, security

• Virtualization

– Instance of Solaris O/S

• Granularity

– Resource allocation, Pools

• Transparency

– Standard Solaris interfaces

• Security

– No global reboots, isolated within each zone

On Friday I'll look at performance concerns.

Don't forget to register for our 'Unix and Linux Capacity Management' webinar http://www.metron-athene.com/services/webinars/index.html

Jamie Baker

Principal Consultant

Tuesday, 29 March 2016

Top 5 Key Capacity Management Concerns for Unix/Linux (1 of 12)

Since UNIX systems were developed back in the 1970s, things have, as you’d expect, moved on a long way. Single CPU systems the size of washing machines have been replaced with many racks of multi-core (hyper-threaded) blade servers.

In more recent years, the re-introduction of virtualization allowed for multiple virtual systems to be hosted on a single piece of hardware via a hypervisor. Modern data centers will typically host many thousands of both physical and virtual servers.

Physical and Virtual hosts

– Web Servers

– Database hosts

The mix of physical and virtual tends to depend on what applications are being hosted. It is likely that physical servers will be hosting large RDBMS instances and virtual servers hosting web applications.

On Thursday I'll be taking a look at licensing concerns.
In the meantime why not sign up to come along to our webinar 'Unix and Linux Capacity Management'
http://www.metron-athene.com/services/webinars/index.html

Jamie Baker
Principal Consultant