Thursday, 24 December 2015

Some end of 2015 thoughts from our CEO....

Gartner identified their Top 10 Strategic Technology Trends for 2015 at their October Symposium in Orlando. See

Knowing how capacity management people just love statistics, I thought I’d offer a percentage relevance/impact on capacity management for each of the 10 items, from Metron’s perspective. What can’t be denied is that, depending on your industry, any or all of these will impact you at some stage if you are a capacity manager.

Top 10 Item                                    Capacity Management Impact 2015     Trend

Computing Everywhere                                        100%           →

Whether it is providing support for mobile devices with our athene® software or helping people handle the mass of data being generated as their own industry makes increasing use of them, this is definitely a major direction for Metron. 

Internet of Things                                                    20%           ↑ 

Some impact so far, as our clients have had to deal with increasing volumes of data, predominantly due to Cloud-based implementations. This will only increase as the Internet of Things becomes more pervasive, for example through the predicted rapid growth in pay-per-use applications.

3D Printing                                                                   5%           ↑ 

With shipments of 3D printers expected to double in 2016, this is clearly a major growth area. At Metron, we are starting to see applications with embedded links to 3D printing further increase the processing load on environments.

Advanced, pervasive and invisible analytics        50%         ↑

More and more systems, and more and more data, mean analytics needs to get smarter. Research and development for Metron continues to focus on new ways to intuitively analyze and use the ever-increasing pool of data available to capacity managers.

Context Rich Systems                                                20%         ↑

New research and development in the works includes context-sensitive planning techniques to enable capacity planning to evolve with the emerging world of rapid-deployment, rapid-change systems.

Smart machines                                                         10%         ↑

The smarter the machine the greater the impact if it fails. Even with ever cheaper commoditized components, manufacturing will continue to want to optimize costs. Ensuring just the right capacity is available to each component of your smart car or helpful household robot will be essential to enable producers to maximize profits and keep feeding that next generation of R&D.

Cloud/Client Computing                                        100%         →

Already a significant element of life for a Capacity Manager or Capacity Management software provider. Stories already circulate of overspend and uncontrolled spend on Cloud systems. Over time, businesses will increasingly need capacity management to prevent and remove the waste. For Metron, ever more development effort goes into enhancing the tools available to do this.

Software Defined Application and Infrastructure 50%      ↑

Metron moved to the agile principles underpinning SDA and SDI some years ago for our own in-house development, seeing how our software would need to evolve ever quicker in response to changing demands. Work underway now looks to provide the same agile approach to rules and models in our future software to deliver rapid response, self-learning capacity planning and management techniques.

Web-Scale IT                                                                   20%     ↑

Many of our clients are now starting to deploy Cloud-style systems, pioneered by the likes of Google and Facebook, within their own data centers. For some time past and looking well ahead, Metron has been concentrating software enhancements on supporting evolving data center architectures, while still supporting the traditional implementations that will provide core organization systems for some time.

Risk-Based Security and Self-Protection                    5%      ↑

With all of Metron’s athene® facilities being increasingly provided as web-based applications, work continues to provide ever-increasing application security, to supplement the sophisticated perimeter security now deployed by all our clients.

We hope all of our clients and friends have a wonderful holiday and we'll see you in 2016.

Andrew Smith
CEO, Metron

Tuesday, 8 December 2015

VMware Capacity Planning

VMware is the go-to option for virtualization for many organizations, and has been for some time. The longer it’s been around, the more focus there is on making efficiency savings for the organization. This is where the Capacity Manager really needs to understand the technology, how to monitor it, and how to decide what headroom exists.

I'll be running a webinar, 'VMware Capacity Planning', on December 16, where I'll take a look at some of the key topics in understanding VMware capacity.

      •     Why OS Monitoring Can be Misleading

      •     5 Key VMware Metrics for Understanding VMware Capacity

      •     How VMware Processor Scheduling Impacts CPU Capacity Measurements

      •     Measuring Memory Capacity

      •     Measuring Disk Storage Latency

      •     Calculating Headroom in VMs

Register for your place now

Phil Bell

Friday, 4 December 2015

Top 10 VMware Metrics to help pinpoint bottlenecks

Here is a list of the top 10 VMware metrics to assist you in pinpointing performance bottlenecks within your VMware vSphere virtual infrastructure. I hope you find them useful.

1. Ave CPU Usage in MHz - this metric should be reported at both host and guest levels. A guest (VM) has to run on an ESX host, and that host has a finite amount of resource. High CPU Usage at the host level could indicate a bottleneck; however, create a breakdown of all hosted guests to give a clear indication of which are using the most. If you have enabled DRS on your cluster, you may see a rise in the number of vMotions as DRS attempts to load balance.
2. CPU Ready Time - this is an important metric that gives a clear indication of CPU overcommitment within your VMware Virtual Infrastructure. CPU overcommitment can lead to significant CPU performance problems due to the way in which the ESX CPU scheduler schedules Virtual CPU (vCPU) work onto Physical CPUs (pCPUs). It is reported at the guest level. Any values reported in seconds can indicate that you have provisioned too many vCPUs for this guest. Compare the total vCPUs assigned to all hosted guests with the number of Physical CPUs available on the host(s) to see whether you have overcommitted the CPU.
3. Ave Memory Usage in KB - similar to Average CPU Usage, this should be reported at both Host and Guest levels. It can indicate which guests are using the most memory, but high usage does not necessarily indicate a bottleneck. If Memory Usage is high, look at the values reported for Memory Ballooning/Swapping.
4. Balloon KB - values reported for the balloon indicate that the Host cannot meet its Memory requirements and are an early warning sign of memory pressure on the Host. The Balloon driver is installed via VMware Tools onto Windows and Linux guests, and its job is to force the operating system of lightly used guests to page out unused memory back to ESX so it can satisfy the demand from hungrier guests.
5. Swap Used KB - if you see values being reported at the Host for Swap, this indicates that memory demands cannot be satisfied and memory is being swapped out to the .vswp file. This is ‘Bad’. Guests may have to be migrated to other hosts, or more memory will need to be added to this host to satisfy the memory demands of the guests.
6. Consumed - Consumed memory is the amount of Memory Granted on a Host to its guests minus the amount of Memory Shared across them.  Memory can be over-allocated, unlike CPU, by sharing common memory pages such as Operating System pages.  This metric displays how much Host Physical Memory is actually being used (or consumed) and includes usage values for the Service Console and VMkernel.
7. Active - this metric reports the amount of physical memory recently used by the guests on the Host and is displayed as “Guest Memory Usage” in vCenter at Guest level.
Disk I/O
8. Queue Latency - this metric measures the average amount of time taken per SCSI command in the VMkernel queue. This value must always be zero. If not, it indicates that the workload is too high and the storage array cannot process the data fast enough.
9. Kernel Latency - this metric measures the average amount of time, in milliseconds, that the VMkernel spends processing each SCSI command. For best performance, the value should be between 0 and 1 milliseconds. If the value is greater than 4ms, the virtual machines on the Host are trying to send more throughput to the storage system than the configuration supports. If this is the case, check the CPU usage, and increase the queue depth or storage.
10. Device Latency - this metric measures the average amount of time, in milliseconds, to complete a SCSI command from the physical device. Depending on your hardware, a number greater than 15ms indicates there are probably problems with the storage array. If this is the case, move the active VMDK to a volume with more spindles or add more disks to the LUN.
Note: When reporting usage values, take into consideration any child resource pools specified with CPU/Memory limits and report accordingly.
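
As a rough illustration of how the thresholds in the list above could be applied together, here is a minimal Python sketch. The `HostSample` class, its field names and the `find_warnings` helper are hypothetical and are not part of any VMware API or of athene®; only the threshold values (any ballooning or swapping, queue latency above zero, kernel latency above 4ms, device latency above 15ms) come from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostSample:
    """Averaged metrics for one ESX host (hypothetical structure)."""
    physical_cpus: int
    vcpus_assigned: int       # vCPUs summed across all hosted guests
    balloon_kb: int
    swap_used_kb: int
    queue_latency_ms: float
    kernel_latency_ms: float
    device_latency_ms: float


def find_warnings(sample: HostSample) -> List[str]:
    """Return one warning per threshold from the list that is breached."""
    warnings = []
    if sample.vcpus_assigned > sample.physical_cpus:
        ratio = sample.vcpus_assigned / sample.physical_cpus
        warnings.append(f"vCPU:pCPU ratio is {ratio:.1f}:1, check CPU Ready Time")
    if sample.balloon_kb > 0:
        warnings.append("ballooning active, early sign of memory pressure")
    if sample.swap_used_kb > 0:
        warnings.append("host swapping, memory demand cannot be satisfied")
    if sample.queue_latency_ms > 0:
        warnings.append("queue latency above zero, storage cannot keep up")
    if sample.kernel_latency_ms > 4:
        warnings.append("kernel latency above 4ms, check CPU usage and queue depth")
    if sample.device_latency_ms > 15:
        warnings.append("device latency above 15ms, possible storage array problem")
    return warnings


# Made-up sample values, purely for illustration.
sample = HostSample(physical_cpus=16, vcpus_assigned=40, balloon_kb=2048,
                    swap_used_kb=0, queue_latency_ms=0.0,
                    kernel_latency_ms=0.5, device_latency_ms=22.0)
for warning in find_warnings(sample):
    print(warning)
```

A vCPU count above the pCPU count is not a problem in itself; the sketch only uses it as a prompt to go and look at CPU Ready Time for the guests on that host.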
We're running a webinar, 'VMware Capacity Planning'; register and come along.
Jamie Baker
Principal Consultant

Wednesday, 2 December 2015

Idle VMs - Why should we care? (3 of 3)

Earlier in the week I looked at the impact idle VMs can have on CPU utilization and memory overhead. Today I’m going to look at the amount of disk or Datastore space used per idle VM.

Each VM will have associated VMDK (disk) files. The files are stored within a Datastore, which in most cases is hosted on SAN or NAS storage and shared between the cluster host members. If VMDKs are provisioned as "Thick Disks" then the provisioned space is locked out within the Datastore for those disks.

To illustrate this, an example of a least-worst-case scenario would be: 100 idle Windows VMs have been identified across the Virtual Infrastructure, and each VM has a single "Thick" VMDK of 20GB used to house the operating system. This would equate to 2TB of Datastore space being locked for use by VMs that are idle. You can expand this further by assuming that some, if not all, VMs are likely to have more disks, of differing sizes.

The simple math will show you how much Datastore space is being wasted.
There is a counter to this, known as Thin Provisioning. By using Thin disks, in which the provisioned disk size is reserved but not locked, you would not waste the same amount of space as you would by using Thick Disks. Thin Provisioning also has the added benefit of allowing disk space to be over-allocated, reducing the amount of up-front storage capacity required while incurring only minimal overhead.
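
The arithmetic behind the 100-VM example can be sketched in a few lines of Python. The function names are made up for illustration, and the 12GB "actually written" figure per VM is an assumption chosen only to show the comparison; it does not come from any measurement. The sketch also simplifies Thin disks to "space written so far" and ignores any overhead.

```python
def thick_locked_gb(vm_count: int, provisioned_gb: int) -> int:
    """Thick disks lock their full provisioned size in the Datastore."""
    return vm_count * provisioned_gb


def thin_used_gb(vm_count: int, written_gb: int) -> int:
    """Thin disks consume only the space actually written (simplified)."""
    return vm_count * written_gb


idle_vms = 100
provisioned_gb = 20   # per-VM disk size from the example in the text
written_gb = 12       # assumed figure, for illustration only

locked = thick_locked_gb(idle_vms, provisioned_gb)
used = thin_used_gb(idle_vms, written_gb)
print(f"Thick: {locked} GB locked (about {locked / 1000:.0f} TB)")
print(f"Thin:  {used} GB used, {locked - used} GB left free for other VMs")
```

Swapping in your own VM counts and disk sizes gives a quick first estimate of how much Datastore space idle VMs are holding.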

Idle VMs - why you should care

Identifying idle VMs, questioning whether they are required, finding out who owns them and removing them completely will reduce or help eliminate VM sprawl, and help to improve the performance and capacity of the Virtual Infrastructure by:

·       reducing unnecessary timer interrupts

·       reducing allocated vCPUs

·       reducing unnecessary CPU and Memory overhead

·       reducing used Datastore space

·       making more efficient use of your Virtual Infrastructure, including improved VM to Host ratios and a reduction in additional hardware.
Don't forget to sign up for our Capacity Management Maturity online workshop.

Jamie Baker

Principal Consultant