Metron - Capacity Management: Dale Feiste

Monday, 3 July 2017

Understanding VMware Capacity - Why OS monitoring can be misleading (1 of 10)

Dangers with OS Metrics

Almost every time we discuss data capture for VMware, someone asks whether we can capture the utilization of specific VMs by monitoring the OS. The simple answer is no.

The more complex answer is that we can capture the data from the OS, but it may not be reliable. So here’s an example of why.

We have 2 VMs. Within the 1-second interval we are looking at, one of the VMs was only allocated the CPU for ½ a second. In that ½ second the VM used 50% of its possible CPU time, so from the OS perspective it was running at 50% CPU utilization. If we look at data from VMware, we’ll see that VMware knows the VM used the CPU for only half of the ½ second it was allocated, which is 25% of the full second.

The 2nd VM was running on CPU for the entire second, and again it used 50% of its possible CPU. So, to the OS, it appears it was running at 50% CPU utilization, and VMware has the same result.

The more contention there is for CPU time, the more time VMs will spend Dormant/Idle, and the further apart the values will be. This effect means that any metrics which have an element of time in their calculation cannot be relied upon to be accurate.
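The arithmetic behind the two-VM example can be sketched in a few lines of Python. This is a deliberately simplified model, and the function names are made up for illustration; it is not how VMware or any guest OS actually computes its counters.

```python
def os_view(busy_seconds: float, scheduled_seconds: float) -> float:
    """Utilization as the guest OS sees it: busy time over the time it was actually on CPU."""
    return 100.0 * busy_seconds / scheduled_seconds


def hypervisor_view(busy_seconds: float, interval_seconds: float) -> float:
    """Utilization as the hypervisor sees it: busy time over the whole wall-clock interval."""
    return 100.0 * busy_seconds / interval_seconds


interval = 1.0  # the 1-second interval from the example

# VM 1: allocated the CPU for 0.5 s and busy for half of that (0.25 s)
vm1_os = os_view(0.25, 0.5)
vm1_hv = hypervisor_view(0.25, interval)

# VM 2: on CPU for the whole second and busy for half of it (0.5 s)
vm2_os = os_view(0.5, 1.0)
vm2_hv = hypervisor_view(0.5, interval)

print(f"VM 1: OS {vm1_os:.0f}%, VMware {vm1_hv:.0f}%")
print(f"VM 2: OS {vm2_os:.0f}%, VMware {vm2_hv:.0f}%")
```

The less time a VM is actually scheduled, the smaller the denominator the OS works with, which is why the two views drift apart as contention grows.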

Here is data from a real VM.

The (top) dark blue line is the data captured from the OS, and the (bottom) light blue line is the data from VMware. There is clearly some correlation between the two. At the start of the chart there is about a 1.5% CPU difference. Given we’re only running at about 4.5% CPU, that is an overestimation by the OS of about 35%. But at about 09:00 the difference is ~0.5%, so the difference doesn’t remain stable either.

Historically it’s not been unusual to see situations where the OS metric is reporting 70% CPU utilization and VMware is reporting 30%.

More on Wednesday. In the meantime, don't forget to register for our next webinar, 'Top 5 VMware tips for performance and capacity'.

https://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Phil Bell

Consultant

Thursday, 12 January 2017

Performance Management made easy

Monitoring physical IT infrastructure components such as disks, CPU, network, and databases provides important information about system utilization.
Creating and monitoring synthetic transactions for partial views of the user experience and validating application availability provide useful information about what users may be experiencing. However, neither of these provides the actual experience users have when using real applications.

Application Performance and Real User Experience is the new generation of monitoring that provides information about exactly what is happening with user response.

I'll be running a webinar on January 18 which examines the problem solved by application performance management (APM) tools, shows how efforts to monitor and troubleshoot complex applications without good visibility can be very tedious and time consuming, and describes different methods that APM tools use to obtain data, comparing and contrasting the different approaches.

Modern application transactions can start on a smartphone or virtual desktop web browser and span many diverse infrastructure resources before returning results to the users. Knowing how much time is taken across all of these elements is crucial for quickly identifying performance problems.

In my webinar I'll examine:

Modern application architecture
The problem solved by APM
Solving performance problems without APM
Users, applications and transactions
Different approaches to APM

I'll be sharing some examples with you too.

Registration for this event is now open, so don't forget to book your place.

http://www.metron-athene.com/services/webinars/capacity-management-webinars.html

Dale Feiste

Principal Consultant

Monday, 23 May 2016

VMware Capacity Management

VMware is the go-to option for virtualization for many organizations, and has been for some time.
The longer it's been around, the more focus there is on making efficiency savings for the organization. This is where the Capacity Manager really needs to understand the technology, how to monitor it, and how to decide what headroom exists.

I'm running a VMware Capacity Management webinar this Wednesday May 25 (8am PDT, 9am MDT, 10am CDT, 11am EDT, 4pm UK, 5pm CEST) where I'll be taking a look at some of the key topics in understanding VMware Capacity.

Topics will include:

Why OS monitoring can be misleading
5 Key Metrics
Measuring Processor Capacity
Measuring Memory Capacity
Calculating Headroom in VMs

Look forward to seeing you there.

Dale Feiste

Principal Consultant

Monday, 7 March 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Workload Profiles (10 of 10)

As mentioned previously the Read/Write metrics can help you to get a handle on your workload profiles.

Application type is important in estimating performance risk. For instance, something like Exchange is a heavy I/O user.

I’ve also seen examples where virtual workstations were being installed and resulted in a large I/O hit that could have impacted other applications sharing storage.

Scorecards

This is an example of a scorecard, where you can have a large amount of information condensed into one easy-to-view dashboard.

Dashboards

I've included an example below of how you can set up a dashboard that brings key trending and standard reports together in one place.

Trending, forecasting, and exceptions with athene®

Storage Key Metrics – Summary

To summarize

• Knowledge of your storage architecture is critical; you may need to talk to a separate storage team to get this information

• Define storage occupancy versus performance

• Discuss space utilization and define what it means in your environment

• Review virtualization and clustering complexities

• Explore key metrics and their limitations

• Identify key report types and areas that are most important and start with the most critical

Dale Feiste

Principal Consultant

Friday, 4 March 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Backend Metrics (9 of 10)

Below are some metrics available on the back end storage array:

These are typical performance metrics showing throughput and response times, the type of thing you need to report on regularly so that you can be on top of performance before incidents start being generated.

Performance Capacity – Array Metrics

The key metrics that you need to get a handle on at volume level are throughput, response and latency.

Below is an example of NetApp metrics at volume level.

And below is an example of metrics within EMC at the volume level.

The read/write ratio can give you an idea of what your work profile looks like.
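As a rough illustration of turning read and write counts into a workload profile, here is a minimal Python sketch. The function name and the 70% cut-off for calling a workload read-heavy or write-heavy are assumptions made for this example, not values taken from any array vendor.

```python
def read_write_profile(read_ops: float, write_ops: float, heavy_pct: float = 70.0) -> dict:
    """Summarize a workload from its read and write operation counts.

    heavy_pct is an assumed cut-off: at or above it, the workload is
    labelled read-heavy or write-heavy; otherwise it is labelled mixed.
    """
    total = read_ops + write_ops
    if total == 0:
        return {"read_pct": 0.0, "write_pct": 0.0, "label": "idle"}

    read_pct = 100.0 * read_ops / total
    write_pct = 100.0 - read_pct

    if read_pct >= heavy_pct:
        label = "read-heavy"
    elif write_pct >= heavy_pct:
        label = "write-heavy"
    else:
        label = "mixed"
    return {"read_pct": read_pct, "write_pct": write_pct, "label": label}


# Hypothetical volume: 800 reads/s and 200 writes/s
print(read_write_profile(800, 200))
```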

Performance Capacity – Component Breakdown

The example below, using athene®, shows a component breakdown for the server.

It’s essential to know whether you have any queuing going on (shown in yellow above). If queuing is happening, you are exceeding the device’s throughput rate.
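The link between queuing and throughput can be sketched with a very simple model: whenever more requests arrive in an interval than the device can service, the excess waits in a queue. The function name and the figures are hypothetical, and real devices behave far less neatly than this.

```python
def queue_backlog(arrivals_per_sec: list, max_iops: int) -> list:
    """Return the number of requests left waiting at the end of each second.

    Simplified model: the device services at most max_iops requests per
    second, and anything beyond that is carried over to the next second.
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_sec:
        backlog = max(0, backlog + arrivals - max_iops)
        history.append(backlog)
    return history


# Hypothetical device that can sustain 200 I/Os per second
print(queue_backlog([100, 150, 250, 300, 100], max_iops=200))
```

In this model the queue stays empty while arrivals are under the limit and only builds once demand exceeds what the device can deliver.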

In the final part of my blog series on Monday I’ll take a look at workload profiles, scorecards and dashboards.

Dale Feiste

Principal Consultant

Wednesday, 2 March 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Array Architecture (8 of 10)

This is an example of an enterprise-type array comprising:

• Front End Processors

• Shared Cache

• Back End Processors

• Disk Storage

A lot of the time these disks can be striped across the entire array, with a very large number of spindles tied together to provide a very large resource.

Quite often on these large arrays bottlenecks will occur on the front end processor, where incoming requests queue up.

Performance Capacity – Array Metrics

As mentioned, front end processors are typically the first to bottleneck. Below is an example showing just one day.

This is ideal information for trending: if you collected data on these processors over a period of time, you could project the trend forward and figure out when and where bottlenecks are likely to occur.

On Friday I’ll be looking at back end metrics. In the meantime join our Community for access to some of my Storage Capacity Management webinars.

http://www.metron-athene.com/_resources/on-demand-webinars/login.asp

Dale Feiste

Principal Consultant

Monday, 29 February 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Response Impacts (7 of 10)

SAN or storage array performance problems can be identified at the host or in the backend storage environment.

The diagram below shows a typical performance impact in the more complex environment.

With SAN-attached storage you can share storage across multiple servers. One of the downsides of this is that a storage response problem can impact multiple servers too.

Performance Capacity – Host Metrics

It's important that you understand the limitations of certain host metrics.

A selection of host metrics are shown below:

• Measured response is the best metric for identifying trouble.

• Host utilization only shows busy time; it doesn’t tell you the capacity of the SAN.

• Physical IOPS is an important measure of throughput; all disks have their limits.

• Queue Length is a good indicator that a limitation has been reached somewhere.

Performance Capacity – Host Metrics

Metrics like host utilization can indicate impactful events, but ample capacity might still be available.

In the chart below, the periods of high utilization can be seen generating large amounts of I/O.

Queue lengths indicate that it may not currently be impacting response, but headroom is unknown. Response time is the key, as users will be impacted if it goes up.

On Wednesday I’ll be taking a look at array architecture.

Dale Feiste

Principal Consultant

Friday, 26 February 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Virtual Environments and Clusters (6 of 10)

Managing storage in clustered and/or virtual environments can be challenging because it is shared among all hosts and virtual machines running on it.

Below is an example of a simple 3 node VMware cluster going to some shared storage.

Features that are available

• Thin provisioning

• Storage can be viewed at many levels.

• Could be different tiers allocated to the same cluster

• Overhead at various points

Storage Virtualization

There are advantages to the layered system

• it allows a caching layer so that you may not have to go all the way to the backend to satisfy an I/O request

• there are a lot of administrator features regarding allocation and replication

Pooling physical storage from multiple sources into logical groupings is useful

• Can be a centralized source for collecting data

• If using as a data source beware of double counting with backend

There is a wide variety of techniques for virtualizing storage, so be aware of the implications for data collection and reporting.

On Monday I’ll be discussing response impacts on performance capacity and metrics for these.

Dale Feiste

Principal Consultant

Wednesday, 24 February 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Host Metrics (5 of 10)

Moving on to the metrics: for occupancy, the key metric is utilization. How much storage are we using and how much is available?

Below are some host metrics that are typically available. These metrics are available at the file system, volume, or logical disk levels.

Array Metrics

The illustration below shows an example of occupancy metrics from the array perspective, in this case NetApp filer metrics at the aggregate level.

A lot of these storage arrays, from the different vendors, have different ways to carve up the storage. Storage groups can be configured as in this example, using NetApp aggregates, which can have many occupancy metrics at different levels.

Some of the NetApp occupancy levels here are not available on the host in general.

I’ll pick out a few of the metrics:

De-dupe – If this is turned on you can find out how much space you’re saving

Total Committed space – A lot of vendors now offer thin provisioning, where storage can be over-committed so that it looks as though there is more storage than is really available. This metric allows you to see how over-committed you really are.
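To make the idea of over-commitment concrete, here is a minimal Python sketch that compares committed space with physical capacity. The function name and the figures are hypothetical; the actual counter names and units will depend on your array.

```python
def overcommit(total_committed_gb: float, physical_capacity_gb: float) -> tuple:
    """Return (ratio, over_committed_gb) for a thin-provisioned pool.

    A ratio above 1.0 means more space has been promised to hosts than
    physically exists in the pool.
    """
    if physical_capacity_gb <= 0:
        raise ValueError("physical capacity must be positive")
    ratio = total_committed_gb / physical_capacity_gb
    over_committed_gb = max(0.0, total_committed_gb - physical_capacity_gb)
    return ratio, over_committed_gb


# Hypothetical pool: 15,000 GB committed against 10,000 GB of physical capacity
ratio, over_gb = overcommit(15000.0, 10000.0)
print(f"Over-commit ratio {ratio:.2f}, {over_gb:.0f} GB promised beyond physical capacity")
```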

athene®, our capacity management solution, brings in metrics from any data source so storage metrics can be part of the overall capacity management process.

On Friday I’ll be taking a look at Virtual Environments & Clusters.

Dale Feiste

Principal Consultant

Monday, 22 February 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Trending (4 of 10)

One thing to keep in mind is the limitations of linear regression when trending and forecasting data.

I’ve used the graphs below as an example of this.

In the second graph you can see what will happen eventually when that bottoms out or someone goes in and allocates more storage or frees more storage up – it skews the trend line.
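The skew is easy to reproduce with made-up numbers. In the Python sketch below, free space falls by 10 GB a day and 200 GB is added on day 10; the data, the function name, and the idea of fitting only the points since the last change are all assumptions for illustration, and restricting the fit is just one possible way of coping with the problem.

```python
def fit_line(ys: list) -> tuple:
    """Ordinary least squares of ys against day numbers 0..n-1; returns (slope, intercept)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical free space in GB: falling 10 GB a day, with 200 GB added on day 10
before = [300 - 10 * d for d in range(10)]         # 300 down to 210
after = [400 - 10 * (d - 10) for d in range(10, 20)]  # 400 down to 310
free_gb = before + after

slope_all, _ = fit_line(free_gb)   # trend line through every point
slope_recent, _ = fit_line(after)  # trend line through points since the allocation

print(f"All data:         {slope_all:+.1f} GB/day")
print(f"Since allocation: {slope_recent:+.1f} GB/day")
```

With these numbers the line fitted through all of the data slopes upwards, suggesting free space is growing, even though the file system is still losing 10 GB every day.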

Space Capacity – Different Viewpoints

We’ve talked about different viewpoints when looking at your data, reports, and trending. Now I’m going to look at how useful it is to look at things in Groups.

You can group by Business, Application, Host, Storage Array, or Billing Tier. What that really boils down to is providing more of a business or application view.

Below you can see this has been grouped to provide a commercial/business and a technical view. Application owners can go in and see how much storage they are consuming, particularly useful if you also include billing information.
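Grouping of this kind amounts to summing the same measurements under different keys. The Python sketch below uses invented records and field names purely for illustration; it does not reflect how athene® stores or groups its data.

```python
from collections import defaultdict


def total_by(records: list, key: str) -> dict:
    """Sum used_gb across records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["used_gb"]
    return dict(totals)


# Hypothetical per-host storage figures
records = [
    {"host": "srv01", "application": "Payroll", "business": "Finance", "used_gb": 120.0},
    {"host": "srv02", "application": "Payroll", "business": "Finance", "used_gb": 80.0},
    {"host": "srv03", "application": "Web shop", "business": "Sales", "used_gb": 200.0},
]

print(total_by(records, "host"))         # technical view
print(total_by(records, "application"))  # application owner's view
print(total_by(records, "business"))     # business view
```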

Join me again on Wednesday, when I’ll be discussing Host Metrics. And don't forget to register now to come along to our next webinar, 'Maturing the Capacity Management Process': http://www.metron-athene.com/services/webinars/index.html

Dale Feiste

Principal Consultant

Friday, 19 February 2016

Key Metrics for Effective Storage Performance and Capacity Reporting - Space Utilization (3 of 10)

What does storage ‘Utilization’ mean in your environment?

Utilization can be a variable definition and there are many factors to take into account. These include RAID/DR, Raw/Configured, Host/SAN, Backups, Compression, etc.

The term utilization can depend on whether you are including any of these factors and it is useful to know exactly what you wish to include and report on when determining whether you have under or over-utilized storage capacity.
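A small worked example shows how much the answer depends on the definition. In the Python sketch below, the same used space is measured against raw capacity and against configured capacity after RAID overhead; the capacities and the 25% overhead figure are invented for illustration.

```python
def utilization_pct(used_tb: float, capacity_tb: float) -> float:
    """Used space as a percentage of whichever capacity figure you choose to report against."""
    return 100.0 * used_tb / capacity_tb


raw_tb = 100.0        # hypothetical raw disk capacity
raid_overhead = 0.25  # assumed fraction of raw capacity lost to RAID protection
configured_tb = raw_tb * (1 - raid_overhead)  # 75 TB actually available for data
used_tb = 60.0

against_raw = utilization_pct(used_tb, raw_tb)
against_configured = utilization_pct(used_tb, configured_tb)

print(f"Against raw capacity:        {against_raw:.0f}%")
print(f"Against configured capacity: {against_configured:.0f}%")
```

The same 60 TB looks comfortable on one definition and close to a typical threshold on the other, which is why the definition needs to be agreed before reporting starts.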

Occupancy – Visibility

Once you have defined what you wish to include in your reports you can start collecting the data.

The chart below illustrates space used on a file system and is a regular trend chart with a threshold. As you can see, moving out into the future it is going to exceed the threshold. You can use trending to report on a number of metrics, but when an application is going to run out of space it is going to be at this level.
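A trend-to-threshold estimate of this kind can be sketched as a straight-line fit through daily samples. The Python below is a minimal illustration with hypothetical data and a made-up function name; it assumes one evenly spaced sample per day and steady linear growth, which real file systems often do not follow.

```python
def days_until_threshold(samples: list, threshold: float):
    """Estimate days from the last sample until a linear trend reaches the threshold.

    samples holds one value per day (for example, percent of space used).
    Returns None when there are too few samples or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Report time remaining from the most recent sample, never a negative value
    return max(0.0, crossing_day - (n - 1))


# Hypothetical file system growing by one percentage point a day
used_pct = [60, 61, 62, 63, 64]
print(days_until_threshold(used_pct, threshold=90))
```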

It’s advisable to be proactive with trending to ensure that you can deal with any problems before they turn into real performance problems.

Technical solutions can then be implemented to optimize storage space management, including databases.

On Monday I’ll be looking at Trending and Groups.

Dale Feiste

Principal Consultant