Friday, 27 February 2015

Backend Metrics - Key Metrics for Effective Storage Performance and Capacity Reporting (9 of 10)

Below are some of the back end metrics available on the back end storage array.

These are typical performance metrics showing throughput and response times: the type of thing you need to report on regularly so that you can stay on top of performance before incidents start being generated.

Performance Capacity – Array Metrics

The key metrics that you need to get a handle on at volume level are throughput, response and latency.
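As a rough sketch of how these volume-level numbers are derived from raw counters sampled at an interval (the counter names here are illustrative placeholders, not any particular vendor's API):

```python
# Derive volume-level throughput and average response time from two
# counter samples taken interval_s seconds apart. The counter names
# are illustrative, not a specific vendor's metrics.

def volume_metrics(prev, curr, interval_s):
    ops = curr["total_ops"] - prev["total_ops"]
    wait_us = curr["total_wait_us"] - prev["total_wait_us"]
    throughput_iops = ops / interval_s
    # Average response time per operation over the interval.
    avg_response_ms = (wait_us / ops) / 1000 if ops else 0.0
    return {"iops": throughput_iops, "response_ms": avg_response_ms}

prev = {"total_ops": 100_000, "total_wait_us": 450_000_000}
curr = {"total_ops": 160_000, "total_wait_us": 810_000_000}
m = volume_metrics(prev, curr, 60)
# 60,000 ops in 60 s -> 1000 IOPS; 360,000,000 us / 60,000 ops = 6 ms
```

The same delta-over-interval approach works for most cumulative performance counters, whatever the collection tool.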

Below is an example of NetApp metrics at volume level.

Below is an example of metrics within EMC at the volume level.

The read/write ratio can give you an idea of what your work profile looks like.
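A quick sketch of turning read/write counters into a workload profile (the classification thresholds here are invented for illustration; tune them for your environment):

```python
def rw_profile(reads, writes):
    """Classify a workload by its read percentage."""
    total = reads + writes
    read_pct = 100.0 * reads / total if total else 0.0
    # Rough, illustrative thresholds -- adjust for your environment.
    if read_pct >= 70:
        kind = "read-heavy"
    elif read_pct <= 30:
        kind = "write-heavy"
    else:
        kind = "mixed"
    return read_pct, kind

pct, kind = rw_profile(reads=8200, writes=1800)  # 82% reads -> read-heavy
```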

Performance Capacity – Component Breakdown

The example below, using athene®, shows a component breakdown for the server.
It’s essential to know whether you have any queuing going on (shown in yellow above), because if queuing is happening you are exceeding the device’s throughput rate.
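Little's Law ties the queue you observe to the throughput and response you measure. A minimal check, with invented numbers, that the three metrics you collect are consistent with each other:

```python
def littles_law_in_flight(arrival_rate_iops, residence_time_s):
    """Little's Law: average number of requests in the system
    (queued plus in service) = arrival rate x residence time."""
    return arrival_rate_iops * residence_time_s

# 2000 IOPS with a 5 ms residence time -> on average 10 requests in flight.
in_flight = littles_law_in_flight(2000, 0.005)
```

If the queue length you measure is much larger than arrival rate times response time, one of the three metrics is being sampled or defined differently and is worth investigating.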

In the final part of my blog series on Monday I’ll take a look at workload profiles, scorecards and dashboards.

Dale Feiste
Principal Consultant

Wednesday, 25 February 2015

Array Architecture and Metrics - Key Metrics for Effective Storage Performance and Capacity Reporting (8 of 10)

In the last post I said we’d take a look at Array Architecture. This is an example of an enterprise-type array comprising:

       Front End Processors

       Shared Cache

       Back End Processors

       Disk Storage

A lot of the time these disks are striped across the entire array: a very large number of spindles tied together to provide one very large resource.

Quite often on these large arrays bottlenecks will occur on the front end processors, where incoming requests queue up.

Performance Capacity – Array Metrics

As mentioned, front end processors are typically the first to bottleneck. Below is an example showing just one day.

This is ideal information for trending: if you tracked these processors over a period of time you could project the trend forward and figure out when and where bottlenecks are likely to occur.

On Friday I’ll be looking at back end metrics. In the meantime, sign up to join our Community and listen to the live recording of this series.

Dale Feiste
Principal Consultant

Monday, 23 February 2015

Performance Capacity – Response Impacts - Key Metrics for Effective Storage Performance and Capacity Reporting (7 of 10)

SAN or storage array performance problems can be identified at the host or in the back end storage environment.
The diagram below shows a typical performance impact in this more complex environment.

With SAN attached storage you can share storage across multiple servers; one of the downsides of this is that a storage response impact can hit multiple servers too.

Performance Capacity – Host Metrics

It is important that you understand the limitations of certain host metrics.
A selection of host metrics are shown below:

       Measured response is the best metric for identifying trouble.

       Host utilization only shows busy time; it doesn’t tell you how much capacity the SAN behind it has.

       Physical IOPs is an important measure of throughput; all disks have their limitations.

       Queue Length is a good indicator that a limitation has been reached somewhere.

Performance Capacity – Host Metrics

Metrics like host utilization can indicate impactful events, but ample capacity might still be available.

In the chart below the high utilization can be seen alongside the large amounts of I/O being generated.

The queue lengths indicate that response may not currently be impacted, but the headroom is unknown. Response time is the key metric, as users will be impacted if it goes up.

Next time I’ll look at array architecture. 
If you missed our recent webinar on storage performance, sign up for our Community and download or listen for free.

Dale Feiste
Principal Consultant

Friday, 20 February 2015

Virtual Environments and Clusters - Key Metrics for Effective Storage Performance and Capacity Reporting (6 of 10)

Managing storage in clustered and/or virtual environments can be challenging because it is shared among all hosts and virtual machines running on it. 

Below is an example of a VMware cluster: just a simple 3 node cluster going to some shared storage.

Features that are available:

          Thin provisioning
          Storage can be viewed at many levels
          Different tiers can be allocated to the same cluster
          Overhead at various points

Looking at the storage at each of these levels gives you a good indication of your overhead.

Storage Virtualization

There are advantages to the layered system:

       It allows a caching layer, so you may not have to go all the way to the back end to satisfy an I/O request

       There are a lot of administrator features around allocation and replication; pooling physical storage from multiple sources into logical groupings is useful

       It can be a centralized source for collecting data

          If you use it as a data source, beware of double counting with the back end

There are a wide variety of techniques for virtualizing storage; be aware of the implications for data collection and reporting.

On Monday I’ll be discussing response impacts on performance capacity and metrics for these. 
Follow our blog to get updates sent directly to you.

Dale Feiste
Principal Consultant

Wednesday, 18 February 2015

Host Metrics - Key Metrics for Effective Storage Performance and Capacity Reporting (5 of 10)

Moving on to the metrics: for occupancy, the key metric is utilization. How much storage are we using and how much is available?

Below are some host metrics that are typically available; these metrics are available at the file system, volume, or logical disk levels.

A lot of these storage arrays, from the different vendors, have different ways to carve up the storage. Storage groups can be configured as in this example, using NetApp aggregates, which can have many occupancy metrics at different levels.

Some of the NetApp occupancy levels here are not available on the host in general.

I’ll pick out a few of the metrics:

De-dupe – If this is turned on you can find out how much space you’re saving

Total Committed space – A lot of vendors now offer thin provisioning, where storage can be over-committed so it looks as though there is more storage than is really available. This metric allows you to see how over-committed you really are.
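As a sketch, the over-commitment of a thin-provisioned pool boils down to two simple figures (the numbers and function names here are made up for illustration):

```python
def overcommit_ratio(committed_gb, physical_gb):
    """Ratio above 1.0 means more space is promised than physically exists."""
    return committed_gb / physical_gb

def thin_headroom_gb(used_gb, physical_gb):
    """Physical space left before writes start failing."""
    return physical_gb - used_gb

ratio = overcommit_ratio(committed_gb=50_000, physical_gb=20_000)  # 2.5x
headroom = thin_headroom_gb(used_gb=14_000, physical_gb=20_000)    # 6000 GB
```

The ratio tells you how exposed you are if every consumer fills their allocation; the headroom tells you how close you are to that exposure becoming an incident.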

athene®, our capacity management solution, can bring in metrics from any time series data source, allowing storage metrics to be brought into the capacity management process.

Next time I’ll be taking a look at Virtual Environments and Clusters. Don't forget to sign up for our next webinar 'Managing by KPI's' on February 18.

Dale Feiste
Principal Consultant

Monday, 16 February 2015

Trending - Key Metrics for Effective Storage Performance and Capacity Reporting (4 of 10)

One thing to keep in mind for trending is to understand the limitations of linear regression when trending and forecasting data.

I’ve used the graphs below as an example of this.

In the second graph you can see what happens eventually: usage bottoms out, or someone goes in and allocates more storage or frees some up, and that skews the trend line.

Different Viewpoints

We’ve talked about different viewpoints when looking at your data, reports and trending; now I’m going to look at how useful it is to look at things in groups.
You can group by Business, Application, Host, Storage Array or Billing Tier, and what that really boils down to is providing more of a business or application view.

Above you can see this has been grouped to provide a commercial/business and a technical view. Application owners can go in and see how much storage they are consuming and this is particularly useful if you also include billing information.

Join me again on Wednesday when I’ll be taking a look at Host Metrics. In the meantime why not join our community and get access to our white papers, performance tips and on-demand webinars?

Dale Feiste
Principal Consultant

Friday, 13 February 2015

Space Utilization - Key Metrics for Effective Storage Performance and Capacity Reporting (3 of 10)

What does storage ‘Utilization’ mean in your environment?

Utilization can have a variable definition and there are many factors to take into account, including RAID/DR, raw/configured, host/SAN, backups, compression, etc.
The term utilization can depend on whether you are including any of these factors, and it is useful to know exactly what you wish to include and report on when determining whether you have under- or over-utilized storage capacity.
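To see why the definition matters, here is the same pool reported two ways, against raw capacity and against usable capacity after RAID and a DR mirror (the overhead figures are illustrative, not a recommendation):

```python
def utilization_pct(used_gb, capacity_gb):
    """Percentage of the given capacity that is in use."""
    return 100.0 * used_gb / capacity_gb

raw_gb = 100_000
# Illustrative overheads: RAID 10 halves raw capacity; a full DR mirror
# halves it again. Your factors will differ.
usable_gb = raw_gb * 0.5 / 2

raw_util = utilization_pct(20_000, raw_gb)        # 20% against raw
usable_util = utilization_pct(20_000, usable_gb)  # 80% against usable
```

The same 20,000 GB of data is either comfortably low or nearly full depending on which denominator you report, which is why the definition needs agreeing up front.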

Occupancy – Visibility

Once you have defined what you wish to include in your reports you can start collecting the data.
The chart below illustrates space used on a file system. It is a regular trend chart with a threshold, and as you can see, moving out into the future it is going to exceed that threshold.
You can use trending to report on a number of metrics, but when an application is going to run out of space it is going to be at this level.
It’s advisable to be proactive with trending to ensure that you can deal with any problems before they turn into real performance issues.
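The trend-to-threshold calculation can be sketched as an ordinary least-squares fit plus a crossing date (daily samples in GB; the numbers are invented):

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx

def days_until(threshold, xs, ys):
    """Days from the last sample until the fitted trend crosses threshold."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None  # flat or shrinking: no crossing ahead
    return (threshold - intercept) / slope - xs[-1]

# File system growing ~10 GB/day, currently at 740 GB, threshold 900 GB.
days = [0, 1, 2, 3, 4]
used = [700, 710, 720, 730, 740]
remaining = days_until(900, days, used)  # about 16 days of headroom
```

This also shows the linear-regression caveat from earlier in the series: a single allocation or clean-up in the sample window shifts the slope and moves the predicted date, so re-fit regularly.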

Technical solutions can then be implemented to optimize storage space management, including for databases.
On Monday I’ll be looking at Trending and Groups. Why not take a look at trending and modelling with athene®?

Dale Feiste
Principal Consultant

Wednesday, 11 February 2015

Two Distinct Aspects of Storage Capacity - Key Metrics for Effective Storage Performance and Capacity Reporting (2 of 10)

As mentioned there are two distinct aspects of data storage.

Data can come to the disk from all different directions:

Disk occupancy

Disks used to be very expensive, but costs have come down dramatically, and this has accelerated the growth of storage.
You may have too little storage, resulting in out-of-disk-space problems, but conversely you may have storage over-allocated. A lot of the time people put excessive storage space out there to ensure that they never run out, without paying attention to how much they really need and what their growth is really going to be.
Below is a typical service center queuing diagram.

In many cases the requests are being sent out by an application or applications. There is a finite limit on the requests per second that can be satisfied, and beyond that a queue begins to form. This is where queuing theory comes into play: you have limitations on the throughput of your I/O, and at some point this will have a response impact. The response impact transfers up through the application to the user and results in slow response times: a performance problem.
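The response impact can be sketched with the classic single-server (M/M/1) queuing result, R = S / (1 - U), where S is service time and U utilization. This is an idealized model (Poisson arrivals, one server), not a prediction for any particular array, but it shows why response climbs steeply as a device approaches saturation:

```python
def mm1_response_ms(service_ms, utilization):
    """M/M/1 mean response time; grows without bound as utilization -> 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)

# A device with a 5 ms service time responds in 10 ms at 50% busy,
# but about 50 ms at 90% busy.
r_half = mm1_response_ms(5.0, 0.5)
r_busy = mm1_response_ms(5.0, 0.9)
```

The non-linearity is the point: the last 10% of throughput costs far more response time than the first 50%, which is why queuing shows up before outright saturation does.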

On Friday I’ll be looking at space utilization. There's still time to sign up and come along to our webinar tomorrow, 'Proactive storage performance management'. Register now!

Dale Feiste
Principal Consultant


Monday, 9 February 2015

Key Metrics for Effective Storage Performance and Capacity Reporting (1 of 10)

This blog series will cover the key metrics in storage that you can use to get a handle on your storage capacity.

       Storage Architecture – basic concepts

       Two distinct aspects of storage capacity


       Key metrics from the host and backend storage view

       Reporting on what is most important

A good place to start is with the history of storage architecture.

Storage has increased in complexity, as shown in the diagram below, from left to right.

Large environments have gone from megabytes to petabytes in terms of storage, and this growth can result in an increase in cost and complexity.

On Wednesday I’ll look at the 2 distinct aspects of storage capacity. In the meantime feel free to browse our white papers and join our community.

Dale Feiste
Principal Consultant