Monday, 30 March 2015

VMware vSphere – avoiding an Internal Storm (1 of 10)

Traditionally, within the Distributed Computing world, single or multiple applications would be hosted on individual physical servers, each with its own operating system (typically Windows or UNIX/Linux).  Then Virtualization was reborn in the x86 environment (a note to my Mainframe friends: we know Virtualization was first born in the z/OS environment), allowing multiple "virtual systems" to be hosted on a single physical server using hypervisor software.  As virtualization software developed further, notably by VMware, currently the market leader in x86 virtualization technology, we are now able to cluster virtual systems together to create shared pools of resources across the virtual infrastructure.

Why is this important? 

Virtualization underpins Cloud Computing by presenting computing resources to users (or clients) from these shared pools of resources (Resource Pools) and by controlling their usage.  However, it is not just the ability to provide resources and control usage; Virtualization also provides two key components of Cloud Computing:

·         Autonomic Computing
·         Utility Computing
vSphere incorporates Autonomic Computing by automating the control of running applications and systems.  Using vMotion and DRS, it can automate the migration of virtual machines to alternative ESX hosts within the same cluster if a specific ESX host becomes unbalanced due to excessive resource demand on that host.

Utility Computing allows Cloud providers to provision computing resources and infrastructure to customers and charge them for their specific usage or chosen configuration at a flat rate.

In this series I’ll be looking at VMware vSphere, how it underpins Cloud Computing and how you can use it to best advantage. I’ll start by examining the definition of Cloud on Wednesday...

Jamie Baker
Principal Consultant

Wednesday, 25 March 2015

What do they really want to know? - Adding value to your reports with automatic interpretation

Probably the best way of adding value to reports is to generate automatically an interpretation of the data that is being presented. This relieves the analyst from the task of modifying the report text so that it matches the information in the charts. The final sections of my blog present the outline of an Automatic Advisor system, intended to facilitate web-based publication of complete performance reports with minimal user intervention.

Interpretation Techniques
Given a chart with its underlying data, it is practical to apply a number of analyses automatically. In most cases, the analysis can result in the automatic generation of an "exception incident", which will be e-mailed to a responsible person or team. Additionally, the performance analyst can specify that reports be generated and published only if certain exception conditions in fact occur. Depending on the circumstances, the results of an automatic analysis can be turned into automatic advice, which gives guidance on actions that should be taken to avoid a potential performance problem or to alleviate an existing problem.

The following list gives examples of types of automatic analysis:

Top N analysis. This analysis can determine the few busiest or most resource-hungry users, devices, Oracle sessions or similar. Simply identifying them is a good start, but it is better to see their pattern of activity over time.
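A minimal sketch of how a Top N selection might work, in Python; the user names and sample values are purely illustrative, and a real implementation would read them from a monitoring database:

```python
# Identify the Top N busiest users from per-interval CPU samples.
# Names and figures below are invented for illustration only.
cpu_by_user = {
    "oracle":  [55, 60, 58, 62],   # % CPU per hourly sample
    "batch01": [10, 45, 80, 70],
    "webapp":  [20, 22, 19, 21],
    "backup":  [2, 3, 90, 4],
}

def top_n(samples, n=2):
    """Rank users by mean utilisation, then return each ranked user's
    full time series so the pattern of activity over time is preserved."""
    ranked = sorted(samples,
                    key=lambda u: sum(samples[u]) / len(samples[u]),
                    reverse=True)
    return {u: samples[u] for u in ranked[:n]}

busiest = top_n(cpu_by_user)
```

Keeping the full series for each selected user, rather than a single headline number, is what makes the "pattern of activity" visible in the resulting chart.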

Mean value versus thresholds. This is a simple and straightforward check that the mean value of a measured data item is not too high or too low. Failure to stay within threshold bounds can be made to generate an exception event.
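As a sketch (with made-up threshold and utilisation values), a mean-versus-threshold check that yields an exception incident might look like this:

```python
def mean_check(values, low=None, high=None):
    """Return an 'exception incident' dict if the mean of the measured
    values falls outside the threshold bounds, else None."""
    mean = sum(values) / len(values)
    if high is not None and mean > high:
        return {"metric_mean": mean, "breach": "above", "threshold": high}
    if low is not None and mean < low:
        return {"metric_mean": mean, "breach": "below", "threshold": low}
    return None

# Illustrative: four samples whose mean (76.5%) exceeds a 70% ceiling
incident = mean_check([72, 75, 81, 78], high=70)
```

The returned dict is the kind of payload that could then be e-mailed to the responsible person or team, or used to decide whether a report is published at all.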

Proportion of time within threshold ranges. Typically the performance analyst will want to set two threshold levels for the value of certain critical data items - a lower, warning threshold and a higher, alarm threshold. It is straightforward to report automatically on the proportion of the measurements that fall into each of the three ranges - below the warning value (and therefore satisfactory), between the warning and the alarm level, and above the alarm level. This gives valuable information about the relationship between peaks and averages.
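A minimal illustration of the three-band calculation, with hypothetical CPU samples and threshold values:

```python
def band_proportions(values, warn, alarm):
    """Proportion of samples below the warning level (satisfactory),
    between warning and alarm, and above the alarm level."""
    n = len(values)
    ok    = sum(1 for v in values if v < warn) / n
    amber = sum(1 for v in values if warn <= v <= alarm) / n
    red   = sum(1 for v in values if v > alarm) / n
    return ok, amber, red

# 10 CPU % samples, warning at 70, alarm at 85 (illustrative values)
props = band_proportions([40, 55, 60, 72, 78, 66, 90, 88, 50, 45],
                         warn=70, alarm=85)
```

Here 60% of samples are satisfactory, 20% sit in the warning band and 20% exceed the alarm level; comparing those proportions against the mean is exactly the peaks-versus-averages insight described above.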

Variability around the mean value. A given set of measurements will have a mean value, and each individual measurement will typically be some amount higher or lower than the mean value. It is often useful to categorise the measured value as "fairly constant", "rather variable" or "very variable" based on the proportion of time when the measured values are close to or far away from the mean value. Again, if variability is a concern, this analysis can be made to generate an exception event.
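One way to automate this categorisation is via the coefficient of variation (standard deviation divided by mean); the band boundaries below are illustrative choices, not standard values:

```python
def variability_label(values):
    """Categorise a series as fairly constant / rather variable /
    very variable, based on its coefficient of variation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    cv = std / mean
    if cv < 0.1:
        return "fairly constant"
    if cv < 0.3:
        return "rather variable"
    return "very variable"

steady = variability_label([50, 51, 49, 50])   # hugs the mean
spiky  = variability_label([10, 90, 15, 85])   # swings wildly
```

Both series have a mean near 50, yet they describe very different behaviour, which is why variability deserves its own analysis alongside the mean.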

Trended value versus thresholds. A very useful automatic analysis is to determine the date at which the value of a particular metric is projected to exceed a certain threshold, or to reach some other predetermined boundary value (e.g. zero, 100%, etc.). An exception can be generated on several different attributes of the trend, for example the fact that it will reach a boundary value or will cross a threshold value on or before a predefined date.
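A sketch of the underlying calculation: fit a least-squares line to the daily values and solve for the point at which it reaches the threshold. The data here is invented for illustration:

```python
def days_until_threshold(values, threshold):
    """Fit a least-squares line to daily values (x = 0, 1, 2, ...) and
    return the x at which the trend reaches the threshold, or None if
    the trend is flat or falling and will never reach it."""
    n = len(values)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(values) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, values))
             / sum((x - mx) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = my - slope * mx
    return (threshold - intercept) / slope

# Daily average utilisation rising ~1% per day (illustrative data)
crossing = days_until_threshold([60, 61, 62, 63, 64], 70)
```

Comparing the returned day count against a predefined calendar date is then a one-line check, and is the natural trigger for the exception described above.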

Correlation analysis. Used carefully and with a sensible selection of metrics, correlation analysis can suggest causal as well as purely statistical relationships between data values. For example, it is easy to identify UNIX users or Windows processes whose activity has a large effect on total CPU utilisation. Similarly, the analysis can identify particular I/O devices that are associated with important warning metrics such as CPU Wait for I/O Completion.
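The statistical core of this is the Pearson correlation coefficient. The sketch below uses made-up per-interval figures to show how a process that tracks total CPU scores close to 1, while a steady background process scores low:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

total_cpu = [30, 45, 60, 75, 90]   # % total CPU per interval (invented)
process_a = [10, 20, 30, 40, 50]   # grows in step with the total
process_b = [5, 4, 6, 5, 5]        # roughly constant background load

r_a = pearson(total_cpu, process_a)
r_b = pearson(total_cpu, process_b)
```

A high coefficient flags process_a as a likely resource driver worth investigating; as the text says, the metric selection still needs care, since correlation alone does not prove causation.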

In order for an Automatic Advisor's reports to be accepted, they must be:

Trustworthy - i.e. the conclusions are recognisably correct and are based on firm evidence

Specific - i.e. the recommendations are specific enough to be acted on without the need for further detailed analysis

Understandable - many advice systems in the past have proved more difficult to understand than reading the relevant technical documentation itself.

Based on the types of interpretation outlined it is possible to offer trustworthy, specific and understandable advice about such things as:

CPU upgrades, for example if utilisation thresholds are currently being exceeded, or if trend analysis shows that they will be exceeded soon

Memory upgrades, for example if paging and swapping rates are (or will soon be) high, or if cache hit rates are low

Upgrades or tuning of the I/O subsystem, for example if particular devices are becoming hotspots, or if queuing is becoming a high proportion of I/O service time.

Each of the underlying drill-down reports will contain detailed information about the selected aspect of the selected system, including all the interpretation and advice described previously. For any item whose status is not shown as "happy" in the top-level Summary Status report, these drill-down reports will show trustworthy and specific advice for making it so.

Depending on the size of the installation and the number of systems being reported on, this Summary Status report could be produced at regular short intervals, so giving an effectively continuous summary of the installation's health.

In conclusion, producing a good report manually takes a lot of effort and there are a number of psychological factors to consider, in addition to the purely technical ones:

·         What are the needs and interests of the intended recipient?

·         How can the report be made credible and trustworthy?

A regime of automatic reports with intelligent interpretation can add significant value to the work of a system performance analyst.

The reports can be interesting, credible, trustworthy - and perhaps most important, timely.

The analyst is now free to concentrate on the serious business of maintaining and enhancing the performance that is provided to the people who really matter - the organisation's customers. For details on our Capacity Management solutions and services visit our website

Rich Fronheiser
Chief Marketing Officer

Monday, 23 March 2015

What do they really want to know?- Adding value to reports with Automatic Trending

Basic trends

Daily or weekly performance reports are useful aids to performance management. They let you and your colleagues understand what is happening over a relatively short timescale, so that immediate or recent problems can be addressed and rectified.

As mentioned previously, monthly (or less frequent) reports are most useful if they include an element of trending. This gives the significant additional benefit of identifying likely dates in the future before which action will have to be taken in order to avoid potential problems.

This shows two months' worth of data about the total CPU utilisation of a particular server, with a trend line applied.

As shown, the analysis window informs us that the trend line will reach a value of 70% on 17th June. There may be a good reason for wanting to know when the trended daily average CPU utilisation will reach 70%: it is a popular value among performance analysts, being a typical maximum beyond which you would not want to run a server processing a critical workload. Although it appears a relatively low value, remember that this trend is based on a daily average, so it allows for some normal variation during the day.

Normally, the best way of determining whether 70% (or any other particular value) is an appropriate cut-off point is to carry out analytical modelling of the system being studied. This will show the performance impact of running at that loading level.

At this point, we are assuming an environment where a chart and its associated trend can be updated automatically. The previous chart covers January and February. Suppose it was generated by an automatic reporting mechanism that produces a new year-to-date report on the first of every month. The next graphic shows the chart that was generated on April 1st, with the trend automatically re-calculated to fit the new March data as well as the previous data.

The analysis window now shows a revised projection for 70% total utilisation - it will not happen until 20th August. Clearly the trend has changed.
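The recalculation described above amounts to re-running a least-squares fit over the extended data set each time a report is generated. The sketch below is illustrative: the figures are invented and produce their own projected date, not the dates shown in the charts:

```python
from datetime import date, timedelta

def projected_date(start, daily_values, threshold):
    """Fit a least-squares trend to daily averages and return the
    calendar date on which the trend line reaches the threshold."""
    n = len(daily_values)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(daily_values) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, daily_values))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return start + timedelta(days=round((threshold - intercept) / slope))

# Illustrative: utilisation climbing 0.2% per day from 50% on 1 January
jan_feb = [50 + 0.2 * d for d in range(59)]   # 59 days: Jan + Feb
when = projected_date(date(2015, 1, 1), jan_feb, 70)
```

When March's data is appended to the list and the function is re-run, the fitted slope and therefore the projected date shift automatically, which is exactly the month-on-month revision the charts illustrate.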

Discontinuous trends

Linear trends are extremely useful; however, they have a particular shortcoming. Suppose you know that conditions are going to change in a few weeks' or months' time, in such a way as to affect the direction of the trend in a predictable fashion.

Examples of this kind of situation are when you know that a particular new project is going to be rolled out onto the server in question, or that the number of users is going to increase suddenly, or that the server is going to be upgraded.

In these circumstances, it is extremely useful to be able to apply a specific change to the nature of the trend at a future date.

For example, you may want to specify that the slope of the trend is going to increase by a known proportion of its value at that time. This kind of change can be expressed as a "What-If". In the case where an automatic trend line has had a What-If change specified, you want this change to be honoured the next time the chart and its trend are automatically updated.

This shows an example of a What-If change specified to the trend on a particular chart.

It shows the trend suddenly turning upwards at a particular date, perhaps because the implementation of a new project will cause a greater rate of increase in the server workload. As a result, the value of 70% is predicted to occur earlier than originally thought, in fact on 15th May rather than 17th June, as was the case for the trend without the specified change.

Again, if the chart is automatically updated with data for another month, the trend is recalculated, and the existing What-If is automatically applied to the revised trend. This gives a new projected date for the 70% utilisation level.
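A sketch of how a What-If slope change might be folded into the projection; the growth rates, thresholds and change date below are invented for illustration:

```python
def whatif_projection(base_value, slope, threshold, change_day, slope_factor):
    """Days until a linear trend reaches the threshold, when the slope
    is multiplied by slope_factor from change_day onwards."""
    days_at_base = (threshold - base_value) / slope
    if days_at_base <= change_day:
        return days_at_base                 # crosses before the change
    value_at_change = base_value + slope * change_day
    return change_day + (threshold - value_at_change) / (slope * slope_factor)

# Trend at 0.1%/day from 55%; a new project doubles the slope on day 60
plain   = whatif_projection(55, 0.1, 70, change_day=10**9, slope_factor=1)
stepped = whatif_projection(55, 0.1, 70, change_day=60, slope_factor=2)
```

With no change the 70% level is 150 days away; doubling the slope from day 60 pulls the crossing forward to day 105, mirroring the May-versus-June shift described above.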

Clearly, the ability to create and modify trends automatically gives significant added value to the charts, and to the reports that they will be embedded in.

In the conclusion to my blog series I’ll be looking at adding value to your reports with automatic interpretation on Wednesday.

Rich Fronheiser
Chief Marketing Officer

Friday, 20 March 2015

What do they really want to know? – Graphical presentations

Over the years, a great deal of work has been carried out to determine the kinds of graphical presentation that make data most easily understood. Here are some practical examples:

Numerical proportions

This shows three ways of displaying the fact that certain workloads are the largest contributors to the total loading on a system. The table of numbers, while true and accurate, is difficult to assimilate quickly. The horizontal bar chart shows the relative magnitudes at a glance, but does not convey the additional information that the elements add up to a particular total. The pie chart shows the relative magnitudes and also conveys the information that the elements account for "everything".

Areas and scaling

Some variables, for example different categories of CPU utilisation, can logically be summed to present a total value. If plotting two (or more) such variables over time, it is good practice to stack the individual values so that this total value is clearly displayed. In the example shown, the measured values are hourly aggregations of what is in fact a continuous variable, namely the CPU utilisation over time.

The fact that the variable is continuous is most clearly brought out by displaying the results as an area graph rather than as stacked bars. Use stacked bars when the values are snapshots made at specific times, for example the number of users logged on at particular times of the day.

If possible, show the results against a fixed vertical scale, rather than accepting whatever automatic default your graphics package determines for you.

There are two reasons for this:

  • The viewer can see at a glance how much scope there is for a potential increase in the value of whatever is being presented.

  • If the graph is going to be updated by the use of Automatic Reporting technology, fixed scaling contributes to consistency between one version of a report and the next. This makes it much easier to compare different versions of the same report that have been produced at different times.

Magnitude or Variability?

You may be using the same set of data to emphasise two (or more) different attributes of the measurement in question.

For example, you might want to display a graph of CPU utilisation for at least two different reasons:
  • To show how large (or small) the utilisation is on average
  • To show how the utilisation varies over time.

A good rule of thumb is:
  • To emphasise magnitude, use an area chart
  • To emphasise variability, use a line chart.

On Monday I’ll be taking a look at automatic trending.

Rich Fronheiser
Chief Marketing Officer

Wednesday, 18 March 2015

What do they really want to know? – Automatic reporting

A performance analyst has to carry out the following sequence of actions:

·         Write the outline of the report

·         Obtain the relevant data from whatever sources are available

·         Create graphs and tables of the data for the required period of time

·         Insert these graphs and tables into the report document

·         Dispatch the finished report to its intended recipient.

These activities can be time-consuming and tedious, especially when the only significant difference between one report and the next is the name of the server that it relates to, or the period of time that it covers.

However, notice that all those different kinds of reports have a number of common features that make them ideal candidates for automation. They are:

A regular production date. For example, a daily report will be produced at 9 am each day to display the previous day's data. A weekly report will be produced every Monday. A monthly report will be produced on the first of every month, and so on.

A consistent analysis period. The analysis period is the period of time that the graphs in a given report cover - a day, a week, a month, the year to date and so on. A common requirement for an analysis period is to go back some number of days, weeks or months from the date of report production.

A known, stable recipient list. Each report will be sent to a named individual or team, or is intended for saving in a "well-known location", for example a particular folder on a Web server.
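Those three common features might be captured in a schedule definition along these lines; the names, addresses and values are invented for illustration:

```python
# Illustrative report schedule capturing the three common features:
# a regular production date, a consistent analysis period, and a
# known, stable recipient list.
report_specs = [
    {
        "name": "daily_cpu",
        "produce": "09:00 daily",                  # production date
        "analysis_days": 1,                        # analysis period
        "recipients": ["ops-team@example.com"],    # recipient list
    },
    {
        "name": "monthly_trend",
        "produce": "1st of month",
        "analysis_days": 30,
        "recipients": ["capacity-portal"],         # a "well-known location"
    },
]
```

An automation layer only has to walk this list on each scheduled date, regenerate the graphs for the stated period, and dispatch the result, which is precisely why reports with these features automate so well.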

Reports can be distributed in a number of ways, but try to observe some common rules to engage your audience:

A common format or house style. Most people react well to having the same kind of information presented in the same way each time. This applies to:

- The sequencing of the report contents

- The appearance of the graphs and tables

- The means of transmission (via e-mail, on a portal, etc.).

If suitable automation is available, it means that in order to produce a regular report, the performance analyst only needs to carry out the following actions once:

Write an outline of the report. Ideally the outline should contain mostly "boilerplate" text that is not going to change from one issue of the report to the next, though of course the analyst may want to edit the text to match the graphs that are actually generated on any particular occasion.

Create "sample" graphs and tables, from existing data, to illustrate the report. The graphs should tell the story in the most understandable way and the next point expands upon this.

Specify a schedule of when the report is to be updated, for what period of time, and who is to receive it by what means of transmission.

Over the years, a great deal of work has been carried out to determine the kinds of graphical presentation that make data most easily understood and I’ll be looking at some practical examples of this on Friday.

Rich Fronheiser
Chief Marketing Officer

Monday, 16 March 2015

What do they really want to know? - not all presentations are face-to-face

As I previously mentioned, not all presentations are face-to-face: the act of e-mailing a report to your boss or having them visit a portal is still a "presentation", even though you are not there in person when they read it.

In many ways the requirements for clarity and appropriateness of this type of report are much higher than if you present it in person, because the report has to speak for itself in every way - it cannot get any more assistance from you, the author.

Automatic reporting is a practical technology that is appropriate for many routine presentations and further value can be added to automatic reports if the information on the graphs can be summarised and interpreted in plain English.

Even more benefit can be obtained if the interpretations are used to trigger exception events, for example to warn automatically that a critical system will become overloaded in some number of months' time unless corrective action is taken soon.

Many performance analysts are tasked with producing regular reports on the performance of one or several servers (possibly including mainframes) for which they are responsible. These reports are often, but not necessarily, in the form of Word or HTML documents or on-line screens containing annotated graphs. They fall into a number of categories:

Near-real-time reports. Examples of these kinds of reports include continually updated graphs or charts that are posted to an intranet web site, accessible through a standard browser. Each new data sample will cause a new point to be displayed on the chart. These reports are rarely or never printed out as hard copy.

Daily reports. Data is typically presented at relatively high resolution, with data samples every two to five minutes. These reports are intended to convey detailed information about recent events. This type of report, and the following ones, may be distributed as hard copy, or (more likely) as an e-mail attachment.

Weekly reports. These are presented at a lower resolution, perhaps with one data point for each aggregated hour, or possibly with just one aggregated point per day.

Monthly reports. The graphs in a monthly report will typically show one aggregated point per working day. A month's worth of data is usually the least amount that can be used for trending purposes.

Year-to-date reports. The graphs in a year-to-date report will be at the lowest level of resolution, certainly with no more than one point per day and more likely with one point per week. Depending on the particular measurement(s) being reported on, the primary objective of such a report is to display trend information.

Automatic reporting has its dangers as well as its benefits. How many reports are produced, pinned to the wall or posted on the intranet, and never looked at again?

In order to make reports relevant, interesting and useful, think about the following.

Top N reporting. Concentrate on the few busiest nodes, or devices, or users. Ensure that your automatic reporting application (if you use one) can identify the Top N instances itself, even if they are different each time, without any intervention from you.

Filtering. Ensure that reports are produced for periods of time that are important to the business. If you are in a 24*7 environment, then all times are important. If your organisation works a 5-day week and even lets you off for public holidays, then the non-working days should be filtered out of the reports.
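As a small illustration of working-day filtering in Python; holiday handling is omitted, and the dates and values are invented:

```python
from datetime import date

def working_days_only(samples):
    """Keep only Monday-Friday samples. A real implementation would
    also consult a public-holiday calendar (not shown here).
    'samples' maps a date to a measured value."""
    return {d: v for d, v in samples.items() if d.weekday() < 5}

week = {
    date(2015, 3, 13): 62,   # Friday  - a working day
    date(2015, 3, 14): 12,   # Saturday
    date(2015, 3, 15): 10,   # Sunday
    date(2015, 3, 16): 65,   # Monday  - a working day
}
filtered = working_days_only(week)
```

Dropping the quiet weekend points stops them from diluting averages and flattening trends, which is the whole point of filtering reports down to business-relevant periods.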

Correlation. What are the key resource drivers? Which particular activities have the biggest effect on the total pattern of system loading? In many cases, correlation analysis lets you predict large-scale performance changes caused by relatively small changes in the nature of the workload or in user behaviour.

Exception reporting. If you have 100 nodes in your installation, do you really want to report on all of them? Much more likely, you only want a report to be produced if some kind of exception condition is detected.

A good Automatic Reporting and Automatic Advisor regime will incorporate all these facilities.

I’ll be looking at outlines for Automatic Reporting on Wednesday, but don't forget to register for our Essential Reporting for Capacity & Performance Management webinar.

Rich Fronheiser
Chief Marketing Officer

Friday, 13 March 2015

What do they really want to know? - Guidelines for a trustworthy presentation

I recommend the following guidelines when you are constructing or delivering a technical presentation.

Start at the beginning. Before proceeding to the unknown, take time to describe what is already understood. This immediately establishes your credibility with the audience - in their eyes, you are demonstrating that you know what you are talking about.

Keep focussed. Eliminate any unnecessary details that do not have a direct impact on the information being presented. Don't run the risk of "glazed eye" syndrome, especially if you are presenting to a group. At the end of the presentation, there will probably be a discussion. Peer-group pressure makes it likely that everyone will feel the need to participate in this discussion, and if anyone lost their way during the presentation, that person's contribution is likely to be negative.

Be precise. Ensure that the conclusions can be logically derived from the supporting material. Phrases like "tend to imply" or "may have a bearing on" might be suitable in a government-sponsored report on public transport, but have no place in a technical presentation.

And what if you’re presenting contentious or unpleasant results?

The nature of a surprise is that people react unpredictably when confronted with one, and what happens next depends on whether the surprise is pleasant or unpleasant. You may find yourself in the fortunate position of being the bearer of good tidings. For example:

"Our website has taken far more hits than expected and the server is performing really well".

"Last week's memory upgrade solved all our performance problems and the users are very happy".

"IBM has halved the price of their disk drives and we're going to be able to install twice as many as we budgeted for".

But how often is the converse true?

"Our web server is overloaded and our customers are deserting us in droves".

 "The memory upgrade made no difference. Our clients are threatening to sue".

"The lead time on hardware delivery has slipped to six months and we're already nearly out of disk space".

No one wants to hear any kind of bad news for the first time during a formal presentation, especially if other people are present. If the news is bad enough, the well-known psychological phenomenon of Denial comes into play.

"Denial" involves saying to oneself "This is so bad that I don't want to believe it, so I will make up evidence to convince myself that it can't be true". The worse the pain, the more compelling the imaginary counter-evidence becomes.

If your information comes as no surprise, the "denial" reaction is unlikely to occur, and provided all the other attributes of a trustworthy presentation are in place, your report will be, however reluctantly, accepted.

Recall that a good and trustworthy presentation starts from what is known, and proceeds towards its conclusion in logical steps. So, if possible, you should ensure that the bad situation you are reporting on is already part of the common fund of accepted knowledge before you start discussing it.

There is no pre-ordained way of doing this, but you may be able to take advantage of the informal hierarchy that exists in any medium to large organisation. This informal hierarchy exists alongside, and largely independently of, the formal reporting structure.

By planting a word in the right ear, you can ensure that everyone from the managing director to the car park attendant will soon know what you want them to know.

Not all presentations are face-to-face. The act of e-mailing a report to your boss is still a "presentation", even though you are not there in person when they read it.

I’ll be looking at how to approach this next.

Rich Fronheiser
Chief Marketing Officer

Wednesday, 11 March 2015

What do they Really Want to Know? - Pressures and Priorities

Previously I was talking about the pressures that senior management may face. It would be true to say that most of the pressures on managers originate from their own superiors, who are possibly even more pressed for time and even less technically literate than they are.

In order to keep superiors happy your boss probably has to juggle a conflicting set of priorities and won’t react well to you trying to redefine them.

So don't, for example, spend a lot of time explaining how important the conclusions of your report are. What is deeply important to you may be of only marginal interest to them.

Most of us have what I will refer to as a "domain of competent understanding". For example, you may have detailed knowledge of how a particular application works, or how to configure a Windows network, or how to tune an Oracle database. Only very rare individuals will have all these skills (or more).

Your boss's domain of competent understanding is unlikely to include your own domain as a subset, which means that unless you are careful, you could overwhelm your target audience with information that they are not capable of relating to.

The defence reaction of someone who is overwhelmed by the contents of a report is usually to latch on to some relatively minor ambiguity or inconsistency in an effort to discredit the whole. Don't fall into the trap of making this possible.

Establish a bond of trust. Emotionally, your audience needs to trust what you are telling them and they will do this much more readily if you make your presentation in terms that they can understand.

They need to trust you because they may subsequently need to pass the relevant information up to a higher level, and only the very brave, or the foolhardy, will stake their reputation or credibility on anything that they don't fully understand. This is especially true if the information that you are trying to convey has significant financial implications.

Therefore, you need to organise your material in such a way that the audience is psychologically capable of trusting it.

On Friday I’ll go through some guidelines for delivering a trustworthy presentation. In the meantime, sign up and come along to our next webinar, Essential Reporting for Capacity and Performance Management.

Rich Fronheiser
Chief Marketing Officer

Monday, 9 March 2015

What do they really want to know? – Presenting technical reports to management

Many people in the IT industry need to produce reports and make presentations, sometimes to their technical colleagues and sometimes to more senior people.

In this blog series I’m going to be discussing the best ways to present technical information, such as we use for capacity management, to higher levels of management.

You’ll need to take account of the intended recipient's knowledge, interests and preconceptions, and when it’s necessary to present results that have significant consequences for the organisation, it’s good psychology to prepare the ground in advance (perhaps by "leaking" the more contentious items so that they become part of the accepted pool of knowledge).

Depending on the circumstances, there’s much to be gained by regular reporting in a known and accepted format so I’m also going to discuss the extent to which automation of report publishing is both desirable and practical and look at the automatic interpretation of computer performance data and use of dashboards.

I’ve had many years' experience of delivering technical presentations and of being on the receiving end of other people's efforts. As in all fields of endeavour, there have been both successes and failures. So my first objective is to share some of my understanding of what makes a successful technical presentation.

These days, everyone works under pressure.
We all have to face deadlines, handle difficult clients, and cover for sick colleagues. We wrestle with unfamiliar software, fix bugs to keep the critical applications running, and risk our lives travelling to and from the office every day. Sometimes, when the stars and the dice are right, we find some time to do our real jobs.

The pressures faced by senior management may be different in kind but are certainly not different in degree - indeed, they are likely to be fiercer. These pressures will affect the way in which your boss reacts to information from "down below".

In order to make a successful presentation, especially one with financial implications, you should learn as much as you can about the target audience. You should try to understand the pressures they are under, their priorities and their technical awareness.

On Wednesday I’m going to look at these in more detail. In the meantime, sign up for our next webinar, Essential Reporting for Capacity and Performance Management.

Rich Fronheiser
Chief Marketing Officer

Monday, 2 March 2015

Workload Profiles, Scorecards & Dashboards - Key Metrics for Effective Storage Performance and Capacity Reporting (10 of 10)

As mentioned previously the Read/Write metrics can help you to get a handle on your workload profiles.

Application type is important in estimating performance risk, for instance, something like Exchange is a heavy I/O user. I’ve also seen examples where virtual workstations were being installed and resulted in a large I/O hit that could have impacted other applications sharing storage.

This is an example of a scorecard, where you can have a large amount of information condensed into one easy-to-view dashboard.

The example below shows how you can set up a dashboard and bring key trending and standard reports together in one place.

Trending, forecasting, and exceptions with athene®

Storage Key Metrics – Summary

To summarize:

·         Knowledge of your storage architecture is critical; you may need to talk to a separate storage team to get this information
·         Define storage occupancy versus performance
·         Discuss and define space utilization
·         Review virtualization and clustering complexities
·         Explore key metrics and their limitations
·         Identify the key report types and areas that are most important, starting with the most critical.

I hope you've enjoyed this series, and if you'd like to listen to a live recording on this subject, join our Community at

Dale Feiste
Principal Consultant