Monday 24 January 2011

Too many servers not enough eyes - where did all these servers come from?! (5 of 9)

Linear trending, when applied to metrics where trending makes sense and combined with effective filtering, reporting, and alerting, can reduce the number of servers and applications that require detailed analysis and modeling to a more manageable size.
Most problems with linear trending stem from how the analyst uses it and from how the data is chosen and put into a trend set. It is worth talking a bit about trending and about setting up an effective trend set.
Choosing poor metrics for trending can doom the effort of setting up trend reports and trend alerts.
For example, trending the amount of free file space on a physical or logical disk can be useful in many cases (such as on a volume that stores customer records in a database). However, if large chunks of data are moved to and from the disk at irregular intervals, the trended value could be fairly meaningless: the file system could be almost full, yet because the data moves so irregularly, the trend might still suggest that usage is falling.
At a company I used to work for, the capacity planning group had a file system dedicated to storing performance data. The data collection technology we used was supposed to clean up the file system after the data had been transferred back to a central data repository.
When the routine didn’t clean up the data adequately, the file system would fill and an analyst would have to manually clean up the data from the remote systems.
There was no way to build a useful trend set for such a process, or to alert on its trends, because the file system would fill within a week and, once cleaned up, would be almost entirely empty.
Many metrics like this, which some analysts use in trend reporting and analysis, are better served by real-time or performance alerts with meaningful thresholds that warn administrators and analysts of potentially poor conditions.
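To make the contrast concrete, here is a minimal sketch in Python. The daily percent-used figures and the 90% threshold are invented for illustration and are not data from the company described above. It fits a least-squares line to a sawtooth series and compares that with a plain threshold check.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept, r_squared).
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared near 1 means the line explains the data; near 0 means it does not.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Hypothetical daily percent-used values: the file system fills over
# about a week, is cleaned up by hand, and then starts filling again.
percent_used = [10, 25, 40, 55, 70, 85, 97,
                5, 20, 35, 50, 65, 80, 95,
                8, 22, 38, 52, 68, 82, 96]

slope, intercept, r_squared = linear_fit(percent_used)

# Assumed alert threshold; a real value depends on the system.
ALERT_THRESHOLD = 90
alert_days = [day for day, used in enumerate(percent_used)
              if used >= ALERT_THRESHOLD]

print(f"trend: {slope:.2f} points/day, r_squared = {r_squared:.2f}")
print(f"days at or above {ALERT_THRESHOLD}%: {alert_days}")
```

With these made-up numbers the fitted line explains very little of the variation, so a report or alert built on it would say almost nothing, while the threshold check flags each day the file system is nearly full.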
At that same company, a decade ago, trends were built on CPU Utilization for every system on the floor. Many of these systems had applications that ran batch jobs during the night. Yet the analysts still put together graphs and charts indicating when a trended CPU Utilization number exceeded a certain threshold. On systems running large amounts of batch (or backup) work, this kind of trending is meaningless at best and misleading at worst. During the batch window, management and analysts are typically more interested in throughput and in how long the batch window lasts.
Rather than trending CPU Utilization, trending the length of the batch (or backup) window would be a more useful measure.
Since performance concerns dictate that batch (or backup) jobs run during the night must complete before interactive, transaction-processing work begins in the morning, knowing when those batch jobs will start to interfere with other users is quite useful, and it is an appropriate use of trending.
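As a rough sketch of what trending the batch window might look like, the Python below fits a straight line to weekly average batch durations and estimates how many weeks remain before the trend reaches the time available. The sample durations and the 480-minute limit (an assumed window from 22:00 to 06:00) are hypothetical, as is the helper name `weeks_until_limit`.

```python
import math


def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_limit(durations, limit):
    """Weeks after the last sample until the fitted line reaches `limit`.

    Returns None when the trend is flat or falling, and 0 when the
    fitted line is already at or beyond the limit.
    """
    slope, intercept = linear_fit(durations)
    if slope <= 0:
        return None
    crossing_week = (limit - intercept) / slope
    remaining = crossing_week - (len(durations) - 1)
    return max(0, math.ceil(remaining))


# Hypothetical weekly average batch durations, in minutes.
batch_minutes = [300, 312, 318, 333, 338, 352]

# Assumed window: batch starts at 22:00 and interactive work begins
# at 06:00, which leaves 480 minutes.
WINDOW_LIMIT_MINUTES = 480

weeks_left = weeks_until_limit(batch_minutes, WINDOW_LIMIT_MINUTES)
print(f"weeks until the batch window overruns: {weeks_left}")
```

A straight line is only an approximation; if the batch workload grows in steps or seasonally, the estimate will be off, so it is best read as an early warning rather than a deadline.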
To download the full version of this blog visit http://www.metron-athene.com/_downloads/_documents/papers/too_many_servers.pdf

Rich Fronheiser
Consultant
