Wednesday 7 January 2015

CPU Breakdown and Summary - Top 5 Key Capacity Management Concerns for UNIX/Linux (12 of 12)


Below is an athene® example chart displaying the kind of CPU Usage breakdown (System + User = Total) reporting you should be doing for your systems.

Some questions:
What is the breakdown? Is user CPU higher? Should it be? If it isn't, what is happening on the system to generate so much system CPU usage? A rough sketch of how this breakdown can be derived on Linux follows below.
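The chart itself comes from athene, but the underlying numbers are straightforward to sample yourself. Below is a minimal Python sketch (my own illustration, not the athene implementation) that derives the user/system/total split from /proc/stat on a Linux host; the sampling interval and the grouping of irq/softirq under "system" are my own assumptions.

```python
# Minimal sketch: derive a System + User = Total CPU breakdown from /proc/stat.
# Field order on the aggregate "cpu" line is:
# user, nice, system, idle, iowait, irq, softirq, steal, ...
import time


def read_cpu_counters():
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]        # drop the leading 'cpu' label
    return [int(v) for v in fields]


def cpu_breakdown(interval=5):
    """Sample twice and report user%, system% and total% over the interval."""
    before = read_cpu_counters()
    time.sleep(interval)
    after = read_cpu_counters()

    delta = [a - b for a, b in zip(after, before)]
    all_jiffies = sum(delta)
    user = 100.0 * (delta[0] + delta[1]) / all_jiffies               # user + nice
    system = 100.0 * (delta[2] + delta[5] + delta[6]) / all_jiffies  # system + irq + softirq
    return {"user%": round(user, 1),
            "system%": round(system, 1),
            "total%": round(user + system, 1)}


if __name__ == "__main__":
    print(cpu_breakdown())
```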
I/O Response Time
Another example report is shown below, this time showing I/O response times on UNIX/Linux disks. 

This example is from a Sun Fire 280. Because of the large number of disks, we want to filter on a Top N (5) basis to identify the key disks that could be experiencing performance problems.
The remaining disks are then aggregated, as shown in pink. This in effect produces a cumulative picture of disk performance; a rough sketch of this Top N filtering and aggregation is shown below.
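A minimal sketch of that Top N approach, assuming you already have per-disk average response times (the sample values here are illustrative placeholders, not data from the Sun Fire example):

```python
# Top N (5) filtering with the remainder aggregated into one "all other disks"
# series, mirroring the kind of report described above.

def top_n_with_remainder(response_ms, n=5):
    """Return the n worst disks by response time plus an aggregate of the rest."""
    ranked = sorted(response_ms.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    report = dict(top)
    if rest:
        # Average of the remainder; it could equally be weighted by I/O rate.
        report["all other disks"] = sum(ms for _, ms in rest) / len(rest)
    return report


sample = {"sd0": 4.1, "sd1": 18.7, "sd2": 2.3, "sd3": 9.8,
          "sd4": 25.4, "sd5": 3.0, "sd6": 12.2, "sd7": 1.9}
print(top_n_with_remainder(sample))
```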
In Summary
I'll leave you with a summary of my series.
UNIX/Linux systems - are a well-established part of any data center, hosting applications and databases alike.
UNIX and Linux virtualization - is commonplace within organizations providing the flexibility to host many virtual machines and also to underpin Cloud Computing services such as IaaS, PaaS and SaaS.
Monitor and analyze the data - but be aware of what it is telling you.  If it doesn’t add up then it doesn’t add up.  Have you got the right packages installed, e.g. sysstat? 
Linux popularity is ever increasing - it is GUI driven, with a similar model to Windows, supported on x86 virtualization, and has spread from SMBs to multinational organizations and onto supercomputers. It is also found on a range of hardware, from mobile/cell phones to aircraft entertainment systems.
UNIX/Linux systems support Big Data implementations - by using HDFS (Hadoop) software and its Map/Reduce functionality: many individual systems with local block storage hold the data, which is processed via Map and reported on via Reduce.
Understand the technology – we need to have a good understanding of the technology to be able to perform effective Capacity Management.
Identify what you need to monitor and analyze - get the business information, and predict future usage.
Webinars that I have hosted on this subject are available to download by joining our Community.
Jamie Baker
Principal Consultant


Key Capacity Metrics - Top 5 Key Capacity Management Concerns for UNIX/Linux (11 of 12)


What key capacity metrics should you be monitoring for your Big Data environment? My list includes some key CPU, Memory, File System and I/O metrics which can give you a good understanding of how well your systems are performing and whether any potential capacity issues can be identified; a rough collection sketch follows the list.

- Standard CPU metrics
  - Utilization, system/user breakdown
- Memory
  - Usage, paging, swapping
- User/Process breakdown – define workloads
- File System
  - Size
  - Number of files
  - Number of blocks
  - Ratios
  - User breakdown
- I/O
  - Response time
  - Reads/writes
  - Service times
  - Utilization
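To make the list concrete, here is a minimal Python sketch (my own illustration, not a supported collector) that reads most of these metrics straight from a Linux host via /proc and statvfs. In practice you would sample these repeatedly and work with deltas over an interval, which is exactly what sar/sysstat or athene does for you.

```python
# Rough, Linux-only metric collection sketch: CPU from /proc/stat, memory,
# paging and swapping from /proc/meminfo and /proc/vmstat, file system size
# from statvfs, and per-disk I/O counters from /proc/diskstats.
import os


def cpu_jiffies():
    """Cumulative CPU jiffies; sample twice and diff to get utilization."""
    with open("/proc/stat") as f:
        user, nice, system, idle, iowait = map(int, f.readline().split()[1:6])
    return {"user": user + nice, "system": system, "idle": idle, "iowait": iowait}


def memory_and_paging():
    meminfo, vmstat = {}, {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            meminfo[key] = int(value.split()[0])          # values are in kB
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            vmstat[key] = int(value)
    return {"mem_used_kb": meminfo["MemTotal"] - meminfo["MemFree"],
            "swap_used_kb": meminfo["SwapTotal"] - meminfo["SwapFree"],
            "pages_in": vmstat["pgpgin"], "pages_out": vmstat["pgpgout"],
            "swap_in": vmstat["pswpin"], "swap_out": vmstat["pswpout"]}


def filesystem(path="/"):
    st = os.statvfs(path)
    return {"size_gb": st.f_blocks * st.f_frsize / 1024 ** 3,
            "used_pct": 100.0 * (1 - st.f_bavail / st.f_blocks),
            "files": st.f_files - st.f_ffree}             # inodes in use


def disk_io():
    """Cumulative reads/writes and time spent on I/O per device (/proc/diskstats)."""
    io = {}
    with open("/proc/diskstats") as f:
        for line in f:
            p = line.split()
            io[p[2]] = {"reads": int(p[3]), "ms_reading": int(p[6]),
                        "writes": int(p[7]), "ms_writing": int(p[10])}
    return io


if __name__ == "__main__":
    print(cpu_jiffies(), memory_and_paging(), filesystem(), sep="\n")
```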

By capturing the user/process breakdown on your UNIX/Linux systems, we can start to define workloads and couple that with the predicted business usage to produce both baseline and predictive analytical models. 
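As a small illustration of that step, the sketch below rolls per-process CPU up into workloads by owning user; both the sample figures and the user-to-workload mapping are hypothetical.

```python
# Hypothetical grouping of per-process CPU into workloads by owning user.
# The (user, cpu_seconds) samples would come from ps/pidstat-style accounting;
# the mapping of users to workload names is an illustrative assumption.
WORKLOAD_MAP = {"oracle": "Database", "webapp": "Online", "batch": "Batch"}

samples = [("oracle", 312.0), ("webapp", 145.5), ("batch", 86.2), ("root", 12.4)]

workloads = {}
for user, cpu_seconds in samples:
    name = WORKLOAD_MAP.get(user, "Other")
    workloads[name] = workloads.get(name, 0.0) + cpu_seconds

for name, cpu in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {cpu:8.1f} CPU-seconds")
```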

Some of the following key questions can then be answered:

- What is the business usage/growth forecast for the next 3, 6 and 12 months?
- Will our existing infrastructure be able to cope?
- If not, what will be required?
- Are any storage devices likely to experience a capacity issue within the next 3, 6 or 12 months?
- Are any servers or storage devices experiencing performance issues, and what is the likely root cause?

This is not an exhaustive list, but it does provide information on the key capacity metrics you should be monitoring for your Big Data environment.

In my final blog I'll be looking at CPU breakdown and summarizing.

Jamie Baker
Principal Consultant

Monday 5 January 2015

What should we be monitoring? - Top 5 Key Capacity Management Concerns for UNIX/Linux (10 of 12)


Following on from my previous blog on Big Data: this is relatively new technology, and knowledge around performance tuning is therefore immature. Our instinct tells us to monitor the systems as a cluster, tracking how much CPU and memory is being used, with the local storage monitored both individually and as one aggregated pool. Metrics such as I/O response times and file system capacity and usage are important, to name a few.
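A rough sketch of that cluster-level view follows, under the assumption that each node already reports its own CPU, memory and local storage figures (the per-node numbers here are made up):

```python
# Aggregate per-node metrics into a single cluster view while keeping the
# individual view for spotting outliers. The figures are placeholders; in
# practice they would come from your monitoring agents on each node.
nodes = {
    "node1": {"cpu_pct": 62.0, "mem_used_gb": 48.0, "disk_used_tb": 7.2, "disk_total_tb": 10.0},
    "node2": {"cpu_pct": 71.5, "mem_used_gb": 52.5, "disk_used_tb": 6.8, "disk_total_tb": 10.0},
    "node3": {"cpu_pct": 38.0, "mem_used_gb": 30.0, "disk_used_tb": 8.9, "disk_total_tb": 10.0},
}

cluster = {
    "avg_cpu_pct": sum(n["cpu_pct"] for n in nodes.values()) / len(nodes),
    "mem_used_gb": sum(n["mem_used_gb"] for n in nodes.values()),
    "storage_used_tb": sum(n["disk_used_tb"] for n in nodes.values()),
    "storage_total_tb": sum(n["disk_total_tb"] for n in nodes.values()),
}
cluster["storage_used_pct"] = 100 * cluster["storage_used_tb"] / cluster["storage_total_tb"]

# Individual view: flag nodes whose local storage is running well ahead of the cluster.
for name, n in sorted(nodes.items()):
    used_pct = 100 * n["disk_used_tb"] / n["disk_total_tb"]
    flag = "  <-- check" if used_pct > cluster["storage_used_pct"] + 10 else ""
    print(f"{name}: local disk {used_pct:.0f}% used{flag}")

print("Cluster:", cluster)
```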

What are the challenges?

Big Data Capacity Challenges

So with Big Data technology being relatively new with limited knowledge, our challenges are:

- Working with the business to predict usage - so we can produce accurate representations of future system and storage usage. This is normally quite a challenge even for more established systems and applications, so we have to bear in mind that getting this information and validating it will not be easy.
- New technology - limited knowledge around performance tuning.
- Very dynamic environment - which makes it a challenge to configure, monitor, and track any service changes so that we can provide effective Capacity Management for Big Data.
- Multiple tuning options - that can greatly affect the utilization/performance of systems.

What key capacity metrics should you be monitoring for your Big Data environment?
Find out in my next blog and don't forget to take a look at our range of online workshops taking place this year.
Jamie Baker
Principal Consultant

Friday 2 January 2015

Big Data Concerns - Top 5 Key Capacity Management Concerns for UNIX/Linux (9 of 12)


So what is Big Data? 
Big Data - Data sets whose size has grown beyond the management capabilities of traditional software.

Vast amounts of information - are now stored, particularly by social media applications such as Facebook and Twitter. A Big Data solution therefore needs to support petabytes, and increasingly exabytes, of data.
Hadoop (HDFS) - One such solution is Apache’s Hadoop offering. Hadoop is an open source software library project administered by the Apache Software Foundation. Apache defines Hadoop as “a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.” Using HDFS, data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster by auto-replication. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing.

Map/Reduce - A general term that refers to the process of breaking up a problem into pieces that are then distributed across multiple computers on the same network or cluster, or across a grid of disparate and possibly geographically separated systems (map), and then collecting all the results and combining them into a report (reduce). Google’s branded framework to perform this function is called MapReduce.
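As a toy illustration of those two phases (plain Python over in-memory chunks, not Hadoop itself, which runs the map function on the nodes holding each block and then shuffles the partial results to the reducers):

```python
# Toy map/reduce: count words across several "chunks" of data. The chunks are
# in-memory strings here purely for illustration; in HDFS they would be blocks
# distributed across the cluster.
from collections import Counter
from functools import reduce

chunks = [
    "big data needs big storage",
    "map the data then reduce the results",
    "data data everywhere",
]

# Map: each chunk independently produces its own partial word counts.
partial_counts = [Counter(chunk.split()) for chunk in chunks]

# Reduce: combine the partial results into a single report.
totals = reduce(lambda a, b: a + b, partial_counts, Counter())

print(totals.most_common(5))
```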
Not recommended for SAN/NAS - Because the data is distributed across multiple computers, whether they are on the same network or geographically separated, it is recommended to use local system block storage rather than SAN or NAS storage.

So what should we be monitoring? I'll deal with this next... Happy New Year to you all.

Jamie Baker
Principal Consultant