Monday 25 April 2016

CPU Breakdown and Summary - Top 5 Key Capacity Management Concerns for UNIX/Linux (12 of 12)

Below is an athene® example chart displaying the kind of CPU Usage breakdown (System + User = Total) reporting you should be doing for your systems.

Some questions:
What is the breakdown?  Is user CPU higher?  Should it be?  If it isn't, what is happening on the system to generate so much system CPU usage?
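If you want to reproduce this kind of breakdown without an agent, here is a minimal sketch (Linux only, assuming the standard /proc/stat field layout) that derives user, system and total CPU percentages from two samples:

```python
#!/usr/bin/env python3
"""Minimal sketch: user/system/total CPU breakdown from /proc/stat (Linux only)."""
import time

def cpu_sample():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    user, nice, system = fields[0], fields[1], fields[2]
    return user + nice, system, sum(fields)

u1, s1, t1 = cpu_sample()
time.sleep(5)                                  # sample interval in seconds
u2, s2, t2 = cpu_sample()

elapsed = (t2 - t1) or 1                       # total jiffies across all CPU states
user_pct = 100.0 * (u2 - u1) / elapsed
sys_pct = 100.0 * (s2 - s1) / elapsed
print(f"User {user_pct:.1f}% + System {sys_pct:.1f}% = Total {user_pct + sys_pct:.1f}%")
```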
I/O Response Time
Another example report is shown below, this time showing I/O response times on UNIX/Linux disks.
This example is from a Sun Fire 280, but because of the large number of disks we will want to filter on a Top N (5) basis to identify the key disks which could be experiencing performance problems.
The remainder of the disks are then aggregated, as shown in pink. This in effect produces a cumulative picture of disk performance.
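To illustrate the Top N filtering, here is a small sketch that ranks disks by average response time, keeps the top five, and rolls the rest up into a single "Other" series, mirroring the pink band in the report. The disk names and millisecond values are placeholders, not real measurements:

```python
"""Sketch: Top N disks by response time, with the remainder aggregated as 'Other'."""

# Average response time per disk in ms (placeholder values for illustration)
response_ms = {
    "sd0": 4.2, "sd1": 18.7, "sd2": 2.1, "sd3": 9.5, "sd4": 31.0,
    "sd5": 1.8, "sd6": 12.3, "sd7": 0.9, "sd8": 6.4, "sd9": 3.3,
}

TOP_N = 5
ranked = sorted(response_ms.items(), key=lambda kv: kv[1], reverse=True)
top, rest = ranked[:TOP_N], ranked[TOP_N:]

for disk, ms in top:
    print(f"{disk:<6}{ms:6.1f} ms")
if rest:
    # Remaining disks rolled up into one cumulative series
    avg_rest = sum(ms for _, ms in rest) / len(rest)
    print(f"Other {avg_rest:6.1f} ms (average of {len(rest)} disks)")
```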
In Summary
I'll leave you with a summary of my series.
UNIX/Linux systems - are a well-established part of any data center, hosting applications and databases alike.
UNIX and Linux virtualization - is commonplace within organizations, providing the flexibility to host many virtual machines and to underpin Cloud Computing services such as IaaS, PaaS and SaaS.
Monitor and analyze the data - but be aware of what it is telling you.  If it doesn’t add up then it doesn’t add up.  Have you got the right packages installed, e.g. sysstat? 
Linux popularity is ever increasing - GUI driven, with a similar model to Windows and supported on x86 virtualization, it has spread from SMBs to multi-national organizations and is installed on supercomputers. It is also found on a range of hardware from mobile/cell phones to aircraft entertainment systems.
UNIX/Linux systems support Big Data implementations - by using HDFS (Hadoop) software and its Map/Reduce functionality: many individual systems with local block storage hold the data, which is processed via Map and summarized via Reduce.
Understand the technology – we need to have a good understanding of the technology to be able to perform effective Capacity Management.
Identify what you need to monitor and analyze - get the business information, and predict future usage.
For more on Unix and Linux Capacity Management, join our webinar this Wednesday 27th April.
http://www.metron-athene.com/services/webinars/index.html
Jamie Baker
Principal Consultant

Friday 22 April 2016

Key Capacity Metrics - Top 5 Key Capacity Management Concerns for UNIX/Linux (11 of 12)

What key capacity metrics should you be monitoring for your Big Data environment? My list includes some key CPU, Memory, File System and I/O metrics which can give you a good understanding of how well your systems are performing and whether any potential capacity issues can be identified (a minimal collection sketch follows the list).

• Standard CPU metrics
  o Utilization, system/user breakdown
• Memory
  o Usage, paging, swapping
• User/Process breakdown – define workloads
• File System
  o Size
  o Number of files
  o Number of blocks
  o Ratios
  o User breakdown
• I/O
  o Response time
  o Reads/writes
  o Service times
  o Utilization
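As a starting point, here is a minimal sketch of a one-shot sample of most of these metrics using the third-party psutil package (pip install psutil); an agent-based tool would capture the same data continuously and store it for trending:

```python
"""Sketch: one-shot sample of the capacity metrics listed above, using psutil."""
import psutil

cpu = psutil.cpu_times_percent(interval=1)       # CPU utilization, system/user breakdown
mem = psutil.virtual_memory()                    # memory usage
swap = psutil.swap_memory()                      # swapping
fs = psutil.disk_usage("/")                      # file system size and usage
io = psutil.disk_io_counters(perdisk=True)       # reads/writes per disk

print(f"CPU: user {cpu.user:.1f}%  system {cpu.system:.1f}%  idle {cpu.idle:.1f}%")
print(f"Memory: {mem.percent:.1f}% of {mem.total >> 20} MiB used, swap used {swap.used >> 20} MiB")
print(f"Filesystem /: {fs.percent:.1f}% of {fs.total >> 30} GiB used")
for disk, c in io.items():
    print(f"{disk}: {c.read_count} reads, {c.write_count} writes")
```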

By capturing the user/process breakdown on your UNIX/Linux systems, we can start to define workloads and couple that with the predicted business usage to produce both baseline and predictive analytical models. 
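As a rough illustration of that first step, the sketch below groups the accumulated CPU time of running processes by user name (again using psutil); in practice you would map users and process names onto named business workloads:

```python
"""Sketch: rough workload breakdown by summing process CPU time per user."""
from collections import defaultdict
import psutil

workload_cpu = defaultdict(float)
for proc in psutil.process_iter(["username", "cpu_times"]):
    times = proc.info["cpu_times"]
    if times is None:                        # attribute unavailable (e.g. access denied)
        continue
    workload_cpu[proc.info["username"] or "unknown"] += times.user + times.system

for user, seconds in sorted(workload_cpu.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user:<15}{seconds:10.1f} CPU seconds")
```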

Some of the following key questions can then be answered:

• What is the business usage/growth forecast for the next 3, 6 and 12 months?
• Will our existing infrastructure be able to cope?
• If not, what will be required?
• Are any storage devices likely to experience a capacity issue within the next 3, 6 or 12 months?
• Are any servers or storage devices experiencing performance issues, and what is the likely root cause?
This is not an exhaustive list, but it does provide information on the key capacity metrics you should be monitoring for your Big Data environment. 

In my final blog on Wednesday I'll be looking at CPU breakdown and summarizing. In the meantime sign up to our Community for access to some of our great Capacity Management resources such as white papers and on-demand webinars http://www.metron-athene.com/_resources/published-papers/login.asp

Jamie Baker
Principal Consultant

Wednesday 20 April 2016

What should we be monitoring? - Top 5 Key Capacity Management Concerns for UNIX/Linux (10 of 12)

Following on from my previous blog on Big Data, this is relatively new technology and knowledge around performance tuning is therefore immature. Our instinct tells us to monitor the systems as a cluster, looking at how much CPU and memory is being used, with the local storage monitored both individually and as one aggregated pool of storage. Metrics such as I/O response times and file system capacity and usage are important, to name a few.
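To make that cluster-level view concrete, here is a small sketch that rolls per-node samples up into one aggregated picture. The node names and figures are placeholders; in practice the per-node values would come from your monitoring agents or the Hadoop metrics interfaces:

```python
"""Sketch: roll per-node samples up to a cluster view (placeholder numbers)."""

nodes = {
    "datanode01": {"cpu_pct": 62.0, "mem_used_gb": 48.0, "disk_used_tb": 7.2, "disk_total_tb": 10.0},
    "datanode02": {"cpu_pct": 71.0, "mem_used_gb": 52.0, "disk_used_tb": 8.1, "disk_total_tb": 10.0},
    "datanode03": {"cpu_pct": 35.0, "mem_used_gb": 30.0, "disk_used_tb": 5.9, "disk_total_tb": 10.0},
}

# Per-node view: each system monitored individually
for name, m in nodes.items():
    print(f"{name}: CPU {m['cpu_pct']:.0f}%  disk {100 * m['disk_used_tb'] / m['disk_total_tb']:.0f}% full")

# Cluster view: local storage treated as one aggregated pool
used = sum(m["disk_used_tb"] for m in nodes.values())
total = sum(m["disk_total_tb"] for m in nodes.values())
avg_cpu = sum(m["cpu_pct"] for m in nodes.values()) / len(nodes)
print(f"Cluster: avg CPU {avg_cpu:.0f}%, storage {used:.1f}/{total:.1f} TB ({100 * used / total:.0f}%)")
```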
What are the challenges?

Big Data Capacity Challenges

So with Big Data technology being relatively new and knowledge of it limited, our challenges are:

• Working with the business to predict usage - so we can produce accurate representations of future system and storage usage. This is normally quite a challenge even for more established systems and applications, so we have to bear in mind that getting this information and validating it will not be easy.
• New technology - limited knowledge around performance tuning.
• A very dynamic environment - which presents the challenge of configuring, monitoring and tracking any service changes in order to provide effective Capacity Management for Big Data.
• Multiple tuning options - that can greatly affect the utilization and performance of systems.
What key capacity metrics should you be monitoring for your Big Data environment?
Find out in my next blog and ask us about our Unix Capacity & Performance Essentials Workshop.
http://www.metron-athene.com/services/online-workshops/index.html
Jamie Baker
Principal Consultant

Monday 18 April 2016

Big Data Concerns - Top 5 Key Capacity Management Concerns for UNIX/Linux (9 of 12)

So what is Big Data? 


Big Data - Data sets whose size grows beyond the management capabilities of the traditional software used in the past.
Vast amounts of information - are now stored, particularly by social media applications such as Facebook and Twitter. The Big Data solution therefore needs to support petabytes and even exabytes of data.
Hadoop (HDFS) - One such solution is Apache’s Hadoop offering.  Hadoop is an open source software library project administered by the Apache Software Foundation.  Apache defines Hadoop as “a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.”  Using HDFS, data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster by auto-replication. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing.
Map/Reduce - A general term that refers to the process of breaking up a problem into pieces that are then distributed across multiple computers on the same network or cluster, or across a grid of disparate and possibly geographically separated systems (map), and then collecting all the results and combining them into a report (reduce). Google’s branded framework to perform this function is called MapReduce (a toy sketch follows this list).
Not recommended for SAN/NAS - Because the data is distributed across multiple computers, whether they are on the same network or geographically separated, local system block storage is recommended rather than SAN or NAS storage.
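To make the map and reduce steps concrete, here is a toy word-count sketch; a real Hadoop job would express the same two functions through the Hadoop interfaces (for example Hadoop Streaming) and run the map step on each block where it is stored:

```python
"""Toy map/reduce word count: map emits (word, 1) pairs, reduce sums them per word."""
from collections import defaultdict

def map_phase(block):
    # One block of the data set -> list of (key, value) pairs
    return [(word.lower(), 1) for word in block.split()]

def reduce_phase(pairs):
    # Combine all pairs for the same key into one result
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

blocks = ["big data needs big clusters", "clusters hold data in blocks"]   # stand-in for HDFS blocks
mapped = [pair for block in blocks for pair in map_phase(block)]           # run per block, in parallel on a real cluster
print(reduce_phase(mapped))   # e.g. {'big': 2, 'data': 2, 'clusters': 2, ...}
```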

So what should we be monitoring? I'll deal with this on Wednesday.
Don't miss our 'Unix & Linux Capacity Management' webinar either http://www.metron-athene.com/services/webinars/index.html

Jamie Baker
Principal Consultant

Friday 15 April 2016

Cloud Concerns - Top 5 Key Capacity Management Concerns for UNIX/Linux(8 of 12)

To understand our Capacity and Performance concerns around cloud, we need to fully understand what the cloud is and its relation to UNIX/Linux systems. As mentioned previously, cloud is underpinned by virtualization, due to its ability to provide virtualized systems rapidly, delivering services such as Capacity on Demand and then charging accordingly.
Cloud (Internal, External, Hybrid)

• Rapid provisioning
• Metering (chargeback), Capacity on Demand
• Resource pooling and elasticity

Hybrid popularity is growing.

At present less security-sensitive tasks are being handled in external clouds - The initial use of Hybrid clouds (a mixture of Public and Private) was traditionally to run tasks that were not classed as data sensitive.  The use of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) in a Public cloud was very much favored by Test and Development departments to get quick and easy access to resources.

Increasing confidence in security - As security has improved and individuals' and businesses' confidence in Public clouds has increased, the popularity of the Hybrid cloud is growing, providing an extension to business systems as and when required.

How easy is access to data? - How can we apply effective Capacity Management to a Hybrid cloud?  We can and should have access to data from within an Internal cloud, and should also know what applications are running on what systems, sharing what resources.  Getting the same information from an External cloud provider, however, is not easy, if you can get it at all.

Would we want to and do we care? - Does the cloud provider give guarantees for availability, capacity and performance? 

SLAs and appropriate penalties if breaches occur - Do you have, or need to have, any Service Level Agreements in place?  If so, what penalties are in place in the event of a breach or a number of breaches?
Hopefully that gives you food for thought.
I'll be looking at another hot topic next time, Big Data - what it is and what the concerns are. We've got some great webinars available to download on these and other capacity management topics, so sign up to access them for free
http://www.metron-athene.com/_resources/index.html
Jamie Baker
Principal Consultant

Wednesday 13 April 2016

Linux differences - Top 5 Key Capacity Management Concerns for UNIX/Linux (7 of 12)

Following on from the development and introduction of UNIX, and the creation of the GNU General Public License (GNU GPL) in order to spread software freely, many of the programs required in an OS (such as libraries, compilers, text editors, a UNIX shell, and a windowing system) were completed by the early 1990s, but a few elements such as device drivers, daemons, and the kernel remained incomplete.

In 1991, Linus Torvalds began work on a new Unix-like kernel, inspired by MINIX, an OS originally developed for academic use. The first Linux kernel was released on 17 September 1991 for Intel x86 PC systems and was later distributed under the GNU GPL. Combined with various system utilities and libraries from the GNU project, it formed a usable operating system, and all of the underlying source code can be freely modified and used.
Linux is great for small- to medium-sized operations, and today it is also used in large enterprises where UNIX was previously considered the only option. A few years ago, most big enterprises, where networking and multi-user computing are the main concerns, didn't consider Linux an option. But today, with major software vendors porting their applications to Linux, and with the OS freely distributed and installed on pretty much anything from mobile/cell phones to supercomputers, it has entered the mainstream as a viable option for web serving and office applications.
The main commercial virtualization offerings, VMware’s vSphere and Microsoft’s Hyper-V, both support Linux as a guest operating system, and along with Windows it is the most popular virtualized operating system within organizations.
Its popularity has grown to a user base estimated at more than 25 million, because of its application in embedded technologies and because it is free and easily available.
I'll be covering Cloud Concerns next time; to understand our Capacity and Performance concerns around cloud, we need to fully understand what the cloud is and its relationship to UNIX/Linux systems.

Jamie Baker
Principal Consultant

Monday 11 April 2016

Data -Top 5 Key Capacity Management Concerns for UNIX/Linux (6 of 12)

To produce and analyze the recommended performance reports we need to capture and store performance data.  This process is normally performed by installing an agent responsible for running UNIX/Linux system tools such as sar, vmstat and iostat to capture information, or for running (potentially intrusive) kernel commands.

As with any data capture, whether it is local or remote, you’d expect to incur some overhead.  Typically agents should incur no more than 1% CPU usage when capturing data; however, as mentioned, some agents may incur more.

In addition, when capturing data, can you rely on what it is reporting?  Remember, this is software and software can contain bugs.  But you say “we have to rely on what the operating system gives us”, and this is true to some extent.  From my experience there are several tools that provide this information within the UNIX operating system – some are accurate and some are not.
For example: does your Linux system have the sysstat package installed, and is it an accurate and reliable version?  Or, in Solaris Containers, are the Resident Set Size (RSS) values being incorrectly reported due to double counting of memory pages?  A quick scripted check for the sysstat question follows, and an example of the Solaris zone issue is shown below.
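One sanity check you can script is to confirm the monitoring tools are actually present and which version they come from. The sketch below assumes a Linux system where sar, iostat and mpstat are provided by sysstat; sar -V reports the sysstat version, although the exact output varies by distribution:

```python
"""Sketch: confirm the monitoring tools exist before trusting the data they produce."""
import shutil
import subprocess

for tool in ("sar", "iostat", "mpstat", "vmstat"):
    path = shutil.which(tool)
    print(f"{tool:<8}{'found at ' + path if path else 'NOT INSTALLED'}")

if shutil.which("sar"):
    # sysstat's sar prints its version with -V (to stdout or stderr depending on version)
    result = subprocess.run(["sar", "-V"], capture_output=True, text=True)
    print((result.stdout or result.stderr).strip())
```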

Zone Memory Reporting




This report is an interesting one.  It shows the amount of RSS memory per zone against the total memory capacity of the underlying server (red).
Because RSS values are double counted, the sum of RSS across the zones far exceeds the actual physical memory capacity.
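The check behind that statement is simple enough to script: sum the per-zone RSS figures (as reported by, for example, prstat -Z) and compare the total with the physical memory of the server. The values below are placeholders:

```python
"""Sketch: flag RSS double counting by comparing summed per-zone RSS with physical memory."""

physical_gb = 64.0                 # total memory capacity of the underlying server (the red line)
zone_rss_gb = {                    # per-zone RSS figures, placeholder values
    "zone-web": 22.0, "zone-db": 35.0, "zone-app": 18.0, "global": 9.0,
}

total_rss = sum(zone_rss_gb.values())
print(f"Sum of zone RSS: {total_rss:.1f} GB against {physical_gb:.1f} GB physical")
if total_rss > physical_gb:
    # Pages shared between zones are counted once per zone, so the sum overstates real usage
    print("RSS sum exceeds physical memory: values are double counted and need adjusting")
```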

I'll be looking at Linux differences next, in the meantime don't forget to register for our 'Unix & Linux Capacity Management' webinar http://www.metron-athene.com/services/webinars/index.html

Jamie Baker
Principal Consultant

Friday 8 April 2016

Performance concerns - Top 5 Key Capacity Management Concerns for Unix/Linux (5 of 12)

Just think about this for a moment – it’s just you and your workstation.  No contention.  Introduce a server which hosts one or many applications and can be accessed by one or many users simultaneously and you are likely to get contention.  Where contention exists you get queuing and inevitably response times will suffer.
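To put a rough number on that, a simple single-server queuing approximation (response time = service time / (1 - utilization)) shows how quickly response times stretch as contention grows; the 10 ms service time is purely illustrative:

```python
"""Sketch: how queuing stretches response time as utilization rises (single-server approximation)."""

service_time_ms = 10.0   # time to serve one request with no contention (illustrative)

for utilization in (0.10, 0.50, 0.70, 0.90, 0.95):
    response_ms = service_time_ms / (1.0 - utilization)   # R = S / (1 - U)
    print(f"Utilization {utilization:4.0%}: response time {response_ms:6.1f} ms")
```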

With UNIX/Linux virtualization we have two main concerns: 

1) Will virtual systems hosted on shared hardware (via a hypervisor) impact one another?

2) What additional overhead and performance impact does requesting hardware resources through a hypervisor incur?

To answer question 1, we have to understand what resources each virtual machine is requesting at the same time.  So it's prudent to produce performance reports for each hosted VM. 

Do they conflict with each other or complement each other? There are virtualization tools that have been developed to keep VMs apart if they conflict and together if they complement to improve performance.

To answer question 2, initial software virtualization techniques such as Binary Translation allowed existing x86 operating systems to be virtualized.  Hardware call requests had to go via the hypervisor, adding an additional overhead of somewhere between 10% and 20% on average.  As time progressed, a combination of hardware-assisted virtualization and paravirtualized operating systems has reduced this overhead and improved virtualized application response times.
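Using the same queuing approximation as above, here is a sketch of how that 10-20% overhead feeds through to utilization and response time; the 60% native utilization and 10 ms service time are illustrative assumptions, not measurements:

```python
"""Sketch: effect of a hypervisor overhead on utilization and queued response time (illustrative)."""

service_time_ms = 10.0
native_utilization = 0.60                     # CPU busy before virtualization

for overhead in (0.0, 0.10, 0.20):            # roughly the binary-translation era range quoted above
    utilization = min(native_utilization * (1.0 + overhead), 0.99)
    response_ms = service_time_ms * (1.0 + overhead) / (1.0 - utilization)
    print(f"Overhead {overhead:4.0%}: utilization {utilization:4.0%}, response ~{response_ms:5.1f} ms")
```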

Paravirtualization is built into all recent Linux kernels. For those that don't know what paravirtualization is, it's a virtualization technique that presents a software interface to VMs that is similar, but not identical, to that of the underlying hardware. The intention is to reduce the time the guest spends performing operations which are substantially more difficult to run in a virtual environment than in a non-virtualized environment. It provides specially defined 'hooks' that allow the VMs and the underlying host to request and acknowledge these tasks, which would otherwise be executed in the virtual domain (where execution performance is worse). A successful paravirtualized platform may allow the virtual machine monitor (VMM) to be simpler (by relocating execution of critical tasks from the virtual domain to the host domain), and/or reduce the overall performance degradation of machine execution inside the virtual guest.
To produce and analyze the recommended performance reports we need to capture and store performance data and I'll be discussing this on Monday.
Sign up to our Community where you'll gain access to some great white papers and downloads of our webinars
http://www.metron-athene.com/_resources/index.html
Jamie Baker
Principal Consultant

Wednesday 6 April 2016

Top 5 Key Capacity Management Concerns Unix/Linux - VM Principles (4 of 12)

Continuing on today with a look at virtual machine principles. The following principles of virtual machines hold true across all virtualized environments provided within the IT industry; the examples below come from Solaris Zones.

• Isolation
  o Own root (/), processes, files, security
• Virtualization
  o Instance of the Solaris OS
• Granularity
  o Resource allocation, pools
• Transparency
  o Standard Solaris interfaces
• Security
  o No global reboots, isolated within each zone

On Friday I'll look at performance concerns. 
Don't forget to register for our 'Unix and Linux Capacity Management' webinar http://www.metron-athene.com/services/webinars/index.html
Jamie Baker
Principal Consultant


Monday 4 April 2016

Top 5 Key Capacity Management Concerns for Unix/Linux - Use of Internal/ External/ Hybrid clouds (3 of 12)

Cloud technology, whether the resources are internally or externally provided, is underpinned by virtualization.  Virtualization provides the following three minimum characteristics for cloud services:

• On-demand self-service
• Resource pooling
• Rapid elasticity

Virtualization

Why virtualize? It’s an important question and one that shouldn’t be taken lightly.  There are many benefits to virtualization, as I have already mentioned, but it shouldn’t be applied with a broad-brush approach.  In my experience it is better to perform some kind of pre-virtualization performance analysis before actually committing to virtualizing an application, even though the vendor tells you it is OK! I am a big advocate of virtualization, as it provides great flexibility, reduces hardware and associated costs, and allows you to manage your estate from a single pane of glass.

Specifically looking from a Capacity and Performance viewpoint, virtualizing applications can in some instances prove to be detrimental to performance. Even though virtualization allows for multiple virtual systems to be hosted, those hosts are still physical servers which have finite capacity, and I’ll cover this later on.

• Commercial vendors
  o IBM, HP, Oracle, VMware, Microsoft
• Open source (Linux)
  o KVM
  o Xen

Underpins cloud technology - As mentioned previously, virtualization underpins Cloud technology.  Cloud services can be provided through on-demand resourcing, flexibility and quick provisioning.  There are many more cloud services now available than the traditional Infrastructure, Platform and Software (IaaS, PaaS and SaaS).
Oracle Enterprise Linux - Oracle’s offerings in the UNIX/Linux space are Oracle Enterprise Linux, which is a clone of Red Hat Enterprise Linux, and Solaris, following their acquisition of Sun.  Both operating systems provide virtualization offerings; however, Linux coverage is growing quite substantially with its support for tablets and phones and its database coverage.
The topic of cost differences for Linux over Solaris or AIX is a whole paper in itself.  Both RHEL and Solaris are supported on x86 architecture; however, entry-level x86 hardware is cheaper than UltraSPARC hardware, unless you have the capacity requirements for the UltraSPARC entry offerings.
On Wednesday I'll be looking at virtual machine principles. In the meantime don't forget to sign up for our 'Unix & Linux Capacity Management' webinar http://www.metron-athene.com/services/webinars/index.html
Jamie Baker
Principal Consultant