Carrying on from my previous blog, it is evident that benchmarking the cloud is not an easy task.
After all, clouds evolve. They don't tell you when changes go in, so historical data is not reliable. “commercial clouds such as Amazon EC2 add frequently new functionality to their systems, thus, the benchmarking results obtained at any given time may be unrepresentative for the future behaviour of the system.” Alexandru Iosup, Radu Prodan, and Dick Epema
So why don’t we continually benchmark the cloud? Because it’s complex and expensive (Challenge 1: how to do it cheaply).
“A straightforward approach to benchmark both short-term dynamics and long-term evolution is to measure the system under test periodically, with judiciously chosen frequencies [26]. However, this approach increases the pressure of the so-far unresolved Challenge 1.” Alexandru Iosup, Radu Prodan, and Dick Epema
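To make that tension concrete, here is a minimal sketch of what periodic measurement could look like. Everything in it is illustrative rather than anything the authors prescribe: run_benchmark() is a hypothetical stand-in for whatever cloud operation you care about (acquiring a VM, transferring a fixed-size object), and the interval, sample count, and bench_log.csv file name are arbitrary choices.

import csv
import statistics
import time
from datetime import datetime, timezone


def run_benchmark() -> float:
    """Hypothetical probe: time a fixed unit of work, in seconds.

    In a real harness this would be a cloud operation, e.g. acquiring
    a VM or transferring a fixed-size object to storage.
    """
    start = time.perf_counter()
    sum(i * i for i in range(1_000_000))  # stand-in workload
    return time.perf_counter() - start


def benchmark_periodically(interval_s: float, samples: int, out_path: str) -> None:
    """Measure the system under test at a fixed frequency, appending
    each timestamped result to a CSV log. Cost grows with the chosen
    frequency, which is the pressure on Challenge 1 the quote describes."""
    results = []
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(samples):
            elapsed = run_benchmark()
            results.append(elapsed)
            writer.writerow([datetime.now(timezone.utc).isoformat(), elapsed])
            f.flush()  # keep the log current between samples
            time.sleep(interval_s)
    print(f"mean={statistics.mean(results):.4f}s "
          f"stdev={statistics.pstdev(results):.4f}s")


if __name__ == "__main__":
    benchmark_periodically(interval_s=60.0, samples=10, out_path="bench_log.csv")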
Even with lots of data you’ll have a hard time making it fit reality, because you cannot replicate all the software involved.
“We have surveyed in our previous work [26], [27] over ten performance studies that use common benchmarks to assess the virtualization overhead on computation (5–15%), I/O (10–30%), and HPC kernels (results vary). We have shown in a recent study of four commercial IaaS clouds [27] that virtualized resources obtained from public clouds can have a much lower performance than the theoretical peak, possibly because of the performance of the middleware layer.” Alexandru Iosup, Radu Prodan, and Dick Epema
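To put those percentages in perspective (the baseline figures here are illustrative, not from the paper): a node that sustains 100 GFLOPS on bare metal would deliver roughly 85–95 GFLOPS virtualized at a 5–15% computation overhead, and a disk that manages 100 MB/s would drop to 70–90 MB/s at a 10–30% I/O overhead, before any middleware losses are counted.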
Over long-term observation the trend was clear (OK, the dates are old, but this still stacks up): things were slowing down, and the cloud experiences seasonality of some description.
“We have observed the long-term evolution in performance of clouds since 2007. Then, the acquisition of one EC2 cloud resource took an average time of 50 seconds, and constantly increased to 64 seconds in 2008 and 78 seconds in 2009. The EU S3 service shows pronounced daily patterns with lower transfer rates during night hours (7PM to 2AM), while the US S3 service exhibits a yearly pattern with lowest mean performance during the months January, September, and October. Other services have occasional decreases in performance, such as SDB in March 2009, which later steadily recovered until December [26].” Alexandru Iosup, Radu Prodan, and Dick Epema
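If you are logging measurements with the kind of harness sketched above, spotting a daily pattern like that S3 night-hours dip is a matter of grouping results by hour of day. A minimal sketch, assuming the timestamped CSV format from the earlier example:

import csv
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(log_path: str) -> dict:
    """Average the logged timings by hour of day, exposing daily
    patterns like the S3 night-hours dip the authors report."""
    by_hour = defaultdict(list)
    with open(log_path, newline="") as f:
        for timestamp, elapsed in csv.reader(f):
            by_hour[datetime.fromisoformat(timestamp).hour].append(float(elapsed))
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


if __name__ == "__main__":
    for hour, avg in hourly_profile("bench_log.csv").items():
        print(f"{hour:02d}:00  mean={avg:.4f}s")

The same grouping idea extends to month of year for the yearly S3 pattern the quote mentions, though you would need far more data before drawing conclusions at that scale.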
The final nail in the coffin when trying to benchmark the cloud is the flexibility and shifting nature of the hardware, workloads and software involved.
“Depending on the provider and its middleware abstraction, several cloud overheads and performance metrics can have different interpretation and meaning.” Alexandru Iosup, Radu Prodan, and Dick Epema
So you can’t trust the data from clouds to be what you expect, and you can’t trust your existing benchmarks to represent the future.
So…what can you do?
I'll answer this question on Monday. In the meantime, why not sign up to our Community and listen to our on-demand webinar 'Cloud Computing and Capacity Management':
http://www.metron-athene.com/_downloads/on-demand-webinars/index_2.asp
Phil Bell
Consultant