Monday 20 April 2015

Automatic reporting and alerting (10 of 10)

What do we actually need? What do we actually want to report on?  How often do we want to report on it? 

If we are applying threshold-based alerting to our reports, we need to ensure that the correct values are set. These values may be the utilizations or response times stated within an SLA, or they may be based on maximum resource usage. Failure to set the correct values may lead to incorrect alerts being produced, leading to unnecessary investigations, stress and panic. By including availability and response time information within your capacity reports, you improve the accuracy of your forecasts, increase confidence in them and get advance warning of potential SLA breaches.
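As a rough illustration of threshold-based alerting, here is a minimal Python sketch. The metric names, the warning and breach values and the evaluate function are all hypothetical, chosen only for this example; in practice the values would come from your own SLA or from maximum resource usage figures.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical threshold pair for one reported metric."""
    metric: str
    warning: float  # early-warning level, set below the SLA value
    breach: float   # value stated in the SLA (assumed for illustration)


def evaluate(value: float, threshold: Threshold) -> str:
    """Classify a measured value against its warning and breach levels."""
    if value >= threshold.breach:
        return "BREACH"
    if value >= threshold.warning:
        return "WARNING"
    return "OK"


# Example values only; these are assumptions, not recommendations.
cpu = Threshold(metric="cpu_utilization_pct", warning=70.0, breach=85.0)
response = Threshold(metric="response_time_sec", warning=1.5, breach=2.0)

for name, value, threshold in [
    ("cpu_utilization_pct", 72.0, cpu),
    ("response_time_sec", 0.9, response),
]:
    print(name, value, evaluate(value, threshold))
```

Keeping a warning level below the breach level is one simple way to get an early alert before the SLA value itself is reached.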

When creating forecast models, whether trend models, analytical models or both, it is important to make sure that the inputs into your model are as accurate as possible, so that the predictions can be relied upon to avoid costly or unnecessary performance issues and SLA breaches.

So let's ensure we have a Brighter Outlook. It is crucial that we get the information at all levels as described and store it, typically in a centralised database, so that we have it readily available. Producing ad hoc reports on infrastructure usage and current application performance, along with implementing automated reports based specifically on our SLA thresholds, enables us to produce early warning alerts on potential breaches and take appropriate action as necessary.

Guest and Host consolidation. If you have over-provisioned systems, look at the usage of your virtual machines against the configured resources to see if there is scope to consolidate your guests onto a smaller number of ESX hosts. You may also be able to run multiple applications within the same VM rather than have many VMs each running a single application.
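To give a feel for the consolidation check, the sketch below estimates a lower bound on the number of hosts needed from the peak usage of each guest. It is a simplification under stated assumptions: the function name, the 80% target utilisation and the sample figures are hypothetical, and it ignores placement constraints such as affinity rules, failover capacity and the timing of peaks.

```python
import math


def min_hosts_needed(vm_peaks, host_cpu_ghz, host_mem_gb, target_utilisation=0.8):
    """Lower-bound estimate of hosts needed for a set of guests.

    vm_peaks is a list of (peak_cpu_ghz, peak_mem_gb) tuples, one per VM.
    target_utilisation is the fraction of each host we are willing to
    use, leaving the rest as headroom (0.8 is an assumed example value).
    """
    total_cpu = sum(cpu for cpu, _ in vm_peaks)
    total_mem = sum(mem for _, mem in vm_peaks)

    usable_cpu = host_cpu_ghz * target_utilisation
    usable_mem = host_mem_gb * target_utilisation

    # Whichever resource is more constrained decides the host count.
    return max(
        1,
        math.ceil(total_cpu / usable_cpu),
        math.ceil(total_mem / usable_mem),
    )


# Ten hypothetical guests, each peaking at 4 GHz CPU and 32 GB memory,
# placed on hypothetical hosts with 20 GHz and 64 GB each.
guests = [(4.0, 32.0)] * 10
print(min_hosts_needed(guests, host_cpu_ghz=20.0, host_mem_gb=64.0))
```

Comparing the result with the number of hosts currently in use gives a first indication of whether there is scope to consolidate. In this made-up example memory, not CPU, is the limiting resource.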

Plan ahead. Produce trend reports and analytical models to predict the likely impact on your infrastructure and on the performance of the applications running within it. Then make the necessary recommendations on upgrades or configuration changes to prevent SLA breaches and the associated impact on services. All of this information should be included within a Service Capacity Plan, allowing us to make decisions on whether we need to upgrade, whether we need to standardise our hardware and what the associated costs are likely to be, so budgets can be accurately planned.
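A trend report can be as simple as fitting a straight line to historical utilization and asking when it would cross a threshold. The sketch below does this with an ordinary least-squares fit in plain Python. The function names and sample data are hypothetical, and a linear trend is only one possible assumption; real workloads may be seasonal or change in steps, which is where analytical models come in.

```python
def fit_linear_trend(values):
    """Least-squares line through equally spaced samples.

    Returns (slope, intercept), where the sample index 0, 1, 2, ...
    is used as the time axis.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_threshold(values, threshold):
    """Periods from the latest sample until the trend line reaches threshold.

    Returns None if the trend is flat or falling, and 0.0 if the trend
    line is already at or above the threshold.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical weekly peak CPU utilization (%), with an assumed 70% threshold.
weekly_cpu = [50.0, 52.0, 54.0, 56.0, 58.0]
print(periods_until_threshold(weekly_cpu, threshold=70.0))
```

The number of periods returned is in whatever unit the samples were taken in, so weekly data gives weeks until the threshold is reached.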

It can also help us decide whether a more powerful and expensive server is actually required when a less expensive, slightly less powerful server would do just as good a job. Creating analytical models gives you the information you require to make those decisions.

Regular consultation and information sharing with Application teams and other Service Delivery teams will assist you in making the best decisions going forward and allow you to explain why you have made the stated recommendations.

I'll leave you with a quick look at monetary savings on Capital Expenditure (CAPEX) and Operational Expenditure (OPEX).

CAPEX

All the way through this series I have mentioned the possibility of reducing the number of servers required to host all of your virtual machines. This can lead to savings on the number of licenses you require and on the actual hardware costs. As you start to reduce your physical estate and consolidate ESX hosts, you can start to look at the possibility of reducing the size of your datacenter.

OPEX 

Make savings by reducing the amount spent on maintenance and support as you reduce the number of servers required in your infrastructure to host your applications and services. By performing application sizing, we can assist in accurately provisioning resource requirements and help eliminate any potential overspend caused by over-provisioning. Further to this, we can actually reduce the physical server count, leading to a reduction in the size of a datacenter.

The savings from this approach include power usage (servers, cooling and lighting) and a reduction in emissions, achieved through reduced power consumption, through fewer components as we consolidate servers and through optimal sizing and provisioning.
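For a back-of-the-envelope view of the power saving, the sketch below multiplies the servers removed by an average power draw, a Power Usage Effectiveness (PUE) factor to account for cooling and other overheads, the hours in a year and a price per kWh. Every figure in it is a hypothetical placeholder and the function name is my own; substitute measured values from your own estate.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_power_saving(servers_removed, avg_watts_per_server, pue, price_per_kwh):
    """Estimated yearly electricity cost saved by removing servers.

    pue scales the IT load up to total facility load (cooling, lighting
    and so on). All inputs are assumptions supplied by the caller.
    """
    it_load_kw = servers_removed * avg_watts_per_server / 1000
    facility_load_kw = it_load_kw * pue
    return facility_load_kw * HOURS_PER_YEAR * price_per_kwh


# Hypothetical example: 10 servers at 400 W each, PUE of 1.5, 0.15 per kWh.
saving = annual_power_saving(10, 400, 1.5, 0.15)
print(round(saving, 2))
```

This covers electricity only; maintenance, support and licence savings would need to be added separately.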

In my series I've covered what Cloud computing is and how it is underpinned by Virtualisation, the benefits it can provide, things we should be aware of and how putting in place effective Capacity Management can save you time and money.

If you'd like further information on Capacity Management, there is a selection of papers available to download at http://www.metron-athene.com/_downloads/index.html and don't forget to register for my webinar 'Understanding VMware Capacity' at http://www.metron-athene.com/services/training/webinars/index.html

Jamie Baker
Principal Consultant
