Monday, 11 February 2013

Capacity Management: Guided Practitioner Satnav – Availability (10 of 17)


Availability also needs to be addressed in a practical fashion, not by imposing ideal objectives of 99.99% or similar across the board without reference to the significance of the service or the cost of achieving the level suggested.

Why provide a gold-plated service for low-priority, discretionary work?

 




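The above shows two popular approaches. The first is a blanket ‘99.99%’ sort of target, which is a gross averaging mechanism. Consider a single 8-hour downtime event. Measured over a year, 99.9% is still achieved, but availability for the week in which the outage occurred is only 93.3% (taking the service week as 120 hours, that is five 24-hour days; over a full 168-hour week it would be 95.2%).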
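The arithmetic behind these figures can be checked with a short sketch. The function name and the window lengths are illustrative assumptions; in particular, the 120-hour service week (five 24-hour days) is an assumption that happens to reproduce the 93.3% figure.

```python
def availability_pct(downtime_hours: float, window_hours: float) -> float:
    """Percentage of the measurement window during which the service was up."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 <= downtime_hours <= window_hours:
        raise ValueError("downtime_hours must lie between 0 and window_hours")
    return 100.0 * (1.0 - downtime_hours / window_hours)


OUTAGE_HOURS = 8.0  # the single downtime event from the text

# Measurement windows in hours; the labels are illustrative only.
WINDOWS = {
    "calendar year (365 x 24 h)": 365 * 24,
    "calendar week (7 x 24 h)": 7 * 24,
    "service week (5 x 24 h, assumed)": 5 * 24,
}

for label, hours in WINDOWS.items():
    print(f"{label}: {availability_pct(OUTAGE_HOURS, hours):.1f}%")
```

The same outage looks very different depending on the window it is averaged over, which is why a blanket yearly percentage says little about what users experience in a bad week.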

It is better to use a definition that reflects the local situation. Typically, outages of a given duration can be accepted within a given period. Say an outage of up to 6 minutes is acceptable once a week, up to 60 minutes once a month, up to 4 hours once a quarter and up to 8 hours once a year. If every allowance were used in full, this would equate to roughly 99.5% over a calendar year, but it is meaningful to users and can be monitored and policed effectively.
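As a rough sketch, the tiered allowance above can be written down as data and its yearly equivalent derived from it. The class, field and function names are hypothetical, and two assumptions are made: the allowances are additive over a 365-day year, and an outage is charged to the shortest tier that covers its duration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Allowance:
    label: str             # period the allowance applies to
    max_minutes: int       # longest single outage tolerated in that period
    periods_per_year: int  # how often the period recurs in a year


# The example policy from the text: one outage per period, up to the given length.
POLICY = [
    Allowance("week", 6, 52),
    Allowance("month", 60, 12),
    Allowance("quarter", 240, 4),
    Allowance("year", 480, 1),
]

MINUTES_PER_YEAR = 365 * 24 * 60


def yearly_budget_minutes(policy: Sequence[Allowance]) -> int:
    """Total downtime per year if every allowance is used in full."""
    return sum(a.max_minutes * a.periods_per_year for a in policy)


def equivalent_availability_pct(policy: Sequence[Allowance]) -> float:
    """Yearly availability implied by using the whole downtime budget."""
    return 100.0 * (1.0 - yearly_budget_minutes(policy) / MINUTES_PER_YEAR)


def smallest_tier_for(outage_minutes: float,
                      policy: Sequence[Allowance] = POLICY) -> Optional[Allowance]:
    """Shortest allowance that covers the outage, or None if it breaches every tier."""
    for allowance in sorted(policy, key=lambda a: a.max_minutes):
        if outage_minutes <= allowance.max_minutes:
            return allowance
    return None


budget = yearly_budget_minutes(POLICY)
print(f"Yearly downtime budget: {budget} minutes ({budget / 60:.1f} hours)")
print(f"Equivalent availability: {equivalent_availability_pct(POLICY):.2f}%")
```

For example, `smallest_tier_for(45)` returns the monthly allowance, while an outage of 600 minutes returns `None`, meaning it falls outside anything the policy tolerates.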

Once that is achieved, other issues arise. What is ‘up’ and what is ‘slow time’?

Continuity and disaster recovery (DR) need to be sized to ensure that a reasonable service will continue to be supplied within the DR-defined levels.

There are many practical architectural issues to be addressed: 

Data security to reduce impact of DR:

– Backups made to tape/disk on site and sent off-site regularly
– Data replication to an off-site location so only system sync required
– High availability systems to keep both the data & system replicated

Precautionary measures:

– Local mirrors of systems and/or data and use of RAID
– Surge protectors, UPS and/or backup generator, fire prevention
– Antivirus, antibot software and other security measures

Stand-by site at:

– Own site with high availability
– Own remote facilities with SAN
– An outsourced disaster recovery provider

DR service:

– Priority of service determines whether it is included in the DR service
– Reduced performance and reduced traffic constraints under DR, as per the SLA

As well as all these, the performance and capacity of the DR site need to be properly sized. Given that effective capacity management is already in place for the production solution, extrapolating to assess the DR site is comparatively straightforward. It may not require a ‘hot test’; models can be used instead to justify the configuration and cost of the disaster recovery site.
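To illustrate the kind of extrapolation meant here, the sketch below scales measured production load to a DR configuration. It is a first-order, linear model only, and every name and number in it is a hypothetical assumption: real sizing would normally use a proper performance model of the production workload.

```python
import math


def dr_capacity_needed(prod_capacity: float,
                       prod_peak_utilisation: float,
                       dr_traffic_fraction: float,
                       dr_target_utilisation: float) -> float:
    """Capacity units the DR site needs under a simple linear scaling assumption.

    prod_capacity          capacity of production, in any consistent unit
    prod_peak_utilisation  measured peak utilisation of production (0..1)
    dr_traffic_fraction    share of production traffic the SLA requires under DR (0..1)
    dr_target_utilisation  highest utilisation accepted at the DR site (0..1]
    """
    for name, value in (("prod_peak_utilisation", prod_peak_utilisation),
                        ("dr_traffic_fraction", dr_traffic_fraction)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0 < dr_target_utilisation <= 1:
        raise ValueError("dr_target_utilisation must be in (0, 1]")

    busy_units = prod_capacity * prod_peak_utilisation  # work actually done at peak
    return busy_units * dr_traffic_fraction / dr_target_utilisation


# Hypothetical figures: 64 units in production, 60% busy at peak,
# DR must carry half the traffic and may run up to 75% busy.
needed = dr_capacity_needed(64, 0.60, 0.50, 0.75)
print(f"DR capacity needed: {needed:.1f} units (round up to {math.ceil(needed)})")
```

The DR traffic fraction and target utilisation stand in for the reduced traffic and reduced performance constraints agreed in the SLA; loosening either one lowers the capacity, and hence the cost, that has to be justified.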

This leads us nicely into my next topic, Demand Management, which I’ll cover on Wednesday.

In the meantime, there's a chance for one person to win a signed copy of my Capacity Management book (referred to in the first blog of this series). Simply subscribe to our blog or YouTube channel, like us on Facebook, or follow us on Twitter or LinkedIn between 31st January and 15th March inclusive to be entered into our drawing.

Like, follow or subscribe to 3 media or more and receive an additional free entry.
Only one entry per person per media is valid and no cash alternative is available.
The winner will be notified and published after the drawing on 29th March 2013. 

Good Luck!  

Adam Grummitt

Distinguished Engineer

 

