Availability also needs to be addressed in a practical fashion, not as idealised objectives of 99.99% or similar imposed across the board without reference to the significance of the service or the cost of achieving the level suggested.
Why provide a gold-plated service for low priority discretionary work?
There are two popular approaches. The first is a blanket "99.99%" style target, which is a gross averaging mechanism. Consider a single 8-hour downtime event: measured over a whole year it still leaves roughly 99.9% achievable, but availability is only about 93% for the week in which it occurred (assuming a 5-day, 24-hour service week).
It is better to use a definition that reflects the local situation. Typically, outages of a given duration can be accepted within a given period: say, up to 6 minutes of downtime is acceptable once a week, up to 60 minutes once a month, up to 4 hours once a quarter and up to 8 hours once a year. This equates to roughly 99.5% availability, but it is meaningful to users and can be monitored and policed effectively.
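As a rough cross-check of the figures quoted above, here is a minimal Python sketch. The weekly figure assumes the 5-day, 24-hour service week mentioned earlier; the outage allowances are taken straight from the text, and the exact numbers are illustrative rather than prescriptive.

```python
# Back-of-the-envelope availability arithmetic for the figures quoted above.

HOURS_PER_YEAR = 365 * 24      # 8,760 hours
SERVICE_WEEK_HOURS = 5 * 24    # assumption: a 5-day, 24-hour service week

# A single 8-hour outage, measured against a year and against the week it hit.
outage_hours = 8
yearly = 1 - outage_hours / HOURS_PER_YEAR
weekly = 1 - outage_hours / SERVICE_WEEK_HOURS
print(f"Yearly availability: {yearly:.2%}")   # ~99.9%
print(f"Weekly availability: {weekly:.2%}")   # ~93.3%

# Outage allowances: 6 min once a week, 60 min once a month,
# 4 hours once a quarter, 8 hours once a year.
allowed_hours = (6 / 60) * 52 + (60 / 60) * 12 + 4 * 4 + 8 * 1   # ~41.2 hours/year
print(f"Allowance-based availability: {1 - allowed_hours / HOURS_PER_YEAR:.2%}")  # ~99.5%
```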
Once that is achieved, other issues arise. What is 'up' and what is 'slow time'?
Continuity and disaster recovery need to be sized to ensure that a reasonable service will continue to be supplied at the levels defined for DR.
There are many practical architectural issues to be addressed:
• Data security to reduce the impact of DR:
– Backups made to tape/disk on site and sent off-site regularly
– Data replication to an off-site location so only system sync required
– High availability systems to keep both the data & system replicated
• Precautionary measures:
– Local mirrors of systems and/or data and use of RAID
– Surge protectors, UPS and/or backup generator, fire prevention
– Antivirus, antibot software and other security measures
• Stand-by site at:
– Own site with high availability
– Own remote facilities with SAN
– An outsourced disaster recovery provider
• DR service:
– Priority of service determines whether a DR service is included (see the sketch after this list)
– DR reduced performance and reduced traffic constraints as per the SLA
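To make that last point concrete, here is a minimal sketch of how service priority might map to DR provision. The tier names, factors and field names are illustrative assumptions, not taken from any particular tool or SLA.

```python
from dataclasses import dataclass

@dataclass
class DRTier:
    """Illustrative DR entitlement for a service priority band (all values are assumptions)."""
    dr_included: bool           # does this priority band get a DR service at all?
    performance_factor: float   # fraction of production performance accepted at the DR site
    traffic_factor: float       # fraction of normal traffic admitted during DR, per the SLA

# Hypothetical mapping of service priority to DR entitlement.
DR_POLICY = {
    "critical":      DRTier(dr_included=True,  performance_factor=1.0, traffic_factor=1.0),
    "important":     DRTier(dr_included=True,  performance_factor=0.7, traffic_factor=0.5),
    "discretionary": DRTier(dr_included=False, performance_factor=0.0, traffic_factor=0.0),
}

if __name__ == "__main__":
    for priority, tier in DR_POLICY.items():
        print(priority, tier)
```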
As well as all of these, the performance and capacity of the DR site need to be properly sized. Given that effective capacity management is already in place for the production solution, extrapolation to assess the DR site is comparatively straightforward and may not require a 'hot test'; models can be used to justify the configuration and cost of the Disaster Recovery site.
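As an illustration of that extrapolation, here is a minimal sketch that assumes we already know the production peak utilisation and the SLA's reduced-traffic factor for DR. The function, figures and headroom target are hypothetical, not a prescribed method.

```python
def dr_cpu_requirement(prod_peak_util, prod_cores, dr_traffic_factor, target_util=0.7):
    """Estimate DR-site cores by extrapolating from the production capacity model.

    prod_peak_util:    measured peak CPU utilisation of production (0-1)
    prod_cores:        cores in the production configuration
    dr_traffic_factor: fraction of normal traffic the DR SLA must support
    target_util:       utilisation ceiling we are prepared to run at during DR
    """
    # Convert the production peak into a demand in core-equivalents,
    # scale it by the DR traffic allowance, then re-apply the headroom target.
    demand_cores = prod_peak_util * prod_cores * dr_traffic_factor
    return demand_cores / target_util

# Hypothetical example: a 32-core production system peaking at 60% utilisation,
# with a DR SLA requiring only 50% of normal traffic at up to 70% utilisation.
print(round(dr_cpu_requirement(0.60, 32, 0.50), 1))   # ~13.7 cores -> size the DR host at 16 cores
```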
This leads us nicely into my next topic, Demand Management, which I'll cover on Wednesday...
In the meantime there's a chance for one person to win a signed copy of my Capacity Management book (referred to in the first blog of this series). Simply subscribe to our blog or YouTube channel, like us on Facebook, or follow us on Twitter or LinkedIn between 31st January and 15th March inclusive to be entered into our drawing.
Like, follow or subscribe on 3 or more of these media and receive an additional free entry.
Only one entry per person per media is valid and no cash alternative is available.
The winner will be notified and published after the drawing on 29th March 2013.
Good Luck!
Adam Grummitt
Distinguished Engineer