What key capacity metrics should you be monitoring for your Big Data environment? My list includes some key CPU, Memory, File System and I/O metrics which can give you a good understanding of how well your systems are performing and whether any potential capacity issues can be identified.
• CPU
o Utilization, system/user breakdown
• Memory
o Usage, paging, swapping
• User/process breakdown – define workloads
• File System
o Number of files
o Number of blocks
o User breakdown
• I/O
o Response time
o Service times
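As a minimal sketch of how several of these metrics can be sampled on Linux (this assumes the /proc interfaces described in proc(5) and the standard statvfs call; it is an illustration, not a full monitoring agent):

```python
import os

def parse_cpu(stat_text):
    """Parse the aggregate 'cpu' line from /proc/stat into user/system/idle jiffies."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            fields = [int(x) for x in line.split()[1:]]
            user, nice, system, idle = fields[0], fields[1], fields[2], fields[3]
            return {"user": user + nice, "system": system, "idle": idle}
    raise ValueError("no aggregate cpu line found")

def parse_meminfo(meminfo_text):
    """Parse MemTotal/MemFree/SwapTotal/SwapFree (kB) from /proc/meminfo."""
    wanted = {"MemTotal", "MemFree", "SwapTotal", "SwapFree"}
    out = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            out[key] = int(rest.split()[0])  # value is reported in kB
    return out

def fs_usage(path="/"):
    """File-system capacity via statvfs: block counts plus inode (file) counts."""
    st = os.statvfs(path)
    return {
        "blocks_total": st.f_blocks,
        "blocks_free": st.f_bfree,
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
    }

if __name__ == "__main__":
    with open("/proc/stat") as f:
        print(parse_cpu(f.read()))
    with open("/proc/meminfo") as f:
        print(parse_meminfo(f.read()))
    print(fs_usage("/"))
```

Sampling these counters at regular intervals and storing the deltas gives you the raw time series that the capacity models below are built from.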
By capturing the user/process breakdown on your UNIX/Linux systems, you can start to define workloads and couple them with predicted business usage to produce both baseline and predictive analytical models.
Key questions such as the following can then be answered:
• What is the business usage/growth forecast for the next 3, 6 and 12 months?
• Will our existing infrastructure be able to cope?
• If not, what will be required?
• Are any storage devices likely to experience a capacity issue within the next 3, 6 or 12 months?
• Are any servers or storage devices experiencing performance issues, and what is the likely root cause?
This is not an exhaustive list, but it does cover the key capacity metrics you should be monitoring for your Big Data environment.
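The storage-capacity question above can be sketched with a simple linear trend fitted to monthly usage samples. This is a hypothetical illustration with made-up numbers; a real predictive model would also fold in the business growth forecast rather than extrapolating history alone:

```python
def months_until_full(usage_history, capacity):
    """Fit a least-squares line to monthly usage samples and estimate how many
    months after the last sample the trend crosses capacity (None if not growing)."""
    n = len(usage_history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage_history) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, usage_history)) / denom
    if slope <= 0:
        return None  # usage flat or shrinking: no capacity issue from trend alone
    intercept = mean_y - slope * mean_x
    # Solve intercept + slope * t = capacity, then shift relative to the last sample.
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - (n - 1))

# Hypothetical TB used over the last six months, against a 100 TB array:
history = [60, 64, 69, 73, 78, 82]
print(months_until_full(history, 100))
```

If the answer lands inside your 3, 6 or 12 month planning window, that device goes on the list for expansion or data migration.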
In my final blog I'll be looking at CPU breakdown and summarizing.