Now that we have a better idea of what Hadoop is and how it's used to manage Big Data, the most important thing to put in place is a mechanism for monitoring the performance of the cluster and of the nodes within it.
Hadoop is supported on Linux and Windows (it can also run on other operating systems, such as BSD, Mac OS X, and OpenSolaris). Utilities and performance counters already exist for these operating systems, which means that athene® can be implemented to capture performance and capacity data from those nodes and store that data in the organization's Capacity Management Information Store (CMIS).
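The exact capture mechanism is product-specific, but the kind of node-level data involved is easy to illustrate. The short Python sketch below uses the cross-platform psutil library to sample the CPU, memory, disk I/O, and network counters discussed in this post; it shows the raw data only, not how athene® itself collects it, and it assumes psutil is installed.

# Illustrative only: one sample of the node-level metrics a capacity tool would capture.
# Assumes the third-party psutil package is installed (pip install psutil); this is
# not how athene® itself collects data.
import time
import psutil

def sample_node_metrics():
    """Return a point-in-time sample of the key node metrics."""
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),  # averaged over one second
        "memory_percent": psutil.virtual_memory().percent,
        "disk_read_bytes": disk.read_bytes,    # cumulative counters: diff two
        "disk_write_bytes": disk.write_bytes,  # successive samples to get rates
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }

if __name__ == "__main__":
    print(sample_node_metrics())

Collected regularly from every node in the cluster and stored centrally, samples like these are the raw material for the reporting, dashboards, and alerting described below.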
The Capacity Manager, as in any other environment, is interested in a number of things:
(1) How are the cluster and the nodes within it performing now? Performance data showing how much CPU, memory, network, and disk I/O is being used is readily available and can be stored and reported on within athene®. Web reporting and dashboards can easily show the Capacity Manager the health of the cluster nodes, and automatic alerting can quickly point to exceptions that may be the source of performance problems in the environment.
(2) What are the trends for the important metrics? Big Data environments typically deal with datasets that are growing in size, and as those datasets grow, the amount of processing applied to them tends to increase as well. The Capacity Manager must keep a close eye out: a healthy cluster today could be one with severe performance bottlenecks tomorrow. Trend alerting is built into athene® and can warn the Capacity Manager that performance thresholds will be breached in the future, allowing ample time to plan changes to the environment to handle the predicted increase in load (a simple illustration of this kind of projection appears after point (4) below).
(3) Storage space is certainly something that cannot be forgotten. With directly attached storage (DAS), data is distributed and replicated across many nodes, so it's important to be able to take the storage space available in a Hadoop cluster and represent it in a way that quickly shows how much headroom is available at a given time and how the amount of disk space used trends over time. athene® can easily aggregate the directly attached disks to give a big-picture view of the disk space available as well as the amount of headroom, and these reports can also show how disk space usage changes over time (one of the sketches after point (4) shows the cluster-wide figures that Hadoop itself exposes). Trend reports and alerting can quickly warn the Capacity Manager when free storage is running low.
(4) Finally, the ability to size and predict necessary changes to the environment as time goes on. As with any other environment, a shortage in one key subsystem can affect the whole environment. The ability to model future system requirements based on business needs and other organic growth is vital for the Capacity Manager. With athene®, it's easy to see how trends are affecting future needs, and it's equally easy to model expected requirements based on predicted changes to the workload.
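To make the storage picture in point (3) concrete, the sketch below reads the cluster-wide capacity figures that the Hadoop NameNode already aggregates from the DataNodes' directly attached disks, via its JMX-over-HTTP endpoint. The NameNode hostname is a placeholder, and the port (9870 on recent Hadoop 3.x releases, 50070 on older ones) and exact attribute names vary by version, so treat those details as assumptions; again, this shows the underlying data rather than athene®'s own collection method.

# Illustrative only: read cluster-wide HDFS capacity from the NameNode's JMX endpoint.
# The hostname is a placeholder, and the port and exact bean/attribute names vary by
# Hadoop version -- check your own NameNode's /jmx output before relying on them.
import json
from urllib.request import urlopen

def hdfs_capacity(namenode="http://namenode.example.com:9870"):
    url = namenode + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
    with urlopen(url) as resp:
        bean = json.load(resp)["beans"][0]
    total = bean["CapacityTotal"]
    used = bean["CapacityUsed"]
    free = bean["CapacityRemaining"]
    return {
        "total_tb": total / 1e12,
        "used_tb": used / 1e12,
        "headroom_tb": free / 1e12,
        "percent_used": 100.0 * used / total,
    }

if __name__ == "__main__":
    print(hdfs_capacity())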
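And to illustrate the trending idea from points (2) and (4): given a history of percent-used samples over time, even a simple straight-line fit gives a rough estimate of when a threshold will be crossed. Products such as athene® apply their own, far more sophisticated trending and modelling, so the sketch below is only a back-of-the-envelope illustration.

# Illustrative only: estimate when a metric will cross a threshold by fitting a
# straight line (least squares) to historical samples. Real capacity-planning tools
# use far more sophisticated trending and modelling than this.
def days_until_threshold(samples, threshold=80.0):
    """samples: list of (day_number, percent_used) pairs, oldest first, spanning >= 2 days."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / \
            sum((x - mean_x) ** 2 for x, _ in samples)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # usage is flat or falling, so no breach is predicted
    return (threshold - intercept) / slope - samples[-1][0]

# Example: 60% used 30 days ago, 66% used 15 days ago, 72% used today
print(days_until_threshold([(0, 60.0), (15, 66.0), (30, 72.0)]))  # ~20 days to reach 80%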
As the price of data storage continues to decrease
and the amount of data continues to increase, it becomes even more vital that
organizations with Big Data implementations closely manage and monitor their
environments to ensure that service levels are met and adequate capacity is
always available.
We'll be taking a look at some of the typical metrics you should be monitoring in our Capacity Management and Big Data webinar (Part 2) tomorrow. Register now, and don't worry if you missed Part 1: join our Community and catch it on demand.
Rich Fronheiser
Chief Marketing Officer