So what is Big Data?
Big Data - Data sets whose size grows beyond the management capabilities of the traditional software that has been used in the past. Vast amounts of information are now being stored, particularly by social media applications such as Facebook and Twitter, so a Big Data solution needs to support exabytes and petabytes of data.
Hadoop (HDFS)
One such solution is Apache's Hadoop offering. Hadoop is an open source software library project administered by the Apache Software Foundation, which defines Hadoop as "a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model."
Using HDFS, data in a Hadoop cluster is broken down into smaller pieces (called blocks), which are distributed throughout the cluster and automatically replicated for fault tolerance. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, which provides the scalability needed for big data processing.
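To make the idea concrete, here is a minimal sketch of block splitting and placement in plain Python. This is illustrative only: real HDFS uses 64/128 MB blocks and a NameNode to track placement, and the block size, node names, and round-robin placement below are invented for the example.

```python
# Illustrative sketch only - not HDFS internals. Block size, node
# names, and the round-robin placement policy are all made up.
BLOCK_SIZE = 4          # bytes; tiny so the example is readable
REPLICATION = 3         # HDFS's default replication factor is 3
NODES = ["node1", "node2", "node3", "node4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a byte string into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin)."""
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"0123456789")
print(blocks)                  # [b'0123', b'4567', b'89']
print(place_blocks(blocks))    # each block mapped to 3 distinct nodes
```

Because each block lives on several nodes, a map task can run on whichever node already holds a local copy, which is where the scalability comes from.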
Map/Reduce - A general term for the process of breaking a problem into pieces that are distributed across multiple computers on the same network or cluster, or across a grid of disparate and possibly geographically separated systems (map), then collecting all the results and combining them into a report (reduce). Google's branded framework for performing this function is called MapReduce.
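The map, shuffle, and reduce phases described above can be sketched with the classic word-count example in plain Python. In a real Hadoop job these phases run on different nodes and the shuffle is handled by the framework; here everything runs in one process purely to show the shape of the computation.

```python
# A minimal word-count sketch of the map/reduce idea - single
# process, for illustration only.
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a final count."""
    return {key: sum(values) for key, values in grouped.items()}

records = ["big data big clusters", "big data"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```

The point is that map and reduce only ever see small pieces of the data, so the same two functions work unchanged whether the input is two strings or two petabytes spread across a cluster.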
Not recommended for SAN/NAS - Because the data is distributed across multiple computers, whether they are on the same network or geographically separated, SAN or NAS storage is not recommended; use local block storage instead.

So what should we be monitoring? I'll deal with this next... Happy New Year to you all.
Jamie Baker
Principal Consultant