I’ll be running a webinar on September 12 on the impact of Big Data
from a capacity management perspective, so I thought it would be good
to share an overview of Big Data with you, starting today with
the terminology used and what it means.
Jargon related to Big Data is new to many people in IT,
and the list below explains the more common terms you may see.
Hadoop
An open source software library project administered
by the Apache Software Foundation. Apache defines Hadoop as “a framework that
allows for the distributed processing of large data sets across clusters of
computers using a simple programming model.”
HDFS
The Hadoop Distributed File System. Data in a Hadoop cluster is broken
down into smaller pieces (called blocks) and distributed throughout the
cluster. In this way, the map and reduce functions can be executed on smaller
subsets of your larger data sets, and this provides the scalability that is
needed for big data processing.
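To make the idea of blocks concrete, here is a minimal Python sketch. It is not how HDFS is implemented: the function names, the tiny 8-byte block size, the node names, and the round-robin placement are all assumptions for illustration, and real HDFS uses far larger blocks and also replicates each block across nodes.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the data into consecutive pieces of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: List[bytes], nodes: List[str]) -> Dict[str, List[int]]:
    """Assign block indexes to nodes round-robin (a toy placement policy)."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


# Hypothetical 26-byte "file" and three hypothetical nodes.
data = b"abcdefghijklmnopqrstuvwxyz"
blocks = split_into_blocks(data, block_size=8)
placement = place_blocks(blocks, ["node-a", "node-b", "node-c"])
print(placement)
```

Each node now holds only a subset of the file, so work on those blocks can proceed on every node at the same time.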
HBase
A distributed, columnar NoSQL database, i.e. the Hadoop
database.
Hive
Hive is a data warehouse system for Hadoop that
facilitates easy data summarization, ad-hoc queries, and the analysis of large
datasets stored in Hadoop compatible file systems. Hive provides a mechanism to
project structure onto this data and query the data using a SQL-like language
called HiveQL. At the same time this language also allows traditional
map/reduce programmers to plug in their custom mappers and reducers when it is
inconvenient or inefficient to express this logic in HiveQL.
Map/Reduce
A general term that refers to the process of breaking
up a problem into pieces that are then distributed across multiple computers on
the same network or cluster, or across a grid of disparate and possibly
geographically separated systems (map), and then collecting all the results and
combining them into a report (reduce). Google’s branded framework to perform
this function is called MapReduce.
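The classic illustration of map and reduce is counting words. The sketch below runs in a single Python process rather than on a cluster, and the function names and sample text are assumptions made up for this example. It only shows the shape of the technique: map each chunk independently, group the intermediate results by key, then reduce each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each chunk stands in for a piece of data held on a different machine.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the calls could run on separate machines, which is the point of the approach.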
Mashup
The process of combining different datasets within a single
application to enhance output, for example, combining demographic data with
real estate listings.
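As a rough sketch of the real estate example, the Python below enriches each listing with demographic data for its area. All of the records, field names, and area codes are invented for illustration, and joining on a shared area code is just one assumed way the two datasets might be related.

```python
from typing import Dict, List

# Hypothetical demographic data, keyed by a made-up area code.
demographics: Dict[str, Dict[str, int]] = {
    "A1": {"median_age": 35, "population": 21000},
    "B2": {"median_age": 42, "population": 8000},
}

# Hypothetical real estate listings.
listings: List[Dict[str, object]] = [
    {"address": "1 Example Street", "area": "A1", "price": 500000},
    {"address": "2 Sample Road", "area": "B2", "price": 350000},
    {"address": "3 Placeholder Lane", "area": "C3", "price": 275000},
]


def mash_up(listings: List[Dict[str, object]],
            demographics: Dict[str, Dict[str, int]]) -> List[Dict[str, object]]:
    """Return copies of the listings with any matching demographic fields added."""
    combined = []
    for listing in listings:
        enriched = dict(listing)  # copy so the original listing is untouched
        enriched.update(demographics.get(str(listing["area"]), {}))
        combined.append(enriched)
    return combined


combined = mash_up(listings, demographics)
for row in combined:
    print(row)
```

Listings in an area with no demographic record, such as the third one here, pass through unchanged.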
On Wednesday I’ll be looking at where Big Data exists. In the meantime, don’t
forget to register for my free webinar: http://www.metron-athene.com/services/training/webinars/index.html
Dale Feiste
Principal Consultant