Monday 9 September 2013

Big Data Overview (1 of 4)

I’ll be running a webinar on September 12 on the impact of Big Data from a capacity management perspective, so I thought it would be good to share an overview of Big Data with you, starting today with the terminology and what it means.

Big Data jargon is new to many people in IT, so the list below explains the more common terms you may see.

Hadoop
An open source software library project administered by the Apache Software Foundation. Apache defines Hadoop as “a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.”
HDFS
The Hadoop Distributed File System, Hadoop’s storage layer. Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, which provides the scalability needed for big data processing.
HBase
A distributed, column-oriented NoSQL database that runs on top of HDFS, often described as “the Hadoop database”.
Hive
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. The language also lets traditional map/reduce programmers plug in their own custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL.
Map/Reduce
A general term for the process of breaking a problem into pieces that are distributed across multiple computers on the same network or cluster, or across a grid of disparate and possibly geographically separated systems (map), and then collecting all the results and combining them into a report (reduce). Google’s branded framework for this is called MapReduce.
Mashup
The process of combining different datasets within a single application to enhance output, for example, combining demographic data with real estate listings.
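
The HDFS and Map/Reduce entries describe one pipeline: data is split into blocks, each block is mapped independently, and the partial results are grouped and reduced. The Python sketch below walks through that flow using the classic word-count example. It is a single-process illustration, not Hadoop’s API: the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, blocks are counted in lines rather than bytes, and the distribution across machines is simulated by processing blocks one after another.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def split_into_blocks(lines: List[str], lines_per_block: int) -> List[List[str]]:
    """Break the input into fixed-size chunks, loosely mimicking HDFS blocks.

    Simplification (assumption): real HDFS splits files by byte count,
    and its blocks are far larger (tens to hundreds of megabytes).
    """
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]


def map_block(block: List[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one block."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterator[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: List[str], lines_per_block: int = 2) -> Dict[str, int]:
    blocks = split_into_blocks(lines, lines_per_block)
    # In a cluster each block could be mapped on a different machine;
    # here the blocks are simply processed in sequence.
    intermediate = (pair for block in blocks for pair in map_block(block))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = [
        "big data needs big clusters",
        "hadoop stores data in blocks",
        "map and reduce process the blocks",
    ]
    print(word_count(documents))
```

In a real cluster, Hadoop tries to run each map task on a machine that already holds the relevant block, which is where the scalability described under HDFS comes from; the reduce step can likewise be spread across machines by key.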
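
A mashup is often just a join on a shared key. As a minimal sketch, assuming both datasets carry a postcode field, the Python below enriches each real estate listing with demographic figures for its area. The `mashup` function, the postcodes, the field names, and the numbers are all hypothetical and invented for illustration. Listings with no matching area are dropped (an inner join); keeping them instead would make it a left join.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def mashup(listings: List[Record], demographics: Dict[str, Record]) -> List[Record]:
    """Enrich each listing with the demographic record for its postcode."""
    combined: List[Record] = []
    for listing in listings:
        area = demographics.get(listing["postcode"])
        if area is None:
            # No demographic data for this postcode: skip it (inner-join behaviour).
            continue
        # Merge the two records; if a field name appears in both,
        # the demographic value wins.
        combined.append({**listing, **area})
    return combined


if __name__ == "__main__":
    # Hypothetical sample data; values are invented for illustration.
    demographics = {
        "AB1": {"median_age": 34, "households": 5200},
        "CD2": {"median_age": 52, "households": 3100},
    }
    listings = [
        {"id": 1, "postcode": "AB1", "price": 250000},
        {"id": 2, "postcode": "CD2", "price": 180000},
        {"id": 3, "postcode": "EF3", "price": 300000},
    ]
    for row in mashup(listings, demographics):
        print(row)
```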

On Wednesday I’ll be looking at where Big Data exists. In the meantime, don’t forget to register for my free webinar: http://www.metron-athene.com/services/training/webinars/index.html
Dale Feiste
Principal Consultant