Thursday, 19 September 2013

Big Data and Capacity Management (1/3)

Big Data has received Big Attention from a lot of different people, and not only in the IT world.  But why?  Let’s examine what Big Data is first and then talk about why it’s important that Capacity Managers carefully consider the implications of Big Data.

What is Big Data?

Big Data typically refers to large sets of data that (choose one or many):

(1)  Require processing power that exceeds what you’d get with a legacy database

(2)  Are too big to be handled by one system in a traditional way (*)

(3)  Arrive much too quickly or in too great a volume to be handled in the traditional way

(4)  Are of such varied types that it makes little sense to try to store or handle them in a traditional way

(*) by “Traditional Way” I mean handled by a typical relational database.  Even there, the definition of “typical” is somewhat elusive due to the complexities of today’s databases.

Why are we concerned about doing things in a new way?

(1)  90% of all data stored today was created in the last 2 years
Frightening concept, isn’t it?  Imagine if this trend continues over the next decade.  Storage costs have dropped consistently – statistics I’ve seen suggest that the cost per gigabyte of storage has halved just about annually over the last decade.  And yet, it doesn’t take an advanced math degree to see that companies will need to make considerable investments in storage space to keep up with the demands of Big Data.  Another estimate, from IBM, is that over 2.5 exabytes (an exabyte is a million terabytes, or a billion gigabytes) of data is created every day.
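
To make that arithmetic concrete, here’s a back-of-the-envelope sketch in Java.  Every figure in it is an illustrative assumption, not a measurement: the growth rate is picked to roughly match the “90% in two years” statistic (about 10x every two years), the cost curve halves annually per the trend above, and the starting volume and price are invented for the example.

```java
// Back-of-the-envelope storage spend projection.
// All starting figures and rates are illustrative assumptions, not real data.
public class StorageProjection {
    public static void main(String[] args) {
        double dataPB = 1.0;           // assumed starting volume: 1 petabyte
        double costPerGB = 0.05;       // assumed starting cost: $0.05 per gigabyte
        double annualGrowth = 3.16;    // ~10x every two years ("90% in 2 years")
        double annualCostFactor = 0.5; // cost per GB halves each year

        for (int year = 1; year <= 10; year++) {
            dataPB *= annualGrowth;
            costPerGB *= annualCostFactor;
            double spend = dataPB * 1_000_000 * costPerGB; // 1 PB = 1,000,000 GB
            System.out.printf("Year %2d: %,12.0f PB at $%.8f/GB = $%,.0f%n",
                    year, dataPB, costPerGB, spend);
        }
    }
}
```

Under these assumptions, even with the cost per gigabyte halving every single year, annual storage spend still grows by nearly 60% a year – the cost curve simply can’t keep up with the data curve.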

OK, but what *is* Big Data?

Big data generally falls into three categories – the traditional enterprise data we’re all used to, automatically generated data (from call logs, smart meters, facility/manufacturing sensors, and GPS data, among others), and social data, which can include valuable customer feedback from sites like Facebook and Twitter.
Companies have realized in the past few years that these last two types – non-traditional data – can be mined for incredibly useful information.  However, this data is not well structured and is too voluminous to be handled in traditional ways.  In the past, the technology to mine this data either didn’t exist or was prohibitively expensive and didn’t provide a clear return on investment.  In those days, this data would likely have been thrown out or ignored, then deleted once it reached a certain age, unlikely to receive a second (or even a first) look.

Not only is the data valuable, but much of it is perishable, meaning it must be used almost immediately or its value decreases dramatically, or even disappears entirely.

What’s changed?

Companies are now starting to analyze this data much more closely.  Many things have changed to make this happen – I’ll list the top four, in my estimation:
(1)  Storage costs have decreased dramatically.  As I mentioned earlier, the cost of disk has halved annually for about the last decade.  Whereas gigabytes and terabytes of storage were extremely expensive and closely controlled 5-10 years ago, the cost of storing additional data (such as Big Data) has dropped enough to make doing so affordable.
(2)  Processing / Computation costs have decreased dramatically.  Combined with new technologies that allow processing to be distributed across many (relatively) cheap nodes (see #3, below), it’s now easy to put processing power in place to manipulate and analyze these huge, varied sets of data – and to do it quickly, while the data still has value to the business.
(3)  New technologies have enabled the transformation of huge sets of data into valuable business information.  One of the leaders in the processing of Big Data is an open source technology called Apache Hadoop (see the sketch after this list for a feel of how it works).
(4)  Social media outlets such as Facebook, Twitter, and LinkedIn allow customers to quickly, easily, and sometimes semi-anonymously provide feedback directly to a company.  The company doesn’t necessarily control the location where the feedback is given, so it’s crucial that the company is able to quickly and easily mine for relevant data.  This is especially true for sites like Twitter, where data is particularly perishable and trends can come and go very quickly.
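
To give a feel for the Hadoop model mentioned in #3, here’s a minimal sketch of the classic MapReduce word count, written against Hadoop’s standard Java API.  The tweet-mining framing is hypothetical – imagine the input being a dump of tweets and the interesting words being product names – and the input/output paths are whatever you supply on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel across the cluster, one split of input per mapper.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1) for every token
            }
        }
    }

    // Reduce phase: all counts for the same word arrive together and are summed.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation cuts network traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The point of the pattern is in the data movement: the map step runs on whichever cheap node happens to hold each block of the input, and only the small (word, count) pairs travel across the network to the reducers – which is what makes it practical to churn through data sets far too big for any single machine.
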
Dale's webinar series on Big Data and the implications for Capacity Management is worth attending.  If you missed Part 1, join our Community and listen to it now at http://www.metron-athene.com/_downloads/on-demand-webinars/index.html, and don't forget to sign up for Capacity Management for Big Data Part 2 at http://www.metron-athene.com/services/training/webinars/index.html

Join me again on Monday, when I'll be taking a look at a technology for Big Data.
Rich Fronheiser
Chief Marketing Officer
