Big Data has received Big Attention from a lot of different
people, and not only in the IT world.
But why? Let’s examine what Big
Data is first and then talk about why it’s important that Capacity Managers
carefully consider the implications of Big Data.
What is Big Data?
Big Data typically refers to large sets of data that (choose one or many):
(1) Require processing power that exceeds what you’d get with a legacy database
(2) Are too big to be handled by one system in a traditional way (*)
(3) Arrive much too quickly or in too great a volume to be handled in the traditional way
(4) Are of such varied types that it makes little sense to try to store or handle them in a traditional way
(*) By “traditional way” I mean handled by a typical relational database. Even there, the definition of “typical” is somewhat elusive due to the complexities of today’s databases.
Why are we concerned about doing things in a new way?
(1) 90% of all data stored today was created in the last 2 years.
Frightening concept, isn’t it? Imagine if this trend continues over the next decade. Storage costs have dropped consistently: statistics I’ve seen suggest that the amount of storage a dollar buys has doubled just about annually over the last decade. And yet, it doesn’t take an advanced math degree to see that companies will need to make considerable investments in storage space to keep up with the demands of Big Data. Another estimate from IBM is that over 2.5 exabytes (an exabyte is a million terabytes or a billion gigabytes) of data are created every day.
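To make that back-of-the-envelope math concrete, here is a rough sketch in Python. It assumes, purely for illustration, that both trends quoted above simply continue: total data grows about 10x every 2 years (which is what “90% created in the last 2 years” implies), and the cost of a unit of storage halves every year. The function name and constants are my own, not from any published model.

```python
# Illustrative assumptions only, extrapolated from the figures quoted above:
#   - 90% of stored data is under 2 years old, so total data grows
#     roughly 10x every 2 years.
#   - The cost of a unit of storage halves every year.
DATA_GROWTH_PER_2_YEARS = 10.0
COST_FACTOR_PER_YEAR = 0.5


def relative_storage_spend(years: int) -> float:
    """Spend needed to store all data after `years`, relative to today."""
    data_growth = DATA_GROWTH_PER_2_YEARS ** (years / 2)
    unit_cost = COST_FACTOR_PER_YEAR ** years
    return data_growth * unit_cost


for years in (2, 4, 10):
    print(years, "years:", relative_storage_spend(years), "x today's spend")
```

Under these assumptions, the storage bill still grows about 2.5x every two years and nearly 100x over a decade, even though each gigabyte keeps getting cheaper. Real growth and pricing won’t follow such tidy curves, but the direction is clear.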
OK, but what *is* Big Data?
Big Data generally falls into 3 categories: the traditional enterprise data we’re all used to; automatically generated data (from call logs, smart meters, facility/manufacturing sensors, and GPS devices, among others); and social data, which can include valuable customer feedback from sites like Facebook and Twitter.
Companies have realized in the past few years that these last two types (non-traditional data) can be mined for incredibly useful information. However, this data is not well structured and is too voluminous to be handled in traditional ways. In the past, the technology to mine this data either didn’t exist or was prohibitively expensive and didn’t provide a clear return on investment. In those days, this data would likely have been ignored and deleted once it reached a certain age, without ever receiving a second (or even a first) look.
Not only is the data valuable, but much of it is perishable,
meaning it’s crucial that the data be used almost immediately or the value of
the data decreases dramatically or even ceases to exist.
What’s changed?
Companies are starting to analyze and look at this data more
closely now. Many things have changed to
make this happen – I’ll list the top four, in my estimation:
(1) Storage costs have decreased dramatically. As I mentioned earlier, the cost of disk has halved annually for about the last decade. Whereas gigabytes and terabytes of storage were extremely expensive and closely controlled 5-10 years ago, the cost of storing additional data (such as Big Data) has dropped enough to make it affordable.
(2) Processing / Computation costs have decreased dramatically. Combined with new technologies that allow processing to be distributed across many (relatively) cheap nodes (see #3, below), this makes it far easier to put processing power in place to manipulate and analyze these huge, varied sets of data, and to do it quickly, while the data still has value to the business.
(3) New technologies have enabled the transformation of huge sets of data into valuable business information. One of the leaders in the processing of Big Data is an open source technology called Apache Hadoop.
(4) Social Media outlets such as Facebook, Twitter, and LinkedIn allow customers to quickly, easily, and sometimes semi-anonymously provide feedback directly to a company. The company doesn’t necessarily control the location where the feedback is given, so it’s crucial that the company is able to quickly and easily mine for relevant data. This is especially true when it comes to sites like Twitter, where data is especially perishable and trends can come and go very quickly.
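To give a feel for what “distributing processing across many cheap nodes” means in points (2) and (3), here is a toy Python sketch of the general split-then-combine idea. It is not Hadoop’s API, and everything in it (the function names and the sample feedback strings) is made up for illustration. In a real cluster each chunk would be handled on a different machine; here the chunks are simply processed one after another.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk (one node's share of the data)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def split_into_chunks(items, n_chunks):
    """Deal items round-robin into n_chunks lists."""
    return [items[i::n_chunks] for i in range(n_chunks)]


# Hypothetical customer feedback, standing in for a much larger feed.
feedback = [
    "love the new app",
    "app keeps crashing",
    "support was great",
    "new app is great",
]

# Each chunk is counted independently, then the partial results are merged.
partials = [map_chunk(chunk) for chunk in split_into_chunks(feedback, 2)]
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common(3))
```

The point of the design is that no single step ever needs to see all of the data at once, which is what lets the work spread across many modest machines.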
Join me again on Monday when I'll be taking a look at a technology for Big Data.
Rich Fronheiser
Chief Marketing Officer