Big Data is more than Hadoop and Analytics

How do you make a big issue small?  Redefine the term!

The definition of Big Data is rapidly being narrowed to a subset of what it really ought to be.  Over the last 3-6 months, the prevailing sentiment, if you focus only on the buzz meter, is that the Big Data discipline is really a simple matter of adopting Hadoop and using it to create some business analytics app, most likely in sales and marketing, to do sentiment analysis.  This all seems a bit BI-centric to us.

Hadoop Analytics apps are nice to be sure, but only about 1/100th of what Big Data really is! 

Big Data is not a quick-and-dirty movement of data to analyze for a point-in-time application. It’s a pervasive, systemic problem domain that creates challenges of a magnitude we’ve never before encountered in IT.  That’s because data is now a strategic asset of every business, just as valuable as your products, customers and even the cash in the bank.  Data defines your value as a business and a service to your market.  To not treat it as the fundamental issue to focus on and manage is as short-sighted as you can get.

This perspective is something that will no doubt change over time.  Like many new areas, the initial burst of jumping to the end game will ultimately be replaced by a management process that addresses the problem at its core.  In this case, this is the idea that Big Data is really about all of the massive amounts of data within our corporate domains.

We’ve been here before … take eDiscovery as a very recent example.

We really don’t need to look too far back into history to see how this will play out.  In 2006, the Federal Rules of Civil Procedure were amended to make electronically stored information a source in discovery, and the eDiscovery market began to take shape.  At that time, law firms and corporate counsel were all well versed in the discipline of reviewing documents using automated review tools.  They just never had to worry about searching for all of the data that was relevant.

The first generation of eDiscovery tools took off, offering review capabilities coupled with bulk load file ingestion that allowed packets of data to be moved into these systems.  The data largely got there manually, and a good 95% of it was irrelevant, but the market was ‘white hot’.

Or at least it was until the bills started to come in on bigger cases, followed by the fines, often in the tens of millions of dollars!

Before long, corporations began asking how to reduce the cost and risk of eDiscovery, and they found to their surprise that the root cause was poor information management.  Without any ability to understand, analyze and manage distributed data, legal groups were simply replicating everything and paying service providers, law firms and their own people to sort it all out.

In 2009, experts finally weighed in and the EDRM (Electronic Discovery Reference Model) standard was formed, providing everyone with the road map to success.  Not surprisingly, Information Management was at the core.  Today, no eDiscovery solution or practice comes without this foundation.

From EDRM to BDRM

The same thing will happen in Big Data.  In fact, we believe that it is only a matter of time before a model like the EDRM turns up for Big Data … a BDRM (Big Data Reference Model).

The process is identical.  To effectively provide any kind of application on top of Big Data, the underpinnings of information management, analysis and targeted collection are a must.  There are also three prerequisites: the process has to be fast, be precise and adapt automatically to change.  Without these capabilities, the output is useless, based on, at best, delayed information and, at worst, outdated or irrelevant information.

We need to treat Big Data for what it is … BIG DATA.  The issue here isn’t the need to reduce the scope of the problem but to turn up the precision of how you deal with it: by understanding it, analyzing it, managing it and then feeding an application.  The process never varies but sometimes our perspectives get a little short-sighted.

Comments

  1. Craig W

    That is what I’m talking about. Well said.