Archive for the ‘Byte of the Week’ Category
About once every five years or so, the technology industry blesses entrepreneurs with a new path to riches. The PC, Internet, Mobile and Social waves all emerged in the last 20 years with new computing paradigms and business models that optimized value for businesses, partners and customers. Today’s catalyst is Big Data.
If you are looking for the ‘way’ to make money in tech in the next decade, jump on the Big Data bandwagon. The dynamics of traditional high-growth tech segments are all there … from a big, recognizable problem to management challenges and business issues, Big Data affects the entire organization and IT infrastructure. And it comes at a good time for tech businesses.
“With hardware, networks and software having been commoditized to the point that all are essentially free, it was inevitable that the trajectory toward maximum entropy would bring us the current age of Big Data.”
- Shomit Ghose, “Big Data: The Only Business Model Tech Has Left,” CIO Network.
The key to making money (or optimizing value if you are a buyer) in this emerging economy is to build a foundation around a ‘return on data’ model. Your data is integrally related to the many ways you create value in business. The key is how quickly data can be turned into currency by:
- Analyzing patterns and spotting relationships/trends that enable decisions to be made faster with more precision and confidence.
- Identifying actions and pieces of information that are out of compliance with company policies before they result in millions in fines.
- Proactively reducing the amount of data you pay to review in eDiscovery ($18,750 per gigabyte) by identifying only the relevant pieces of information.
- Optimizing storage by deleting non-critical assets or offloading them to cheaper cloud storage, thus saving millions in archive solutions (a back-of-the-envelope sketch of this math follows below).
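To make the return-on-data arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Only the $18,750-per-gigabyte review figure comes from the text above; the collection sizes, relevance rate and storage prices are purely illustrative assumptions, not benchmarks.

```python
# Back-of-the-envelope 'return on data' sketch.
# All volumes, relevance rates and storage prices below are illustrative
# assumptions; only the $18,750/GB review figure comes from the text above.

REVIEW_COST_PER_GB = 18_750        # eDiscovery review cost cited above
PRIMARY_STORAGE_PER_GB_YR = 3.00   # assumed annual cost of primary storage, per GB
CLOUD_ARCHIVE_PER_GB_YR = 0.30     # assumed annual cost of a cheaper cloud archive tier

def ediscovery_savings(collected_gb: float, relevant_fraction: float) -> dict:
    """Compare reviewing everything vs. reviewing only the relevant subset."""
    review_all = collected_gb * REVIEW_COST_PER_GB
    review_culled = collected_gb * relevant_fraction * REVIEW_COST_PER_GB
    return {"review_all": review_all,
            "review_culled": review_culled,
            "savings": review_all - review_culled}

def archive_savings_per_year(total_gb: float, non_critical_fraction: float) -> float:
    """Annual savings from offloading non-critical data to the cloud archive tier."""
    offloaded_gb = total_gb * non_critical_fraction
    return offloaded_gb * (PRIMARY_STORAGE_PER_GB_YR - CLOUD_ARCHIVE_PER_GB_YR)

if __name__ == "__main__":
    # Hypothetical matter: 1 TB collected, only 4% of it actually relevant.
    matter = ediscovery_savings(collected_gb=1_000, relevant_fraction=0.04)
    print(f"Review everything:      ${matter['review_all']:>13,.0f}")
    print(f"Review relevant subset: ${matter['review_culled']:>13,.0f}")
    print(f"Review savings:         ${matter['savings']:>13,.0f}")
    # Hypothetical estate: 2 PB on primary storage, 60% of it non-critical.
    print(f"Annual archive savings: ${archive_savings_per_year(2_000_000, 0.60):>13,.0f}")
```

Even with modest assumptions, culling to the relevant subset before review and tiering the rest is where most of the ‘return on data’ shows up.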
We are now living in a data-driven world where mastery of the technologies and processes that enable a rapid ROD (Return on Data) is the key to reducing cost, complexity and risk and to increasing the value of your data holdings.
Big Data means Big Business for vendors, service providers, enterprise buyers and consumers. Every stakeholder has a piece of the Big Data pie and a part to play in maximizing its value. Those who drive their businesses and economic engines around a philosophy of creating a high return on data will be the winners in this next version of the ‘new economy’.
The definition of Big Data is rapidly morphing into a subset of what it really ought to be. Over the last 3-6 months, the prevailing sentiment, if you focus only on the buzz meter, is that the Big Data discipline is really a simple matter of adopting Hadoop and using it to create some business analytics app—most likely in sales and marketing—to do sentiment analysis. This all seems a bit BI-centric to us.
Hadoop analytics apps are nice, to be sure, but they are only about 1/100th of what Big Data really is!
Big Data is not a quick and dirty movement of data to analyze for a point-in-time application. It’s a pervasive, systemic problem domain that creates challenges of a magnitude we’ve never before encountered in IT. That’s because data is now truly a strategic asset of every business, just as valuable as your products, your customers and even the cash in the bank. Data defines your value as a business and as a service to your market. Not to treat it as the fundamental issue to focus on and manage is as short-sighted as you can get.
This perspective is something that will no doubt change over time. Like many new areas, the initial burst of jumping to the end game will ultimately be replaced by a management process that addresses the problem at its core. In this case, that core is the idea that Big Data is really about all of the massive amounts of data within our corporate domains.
We’ve been here before … take eDiscovery as a very recent example.
We really don’t need to look too far back into history to see how this will play out. In 2006, the Federal Rules of Civil Procedure were amended to include all electronic data as a source in discovery, and the eDiscovery market began to take shape. At that time, law firms and corporate counsel were all well versed in the discipline of reviewing documents using automated review tools. They just never had to worry about searching for all of the data that was relevant.
The first generation of eDiscovery tools took off, offering review capabilities coupled with bulk load-file ingestion that allowed packets of data to be moved into these systems. The data largely got there manually, and a good 95% of it was irrelevant, but the market was ‘white hot’.
Or at least it was, until the bills started to come in on the bigger cases, and then the fines—often in the tens of millions of dollars!
Seems everyone is jumping on the Big Data bandwagon these days. There are new announcements almost daily, and by now nearly every vendor or service provider that does something with data is a ‘Big Data’ company. It’s certainly a great place to be: without a doubt, every potential customer on the planet is sitting down to evaluate what its Big Data strategy needs to be for the next 3-5 years.
Let’s see: there are Big Data analytics apps that all the BI vendors now offer. There is Oracle’s new appliance that solves all your Big Data problems by providing high-speed pipes into their database. HP is doing the same. EMC is marrying BI with storage. Hadoop is becoming the ‘in’ thing to say you are adopting to build applications. And SIs are developing new practices to lead you through the transformation. Is any of this real? In point of fact, it most certainly is not, because right now…
Everyone is ignoring the elephant in the room… Where do you get started?
No one is telling enterprise leaders the reality of Big Data because it’s not good news. That reality is that most likely, you are nowhere near ready to implement these solutions because every one of them requires an ability to actually discover, analyze and act on petabytes of data living somewhere in the enterprise. And right now, less than 2% of the companies we’ve identified actually have anything that does this work.
How about maybe 15 years from now!
It seems everyone is jumping on the Big Data bandwagon. Analysts, CIOs, systems integrators, vendors and most certainly the press. It’s easy to see why: it’s a good story that is easy to digest, with a spectrum of outcomes that runs from exceedingly dark if you look at the costs and IT issues to irrationally exuberant if you look at the business value. Regardless, almost no one is questioning its importance and stature as the next big thing in IT.
Almost no one, except for Rich Castagna of Storage Media Group. Rich wrote a great article last week entitled “Big Data Conspiracy Theories Abound.” The central premise, it seems, is whether or not:
a) there truly is as big a Big Data problem at most companies as we are hearing, and,
b) if there is, whether the storage vendors are going to squash it by buying up companies like StoredIQ and Kazeon just to eliminate the products that can find and delete unwanted data, because it’s not in their best interests to have customers do that! Unless, of course, you use these products to load up their proprietary storage repositories and pay them more fees for hardware and software.
Though in general we disagree with Rich’s view that Big Data’s life expectancy will be about as long as the pet rock’s, he does make a great point. To think that the Big Data problem is going to be solved by companies who want to use a land grab to create bigger storage farms is to miss the point completely. The point of Big Data management strategies is not to drive costs up and increase reliance on IT storage solutions by creating lots of projects to move data around, but rather to deal with the real issue: managing data ‘in place’ and turning it into value for the business.
As mandated by the US Constitution (Article I, Section 2), every 10 years the US government goes into a complete frenzy as the Census Bureau kicks off the task of capturing details about every single resident of the United States. Over a million workers were involved in collecting and compiling the data for the 2010 Census – this despite an all-time high in mail-in responses. The Census Bureau had an estimated $15 billion budget for this latest installment of the US’s oldest exercise in Big Data.
Many important decisions are often based on Census reports, including the number of representatives allowed for each state, entitlement payments, and so on. As you might imagine, the accuracy of the data is very important. Each time, the Census therefore turns into the single largest organized project the country has ever seen, involving planning and resource allocation on an overwhelming scale.
The CIO’s Census
If you’re a CIO at a large information-centric organization, you might already be feeling those déjà-vu goose-bumps. You realize all too well that you’re trying to manage what you cannot clearly see – an enormous mountain of data. Data that comes in the form of email, spreadsheets, web pages, tweets and posts. Data that comes in different sizes. Data that has different ages and lifecycles associated with it. Data that is located on various servers in various labs, sometimes spanning continents. Data that can have different levels of value to your organization depending on the information it carries. You get the picture – the data in your organization is as diverse, as massive and as dispersed as the nation’s populace, if not more so. The Census is exactly the kind of grand information round-up and mining that most organizations would like to be able to afford, but simply cannot budget for – whether in time, money or people. And in the information age, where data is the new money, can you really afford to ignore a Census for your organization’s data?
Centralized vs. Decentralized. User Productivity vs. Secured Control. Open vs. Closed. Insource vs. Outsource. Virtual vs. In-house. We run in patterns and circles. I loved the discussion that Steve Jobs and Bill Gates had, in their reflective moments toward the end of Steve’s life, about the two different philosophies they had on building products and companies, and how both actually were right in their time.
Big Data is the same way. We’ve seen this before and we know how the picture ends.
In the early 90’s there was a widespread recognition that the explosion of business assets being created outside of corporate Data Centers had exceeded the capacity of IT to manage it. Quite simply, there were so many desktops, local area networks, applications and so much business data being used by departments that traditional Network and Systems Management environments were overwhelmed. IT tried controlling things through security, provisioning standards and written policies, but the business just blew right by them and created ‘millions of moving parts’ for IT to manage.
Several years ago at a VC roundtable about “big data” here in Austin, TX, I got into a very vibrant discussion with a group of CEOs around the concept of the value of Big Data. My point was simple. “If capitalism in the 19th and 20th Centuries was about how to use and leverage aggregated money, then capitalism in the 21st Century will be about how to use and leverage aggregated data.”
Of course, I then downgraded my own prediction when I observed, somewhat cynically, that while many companies claim that their data is their most important asset, no one had yet figured out how to list it on their balance sheet in a way an auditor would recognize or accept (which is a discussion for a different day).
I’m encouraged to find that this reality is changing … slowly.
A number of recent news items caught my attention:
- In the July 2nd, 2011 issue of the New York Times, there’s a wonderful article about the World Bank and how the Bank is using its “Treasure Chest of Data.” In the article, Robert B. Zoellick, president of the World Bank and a career diplomat/member of the Republican foreign-policy elite, made a shocking statement: “The most valuable currency of the World Bank isn’t its money — it is its information.”
- In his keynote at the 2010 Gartner Symposium, Head of Global Research Peter Sondergaard said: “Information will be the Oil of the 21st Century. It will be the resource running our economy in ways not possible in the past.”
Is this whole notion of Big Data as something with value equivalent to the things we already know how to value actually taking root? The best way to answer this question is to spend just a few minutes learning how people learned to create and manage commodities like money and oil, and then to see what parallels we can draw.
On virtually any topic, I can fire up my browser, click on Google, type in a query and find useful information that resides on websites around the world. Why can’t we do the same inside our enterprises?
Search is a black hole inside organizations. With the growing amount of data that corporations now have access to through repositories, archives, files, e-mail, desktops, mobile devices, cloud-based services, social media and more, nearly every company has Big Data. Petabytes are the norm, and we’re now learning new words like exabytes and zettabytes. How can we make all of this data useful to running our businesses?
The dirty little secret right now is that we’re actually more concerned with how to hide it than mine it.
That’s because nearly all attempts to provide solutions for Big Data have taken the wrong approach to solving the problem. That is: closed platforms that try to move, store, temporarily copy or even create new repositories for all of your information so that you have ONE place to look.
The market is littered with solutions that are vendor-centric right now. The vendor thinking is that if they can get all of your data in one place, they will own it, and control of and access to your data will be on their terms. We see examples of this kind of philosophy from big platforms like Autonomy, from new solutions like Splunk and MarkLogic, and even from point application products for hot markets like eDiscovery (a la Clearwell). All of these environments provide search capabilities, but only AFTER you’ve either reduced your data to a workable subset or moved it into a repository they can control. Their philosophy?
He who owns the data wins!
The problem with this approach is two-fold:
- It doesn’t work. There is simply no way to move petabytes of data under one roof, so something is ALWAYS missing in the solution. Today, that missing piece is an astounding 80%+ of unstructured data. Imagine Google providing answers with only 20% of the sites referenced (a quick illustration follows after this list).
- It’s a bad strategy for IT. Getting locked in means you’ve lost control of your data and placed a very big bet on one vendor. For such a vital corporate asset, that’s a lot to wager. Imagine relying only on Yahoo for access to information across the Internet.
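To put that coverage gap in perspective, here is a tiny sketch (the coverage percentages are illustrative assumptions, not measurements) of what partial reach does to search results: even a perfect engine can only return what it can actually see.

```python
# Tiny coverage illustration: if only a fraction of enterprise data is
# reachable by the search platform, even a perfect engine misses the rest.
# The coverage figures below are illustrative assumptions.

def best_case_recall(coverage: float, engine_recall: float = 1.0) -> float:
    """Fraction of all relevant items found, assuming relevant content is
    spread evenly across reachable and unreachable sources."""
    return coverage * engine_recall

if __name__ == "__main__":
    for coverage in (0.20, 0.50, 0.80, 1.00):
        print(f"{coverage:.0%} of data reachable -> "
              f"at best {best_case_recall(coverage):.0%} of relevant results returned")
```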
To truly deliver a Google-like solution for the Enterprise, solution providers need to take an approach that is more open, fast to activate and based on a real-time (or near real-time) analysis of all of your data sources. It’s a very hard problem to solve, as it requires integration with information sources that are non-standard—you are literally dealing with ‘millions of moving parts’ in the equation (see Tom Bishop’s post on this). The web is actually simpler, as its standards have evolved to make the sources more uniform for searching.
The key aspect of delivering an enterprise-ready solution that works like Google is to leave the data alone where it lives and use the search solution to tap into it and discover relevance. We are almost at the tipping point of having technologies mature enough to address this problem and see a Google emerge. Focus on the approach and architecture to find yours.
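As a rough sketch of what ‘leave the data alone where it lives’ can look like in practice, here is a minimal federated-search skeleton in Python: each source gets a small connector that is queried in place and returns only lightweight references back to the original items, with nothing copied into a central repository. The connector names, fields, sample data and interfaces are hypothetical illustrations, not any vendor’s API.

```python
# Minimal 'search in place' federation sketch: each source is queried where
# it lives and returns lightweight references (pointers plus snippets),
# never copies of the content into a central repository.
# Connector classes, fields and sample data are hypothetical.

from dataclasses import dataclass
from typing import Iterable, List, Protocol

@dataclass
class Hit:
    source: str    # which system the item lives in
    location: str  # path/URI pointing back to the item, in place
    snippet: str   # small excerpt used for relevance, not the full content

class Connector(Protocol):
    name: str
    def search(self, query: str) -> Iterable[Hit]: ...

class FileShareConnector:
    """Illustrative connector over a file share it has been pointed at."""
    name = "fileshare"

    def __init__(self, docs: dict):
        self._docs = docs  # {path: text}, a stand-in for crawling the share

    def search(self, query: str) -> Iterable[Hit]:
        q = query.lower()
        for path, text in self._docs.items():
            if q in text.lower():
                yield Hit(self.name, path, text[:80])

def federated_search(query: str, connectors: List[Connector]) -> List[Hit]:
    """Fan the query out to every source; what comes back are pointers, not copies."""
    hits: List[Hit] = []
    for connector in connectors:
        hits.extend(connector.search(query))
    return hits

if __name__ == "__main__":
    share = FileShareConnector({
        r"\\corp\legal\retention-policy.docx":
            "Retention policy: drafts are deleted after three years.",
    })
    for hit in federated_search("retention", [share]):
        print(hit.source, hit.location, "->", hit.snippet)
```

The point of the sketch is the shape of the interface: relevance gets computed against the sources where they live, and what flows back to the user is a reference to the original item rather than another copy to manage.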