Posts Tagged ‘Big Data’
It’s Star Wars All Over Again!
We’ve had some pretty BIG ideas launched in the last couple of decades that you just knew from the beginning could not possibly work in our lifetime. A couple of my favorites are the Star Wars Missile Defense System (it worked okay as a negotiating tool, but as an emerging technology?…not so much) and the Iridium Phone (the phone itself cost thousands of dollars, it had limited range because it depended on launching and locating one of 66 satellites, and you could only use it outdoors).
My favorite Star Wars project, though, is one called Project Stretch. A very long time ago (but don’t worry, this was still in our galaxy) Project Stretch was launched to improve the quality of weather prediction. The goal was to deploy thousands of weather monitoring stations around the country (this was before satellites of course), and then network them together to bring the data back to a single place, crunch all the numbers, and then predict the weather.
There was only one problem.
With the amount of data that needed to be processed, and the power of computers at the time, it was going to take three days of computer time to collect, analyze and then predict the weather for the next day.
It didn’t take long before someone made the fairly obvious suggestion that they could cut two days off the time required to predict the weather – wait 24 hours, walk outside, and look up!
Today’s Star Wars Project: Can I manage Big Data in the Cloud?
I felt an odd sense of déjà vu this week reading a Wired article on Big Data that focused on Google’s “breakthrough” with Dremel: “Google’s Dremel Makes Big Data Look Small.” According to the article, the breakthrough that Dremel offers is that it can handle web-sized amounts of data at blazing speed, processing petabytes of information in 3 seconds.
Pretty impressive, huh? Even more interestingly, Wired goes on to say, you can use Dremel today — even if you’re not a Google engineer. Google now offers a Dremel web service it calls BigQuery. You can use the platform via an online API, or application programming interface. Basically, all you have to do is upload your data to Google, and it lets you run queries on its internal infrastructure.
And here is where Star Wars enters the equation. Like most readers, I’m sure, I was intrigued, but then I started to think… “Wait a minute. Did I just read what I thought I read?”
Did anyone ask how long it would take to upload a petabyte of data to the cloud?
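For a rough sense of scale, the question can be answered with back-of-the-envelope arithmetic. The sketch below is illustrative only: the link speeds are assumptions chosen for the example (they do not come from the Wired article), a petabyte is taken as 10^15 bytes, and the link is assumed to be fully saturated with no protocol overhead.

```python
def upload_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push size_bytes through a fully saturated link."""
    seconds = (size_bytes * 8) / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

# Hypothetical sustained link speeds, chosen only for illustration.
for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{label}: about {upload_days(PETABYTE, bps):,.0f} days")
```

Under these assumptions, a sustained 1 Gbit/s link needs roughly three months to move a single petabyte, before the first 3-second query can run.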
Let Us Pay
Ah, the bloom is beginning to come off the Big Data rose, isn’t it?
Part of the challenge for IT professionals today is to make ‘business sense’ out of Big Data. The interesting part of the conversations we have been involved in is that organizations are now beginning to surface and align Big Data around three very different sets of objectives. CIOs are now faced with an almost ‘Solomon-like’ challenge of how to view the perspective of ownership of Big Data.
Is it cost, risk or value?
Is Big Data just something that costs us a lot of money? Well, yes. We know that by the amount of it we have and how much we pay in storage and retrieval costs. We know that by the many cloud initiatives we have to offload expensive data storage to cheaper sources. In fact that cost is estimated at between $2k-$6k per TB per year.
Is Big Data a risk? Well, yes. There is information being stored that is old, is no longer of value and may well amount to a compliance violation, exposing the business to legal risk or even competitive exposure. The average amount of data that is of no current value is estimated at 69% of each enterprise’s total. You can rest assured that somewhere in that vast unwashed information set is a problem you’d rather not be around for when the lawyers or regulators start digging into your data.
Does Big Data have value hidden in it? Well, yes. In fact, most analysts would tell you that your data is now a major ‘asset’ of the business. Robert Zoellick, President of the World Bank, recently stated that their data is more valuable to the bank than the money kept there. Imagine all the nuggets of gold contained in employee, customer and partner communications that define a blueprint for running a better business.
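To put rough numbers on the cost and risk estimates above, here is a minimal sketch that combines the $2k-$6k per TB per year range with the 69% no-value estimate. The 1,000 TB repository size is a hypothetical input chosen for illustration, and the function names are made up for this example.

```python
def annual_storage_cost(terabytes, low_per_tb=2_000, high_per_tb=6_000):
    """Return the (low, high) yearly cost range in dollars for a repository."""
    return terabytes * low_per_tb, terabytes * high_per_tb


def no_value_cost(terabytes, no_value_fraction=0.69):
    """Yearly cost range attributable to data with no current value."""
    low, high = annual_storage_cost(terabytes)
    return low * no_value_fraction, high * no_value_fraction


# Hypothetical enterprise holding 1,000 TB (one petabyte).
total_low, total_high = annual_storage_cost(1_000)
waste_low, waste_high = no_value_cost(1_000)
print(f"Total:    ${total_low:,.0f} to ${total_high:,.0f} per year")
print(f"No value: ${waste_low:,.0f} to ${waste_high:,.0f} per year")
```

On those assumptions, somewhere between $1.38 million and $4.14 million a year goes to storing data that has no current value.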
So, how do CIOs resolve these three very different forces? To some, it may seem that the presentation Dilbert’s boss makes is not that far off. Since we don’t really know the difference between what data should be kept or discarded, the natural fallback position is to keep it all, and thereby make our companies subservient to the Big Data storage vendors who house it all behind gates that only they control.
Paying is one strategy of course (and one which is a continual drain on the balance sheet each time a new request to provide data emerges…from lawyers, compliance or the business). But, there are also now better options available:
To understand, classify, manage and govern your data like an asset.
Leading-edge companies are right-sizing their repositories and building corporate management policies that appropriately ‘balance’ cost, risk and value. With the continued pressures of a tough economy, their proactive approaches are having significant bottom-line impacts.
The Bottom Line
Getting out in front of the Big Data tsunami sure beats standing still and being overwhelmed by it. Balance your approaches to maximize the value and you’ll likely see much lower costs, less risk and higher business value.
The Needle in the Haystack
Mary Meeker of Kleiner Perkins is getting lots of airtime for a presentation she gave recently that charts the amazing advances technology has made. She captured more than 50 ‘Re-imagination of Applications’ that depict the relatively recent history of all types of technological evolution in simple ‘then and now’ visuals. The full presentation is widely available on the web. Here is a link through Business Insider: http://www.businessinsider.com/mary-meekers-latest-incredibly-insightful-presentation-about-the-state-of-the-web-2012-5#-1
I couldn’t help but be struck by her take on Big Data.
The picture perfectly depicts the paradigm shift we are going through in Information Management. The old days of viewing the problem through a storage lens are over. In the new world, it’s less about organizing the landfills and a whole lot more about mining value from the haystack. The companies that succeed will be the ones who build their IT infrastructures around intelligence about data. That means being smart enough to know where relevant subsets are located and active enough to do something about it:
- Pulling the needle from the haystack to provide a business process with a new piece of information.
- Getting rid of needles that have no value.
- Enforcing policies that secure and cleanse the needles that have sustainable value to the business.
- Identifying related sets of needles to manage them as a group.
Not enough time is being spent these days framing the data management problem for what it really is. Just as the datacenter had to give way to distributed computing when PCs were introduced, so too will Storage Environments and Mega-Repositories give way to intelligent indexing systems that identify relevance in the data where it lives.
The needles are certainly in the haystack. It’s time to start finding them.
Simplicity is the Ultimate Sophistication
The words are Leonardo Da Vinci’s. He first uttered them a very long time ago, but these words should be the mantra for every CIO. Any company’s first step to managing Big Data must be to focus relentlessly on reducing complexity rather than increasing it. That doesn’t sound too profound, except that most IT organizations respond to requests from the business by doing exactly the opposite: something that increases complexity. (If this were a Star Wars movie, this would be known as taking your first step to the Dark Side.)
In previous blog posts we talked about the Big Data version of the Alignment Trap (“Another Alignment Trap?”) and how IT organizations pursuing information management projects can avoid it (“Avoiding Big Data’s Alignment Trap – Part 1”). This post moves beyond perspective to the implementation of the solution and focuses on the elegance and power of simplicity.
Embracing simplicity, it turns out, is actually a hard thing to do. It means:
- Replacing legacy systems where possible.
- Eliminating add-ons.
- Driving consistency and standardization wherever possible.
- Building new solutions on simplified, standardized infrastructure rather than extensive customization or more layering on top of whatever happens to be there.
In the Sloan study, all of the companies that were caught in the Alignment Trap had made this same mistake… they repeatedly took steps that created an enormously complex IT environment. Only when they realized what they had done, and had taken steps to unravel the complexity they had created, were they finally able to extract themselves from the Alignment Trap and move towards a more efficient and effective IT environment.
We’ve learned this lesson in networking, in systems management, in back office applications, in CRM tools. It’s the same lesson, and it’s now Big Data’s turn.
Avoiding Big Data’s Alignment Trap – Part 1
In a previous blog post “Another Alignment Trap?” we talked about the Big Data version of the Alignment Trap (first documented in an MIT Sloan School study). We generated the Big Data version of the diagram presented in the Sloan study, and described Big Data variants of the four quadrants of that diagram:
Status Quo - The data has little to no value to the business, and storing and managing it is expensive. Further, the data is risky to hold: too much of it is kept, and it’s not secure.
Well-oiled Data - The data has little to no value to the business, but at least it’s well-managed. The data is stored on cheap storage, IT is good at implementing a document retention policy and getting rid of old data, and the rest is well-secured.
The Data Trap - The data is very important to the business, but it’s not well-managed. The data is used in many different areas of the business, but it is duplicated, moved, not secured, and stored and managed in lots of different places (so it costs a lot to manage).
Data-Driven Business Growth - The data is very important to the business, and it’s well applied to critical business problems and opportunities. The data is stored on storage whose cost is appropriate relative to its value and use, old data is deleted, active, high-value data is retained, and the data that’s kept is well-secured.
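The four quadrants boil down to two yes/no questions: is the data valuable to the business, and is it well-managed? A minimal sketch of that two-by-two grid follows. The `Quadrant` enum and `classify` function are hypothetical names for this illustration, and reducing each axis to a boolean is a simplifying assumption.

```python
from enum import Enum


class Quadrant(Enum):
    STATUS_QUO = "Status Quo"
    WELL_OILED_DATA = "Well-oiled Data"
    DATA_TRAP = "The Data Trap"
    DATA_DRIVEN_GROWTH = "Data-Driven Business Growth"


def classify(high_business_value: bool, well_managed: bool) -> Quadrant:
    """Place a data set in the grid from its two yes/no attributes."""
    if high_business_value:
        return Quadrant.DATA_DRIVEN_GROWTH if well_managed else Quadrant.DATA_TRAP
    return Quadrant.WELL_OILED_DATA if well_managed else Quadrant.STATUS_QUO


# Valuable data that is duplicated and unsecured lands in the trap.
print(classify(high_business_value=True, well_managed=False).value)
```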
To get to the ultimate goal of Data-Driven Business Growth, we agreed it’s important not only to optimize the business value of data, but also to have an IT organization effective at supporting the infrastructure required by these business-oriented information use cases.
What does an ineffective IT organization look like?
Another Alignment Trap?
Back in 2007, MIT’s Sloan School decided to ask what seems like a basic and reasonable question about IT: Do those organizations that spend more on IT have better business results to show for it?
Of course, the devil is entirely in the details, but they decided to make it simple. To make “spend more on IT” objective and measurable, they used IT budget as a percentage of annual sales. To make “better business results” objective and measurable, they used compound annual growth rate of sales over three years. Then, for the 500-odd companies they surveyed, they plotted the results on a chart.
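Both measures are simple to compute. The sketch below uses the standard definitions of a spend ratio and a compound annual growth rate; the function names and sample figures are hypothetical and are not taken from the Sloan study.

```python
def it_spend_ratio(it_budget: float, annual_sales: float) -> float:
    """IT budget expressed as a fraction of annual sales."""
    return it_budget / annual_sales


def cagr(start_sales: float, end_sales: float, years: int = 3) -> float:
    """Compound annual growth rate of sales over the given number of years."""
    return (end_sales / start_sales) ** (1 / years) - 1


# Hypothetical company: sales grow from $100M to $133.1M over three years,
# with a $4M IT budget in the final year.
print(f"IT spend: {it_spend_ratio(4.0, 133.1):.1%} of sales")
print(f"Sales CAGR: {cagr(100.0, 133.1):.1%}")
```

Plotting one number against the other for each company gives the chart the researchers started with.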
What they expected to see (and what you would expect to see) was a high degree of correlation: the companies that spent more on IT (compared to the average) would have better business results (compared to THAT average), and those that spent less would have worse business results. But no. Instead, the results were all over the map. They then wondered whether other factors were missing and needed to be considered.
After much additional head-scratching, they concluded that the two other factors were: how well IT was aligned with the business (or not), and how efficacious (how efficient) IT was. It was only then that the pattern (shown below) emerged.
The Sloan Study concluded that companies that had efficient and aligned IT organizations did in fact outperform their peers, achieving spectacular revenue growth while underspending on IT. This quadrant they called “IT-Enabled Growth.” They also found that ¾ of those surveyed had inefficient and unaligned IT organizations, and had the mediocre business results to show for it – “Maintenance Zone.” These are the companies Nicholas Carr was referring to when he wrote “IT Doesn’t Matter,” because to these organizations, it doesn’t.
Above and below this correlation, they found two more results that tell an even more interesting story. The “Well-oiled IT” organizations, by being less aligned with the business, didn’t have the business results the “Growth” companies did, but spent 15% less on IT than the average because their IT organization was effective at what it did.
The most interesting quadrant is the “Alignment Trap,” where companies spent the most on IT relative to the average but had the worst business results to show for it.
They concluded that the problem was attempting to align IT to the business before the IT organization had its own house in order, thus the businesses failed precisely because they tied the business to an ineffective IT organization. They felt this result was so important that they titled their report “How to Avoid the Alignment Trap.”
Enter Big Data
So, what does Big Data have to do with all this? Turns out, exactly the same thing.
The Blind Men and the Elephant
Many are familiar with the old story about the Blind Men and the Elephant. In various versions of the tale, a group of blind men (or men in the dark) touch an elephant to learn what it is like. Each one feels a different part, but only one part, such as the side or the tusk. They then compare notes.
They conclude that the elephant is like a wall, snake, spear, tree, fan or rope, depending upon where they touch. They have a heated debate that does not come to physical violence, but they learn they are in complete disagreement, and the conflict is never resolved.
There’s an even more recent version of this story, but it involves IT Service Management. This story ends happily because the six men decide to rely on a Configuration Management Database, or CMDB.
Turns out the IT industry had a problem very similar to that of the six guys above. They each were responsible for different parts of an organization’s elephant, er, IT environment. One of their biggest problems was that each guy, using different tools, or having a different focus, or being responsible for different parts of the process, would end up with different and inconsistent views of what the IT environment really looked like.
So the guys who invented ITIL figured out (correctly, I might add) that the only way out of the problem was to include something called the CMDB. Without getting too technical, a true CMDB is a representation of a set of current and historical relationships between configuration items (the “atoms” of an IT environment). And as long as each of the guys keeps the CMDB up-to-date, nobody ends up being confused.
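As a toy illustration of that definition, the sketch below stores relationships between configuration items along with the dates they were valid, so that everyone querying it sees the same current and historical picture. It is not how any real CMDB product is built: the class names, the `runs_on` relationship type and the item names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Relationship:
    source: str                      # configuration item, e.g. an application
    kind: str                        # hypothetical relationship type
    target: str                      # configuration item, e.g. a server
    valid_from: date
    valid_to: Optional[date] = None  # None means the relationship is current

    def active_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


@dataclass
class CMDB:
    relationships: List[Relationship] = field(default_factory=list)

    def relate(self, source: str, kind: str, target: str, since: date) -> None:
        self.relationships.append(Relationship(source, kind, target, since))

    def retire(self, source: str, kind: str, target: str, until: date) -> None:
        """Close a current relationship but keep it as history."""
        for i, rel in enumerate(self.relationships):
            if (rel.source, rel.kind, rel.target) == (source, kind, target) and rel.valid_to is None:
                self.relationships[i] = Relationship(source, kind, target, rel.valid_from, until)

    def view(self, day: date) -> List[Relationship]:
        """The single shared picture of the environment on a given day."""
        return [rel for rel in self.relationships if rel.active_on(day)]


cmdb = CMDB()
cmdb.relate("billing-app", "runs_on", "server-01", date(2012, 1, 1))
cmdb.retire("billing-app", "runs_on", "server-01", date(2012, 6, 1))
cmdb.relate("billing-app", "runs_on", "server-02", date(2012, 6, 1))

print([rel.target for rel in cmdb.view(date(2012, 3, 1))])  # before the move
print([rel.target for rel in cmdb.view(date(2012, 7, 1))])  # after the move
```

The point of the design is that history is never overwritten: a relationship is closed with an end date instead of being deleted, so a question about any past date still has one answer.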
Splunk IPO a Milestone for Big Data
4/19/12: Splunk IPO marks first IPO of a Big Data vendor, doubles in value from its initial offering price and closes the first day of trading with a valuation in excess of $3b.
The Splunk IPO is very good news for Big Data vendors and the Big Data industry as a whole. Nearly every major new computing era in the past has had a hot IPO provide a catalyst for more widespread adoption of the shift. The reasons why may vary slightly but at the core, it is acceptance that the trend is real and there is big money to be made in the space.
As an analogy, Splunk reminds us very much of Netscape, the company that provided the catalyst in 1995 for a wave of Internet computing in both B2C and B2B marketplaces. There is an interesting parallel, too, in that Netscape’s day-one closing valuation jumped to a then-unheard-of $3B. It ushered in a wave of new innovation in the space and a plethora of new .com businesses. The fact that Netscape ultimately lost a browser war with Microsoft and faded into oblivion, and that .com became .bomb in the stock market in 2001, doesn’t negate the trend. Hundreds of billions of dollars in new value were created, business environments changed forever and new forces emerged to provide a platform for growth … like Google, which incidentally fetched $23B in an IPO in 2004 and is a $37B business today.
What the Splunk IPO tells us is that folks who bet on major market trends for a living have validated Big Data. They did so not because they are visionaries or innovators, but because they are pragmatic:
- Every business runs on data … it is the lifeblood of operations and strategy.
- Most have petabytes of data, some ranging into the hundreds or thousands.
- The amount they are spending on storage roughly equates to the amount of the GDP spent on health care … that is, it’s the biggest.
- 85% of it is currently unmanaged.
- Driving down costs, reducing risks of litigation and compliance, and mining value to create competitive advantage are on the CEO’s short list.
And as such, companies that are tuned in to these issues and offer solutions that reduce the volume, increase controls and/or uncover nuggets of gold in these massive repositories will be very valuable businesses indeed. The Internet Gold Rush did turn out to be real. So, too will the effort to create data-driven enterprises.
Summary Byte
Big Data is moving into the mainstream. Expect both the activity in the space and the value of solutions being provided to increase exponentially in the 3 to 5 years ahead. The early movers will gain the advantage, including vendors, consulting firms, and organizations’ IT departments. The time to act is now.
‘Ready, Fire, Aim’
… and other pitfalls of managing Big Data
When we think of all of the activity in the Big Data space, sometimes we just have to laugh at the shortsighted nature of it. Many folks jump right into the fray as if they are ‘shooting fish in a barrel’. And they are shocked to find that the fish are shooting back!
There is no excuse for poor preparation for tackling what may be the biggest transformative disruption in IT in our lifetime. Thought leaders and advisors to CIOs are urging them to begin now to think strategically about how to deploy an infrastructure that will enable the kinds of controls, speed and precision that were needed to manage other big shifts in IT (like client/server and the Internet). Thor Olavsrud hit it straight on in CIO Magazine:
- Thor Olavsrud, How to be Ready for Big Data; CIO Magazine
The real question he hit straight on is how you get ready. And this really defies conventional wisdom, because most vendors are recommending strategies that drive you straight into the two major pitfalls of managing Big Data:
“Simple, neat, and wrong.”
As readers of this post will know by now, it is my strong belief that we’ll need to adopt solutions to most of the problems associated with Big Data that look the way Big Data looks. For example, the solutions themselves should be distributed in the same way that Big Data itself is distributed, not unlike the way we network large organizations or even the Internet itself.
By contrast, several players in the big data game seem to think we can wrap up solutions in a single container or application, which brings to mind the famous H. L. Mencken quote:
“For every problem, there is one solution which is simple, neat, and wrong.”
A very famous example is shown in the photo above. Hydrogen is simple, neat, and highly flammable — unlike helium, which is what we use today. So, the designers of the Hindenburg did a Mencken. Hydrogen is simple (hey, it’s the first element on the Periodic Table) and neat (certainly quite easy to manufacture). The “wrong” part didn’t become clear until it got too close to its mooring mast at Lakehurst NAS in 1937.
So, what’s the hydrogen in the world of Big Data?
I could give you an answer that’s simple and neat, but it would be wrong.
Rather, there are several different types of hydrogen in use today in the world of Big Data, so let’s talk about several of them.



