The Blind Men and the Elephant
Many are familiar with the old story about the Blind Men and the Elephant. In various versions of the tale, a group of blind men (or men in the dark) touch an elephant to learn what it is like. Each one feels a different part, but only one part, such as the side or the tusk. They then compare notes.
They conclude that the elephant is like a wall, snake, spear, tree, fan or rope, depending upon where they touch. The heated debate that follows stops short of physical violence, but the men find themselves in complete disagreement, and the conflict is never resolved.
There’s a more recent version of this story, and it involves IT Service Management. This one ends happily because the six men decide to rely on a Configuration Management Database, or CMDB.
It turns out the IT industry had a problem very similar to that of the six guys above. IT staff were each responsible for different parts of an organization’s elephant, er, IT environment. One of their biggest problems was that each guy, using different tools, having a different focus, or being responsible for a different part of the process, would end up with a different and inconsistent view of what the IT environment really looked like.
So the guys who invented ITIL figured out (correctly, I might add) that the only way out of the problem was to include something called the CMDB. Without getting too technical, a true CMDB is a representation of a set of current and historical relationships between configuration items, or CIs (the “atoms” of an IT environment). And as long as each of the guys keeps the CMDB up-to-date, nobody ends up being confused.
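For readers who think better in code, here is a toy sketch of the "current and historical relationships" idea. It is not how any real CMDB product works; the names `Relationship`, `ToyCMDB`, `relate`, `current`, and `history` are all made up for illustration, and it assumes (for simplicity) that a CI has only one current relationship of a given kind.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Relationship:
    source: str                           # CI name, e.g. "web-app"
    kind: str                             # relationship type, e.g. "runs_on"
    target: str                           # CI name, e.g. "server-01"
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means "still current"


@dataclass
class ToyCMDB:
    """Hypothetical, in-memory stand-in for a CMDB."""
    relationships: List[Relationship] = field(default_factory=list)

    def relate(self, source, kind, target, when=None):
        when = when or datetime.now(timezone.utc)
        # Simplifying assumption: one current relationship per (source, kind).
        # The old one is closed, not deleted, so history is preserved.
        for rel in self.relationships:
            if rel.source == source and rel.kind == kind and rel.valid_to is None:
                rel.valid_to = when
        self.relationships.append(Relationship(source, kind, target, when))

    def current(self, source):
        return [r for r in self.relationships
                if r.source == source and r.valid_to is None]

    def history(self, source):
        return [r for r in self.relationships if r.source == source]


cmdb = ToyCMDB()
cmdb.relate("web-app", "runs_on", "server-01",
            datetime(2020, 1, 1, tzinfo=timezone.utc))
cmdb.relate("web-app", "runs_on", "server-02",
            datetime(2021, 1, 1, tzinfo=timezone.utc))

print([r.target for r in cmdb.current("web-app")])   # what is true now
print([r.target for r in cmdb.history("web-app")])   # everything ever recorded
```

The point of the sketch is only that everyone asks the same store the same question, and the store remembers what used to be true as well as what is true now.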
To do their job in a real IT environment, CMDBs must handle CI reconciliation, synchronization, mapping, and visualization. In smaller environments, a CMDB can be implemented using simple database technology; in large environments, it may make more sense to federate multiple definitive data sources and handle those same issues through federation. The goal in any case is to have a consistent set of services that deliver authoritative information to every piece of the IT management environment.
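Reconciliation is the least obvious of those four jobs, so here is a rough sketch of what it might look like when two federated sources describe the same CI under slightly different names. Everything in it is an assumption made for illustration: the function names `normalize_key` and `reconcile`, the matching rule (short hostname, ignoring case), and the rule that the source listed first is the more authoritative one.

```python
def normalize_key(name: str) -> str:
    # Assumed matching rule: two records describe the same CI when their
    # short hostnames match, ignoring case and any domain suffix.
    return name.strip().lower().split(".")[0]


def reconcile(sources):
    """Merge CI records from several sources into one view.

    `sources` is a list of (source_name, records) pairs, ordered from
    most to least authoritative (an assumption of this sketch).
    """
    merged = {}
    for source_name, records in sources:
        for record in records:
            key = normalize_key(record["name"])
            ci = merged.setdefault(key, {"name": key, "seen_in": []})
            ci["seen_in"].append(source_name)
            for attr, value in record.items():
                if attr != "name":
                    # setdefault keeps the value from the earlier,
                    # more authoritative source when both supply one.
                    ci.setdefault(attr, value)
    return merged


# Two hypothetical sources that disagree on naming and on the OS string.
discovery = [{"name": "DB01.example.com", "ip": "10.0.0.5", "os": "Linux"}]
asset_db = [{"name": "db01", "owner": "finance", "os": "linux 7"}]

merged = reconcile([("discovery", discovery), ("asset_db", asset_db)])
print(merged["db01"])
```

Real reconciliation rules are usually far messier than a hostname match, but the shape is the same: decide which records are the same thing, then decide whose word to trust for each attribute.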
What on earth do elephants and CMDBs have to do with Big Data?
Plenty, as it turns out.
And if you’re evaluating solutions to any of your Big Data problems, you’ll want to keep reading. Both of these stories are more relevant than you might expect.
Let’s assume your data is the elephant. And like the six guys, you’re probably blind – you have no idea what all your data looks like or even where it is.
So what can you do? Simple. You can do exactly one of two things.
Like the guys in the elephant story, you can assemble a whole pile of disjoint tools that have their own individual views of what your data looks like or where it is: search, archival, e-discovery, records management, data governance, business analytics.
Each tool may be quite happy thinking it knows what your data looks like, but the tools will never agree among themselves on what it all looks like. And then you’ll either:
- learn to live with inconsistent views of data, or
- move the data to another place to generate yet another index, or
- throw people at the problem.
How do I know this? Because it’s exactly what IT organizations did before they had a CMDB.
The other thing you can do is adopt the CMDB approach to the management of data. In the world of Big Data, it’s the index. And that’s where you should begin.
One indexing system. For search, archival, e-discovery, records management, data governance, business analytics. The works. 
If you’re assembling your own set of tools (such as the set above), each of the tools you use to help you manage your data should use that single indexing system or use common services built on top of that single indexing system, just like ITIL and the CMDB.
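To make "common services built on top of that single indexing system" a bit more concrete, here is a minimal sketch. The classes `SharedIndex`, `SearchTool`, and `RecordsTool`, along with the `legal_hold` field, are hypothetical names invented for this example, and the "index" is just a dictionary. What matters is that neither tool keeps its own copy of what the data looks like.

```python
class SharedIndex:
    """Hypothetical single index that every tool queries."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text, **metadata):
        # One entry per document, holding both its terms and its metadata.
        self._docs[doc_id] = {
            "text": text,
            "terms": set(text.lower().split()),
            **metadata,
        }

    def find_by_term(self, term):
        term = term.lower()
        return sorted(d for d, doc in self._docs.items() if term in doc["terms"])

    def find_by_metadata(self, key, value):
        return sorted(d for d, doc in self._docs.items() if doc.get(key) == value)


class SearchTool:
    """Search asks the shared index; it builds no index of its own."""

    def __init__(self, index):
        self.index = index

    def search(self, term):
        return self.index.find_by_term(term)


class RecordsTool:
    """Records management asks the very same index."""

    def __init__(self, index):
        self.index = index

    def on_hold(self):
        return self.index.find_by_metadata("legal_hold", True)


index = SharedIndex()
index.add("memo-1", "Quarterly revenue forecast", legal_hold=True)
index.add("memo-2", "Lunch menu", legal_hold=False)

search = SearchTool(index)
records = RecordsTool(index)
print(search.search("revenue"))
print(records.on_hold())
```

Because both tools read from one place, a document added or changed once is seen the same way by both. That is the whole argument in about forty lines.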
If you’re buying a set of tools from a vendor, you should be exceedingly wary of vendors offering solutions built from a bunch of different pieces of acquired technology.
Why? Because if they were separate pieces once, they were each originally built with their own indexing technology. You’ve now just purchased your elephant rather than building your elephant yourself, but you’re still just as blind.
Why? Because unless the vendor assembling these pieces integrated them together with a single indexing system, those individual indexes are alive and well and living under the covers, waiting to mislead you about what the elephant really looks like.
You must not let each tool in your Big Data arsenal try to create or use its own index, or you’ll never get a consistent understanding of your elephant, er, data. Don’t fall for the ruse; your data will love you for it.
