No Hadoop Expert? No Problem
Many enterprises can use Hadoop in tandem with traditional data warehouses and may not need to worry about hiring a staff of Hadoop experts, says Wayne Kernochan.
Anecdotal evidence and analyst firm surveys suggest a big surge in demand for Hadoop experts. In fact, the demand far outstrips the supply. So what should IT do if the requisite experts aren't available?
First, let's consider the reason for Hadoop expert demand. For many firms, Hadoop is a gateway to key social media data about customers residing on public clouds. In these cases, Hadoop expertise is needed to code the programs to access this data; combining it with in-house customer data once it arrives does not require such expertise.
A newer case popular among many IT shops is to set up an in-house Hadoop data store for internal data, plus MapReduce, optionally Hive, and the necessary programs to create a data warehouse or querying appliance. There seem to be two sub-cases: trying to use Hadoop as most or all of the enterprise's data warehouse, and just focusing on particular use cases.
An initial cost-benefit examination suggests that the most cost-effective approach is to use traditional techniques for data warehousing and to reserve Hadoop for masses of easy-to-search Big Data that require relatively few updates over time.
What to Do Without Hadoop Hand-holding
The first case, remote Hadoop access, is surprisingly easy to solve. Multiple vendors provide "hooks" in their databases that allow SQL-like access to Hadoop stores in public clouds. Data virtualization tools do this for most databases and can combine the results in a common format as well. In fact, a data virtualization tool also does much of the cross-optimization between Hadoop stores and between public Hadoop and in-house Big Data.
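To make the idea of combining results in a common format concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any vendor's API: the in-house data lives in an in-memory SQLite table, and the function fetch_remote_mentions() is a hypothetical stand-in for whatever SQL-like hook returns rows from a remote Hadoop store. All table, column, and customer names are made up.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table standing in for in-house customer data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn


def fetch_in_house(conn):
    """Read in-house customers with ordinary SQL."""
    rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
    return {customer_id: name for customer_id, name in rows}


def fetch_remote_mentions():
    """Hypothetical stand-in for a SQL-like query against a remote Hadoop store.

    A real hook would go over the network; here the rows are hard-coded.
    """
    return [
        {"customer_id": 1, "mentions": 12},
        {"customer_id": 2, "mentions": 3},
        {"customer_id": 4, "mentions": 7},  # no matching in-house customer
    ]


def combined_view(conn):
    """Join both sources on customer_id and return rows in one common format."""
    names = fetch_in_house(conn)
    joined = []
    for row in fetch_remote_mentions():
        customer_id = row["customer_id"]
        if customer_id in names:  # keep only customers known in-house
            joined.append(
                {
                    "customer_id": customer_id,
                    "name": names[customer_id],
                    "mentions": row["mentions"],
                }
            )
    return sorted(joined, key=lambda r: r["customer_id"])


if __name__ == "__main__":
    for record in combined_view(build_demo_db()):
        print(record)
```

The point of the sketch is that the caller of combined_view() never sees where each field came from, which is roughly the service a data virtualization tool provides at much larger scale.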
It would seem that Hadoop expertise is required to apply Hadoop to an appropriate use case. However, if no such expertise can be found, spending the extra money to do the job with a traditional data warehouse is still an option, because technologies such as IBM's BLU Acceleration for DB2 have recently made startling advances in price-performance.
In other words, it may turn out that the new technology allows you to be just as cost-effective using a traditional data warehouse. If not, there are tools that hide the programming complexity to some extent, like the aforementioned Hive.
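As a rough illustration of the programming complexity such tools hide, the following Python sketch performs the same per-customer count twice: once with hand-written map, shuffle, and reduce steps in the style of a MapReduce job, and once with a single declarative SQL query. It assumes SQLite as a stand-in for a SQL-like layer (it is not Hive), and the sample records and function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, mention text) pairs.
MENTIONS = [
    ("alice", "love the product"),
    ("bob", "shipping was slow"),
    ("alice", "support was helpful"),
    ("carol", "great value"),
    ("bob", "would buy again"),
]


def map_phase(records):
    """Emit a (key, 1) pair for every record, as a mapper would."""
    for customer, _text in records:
        yield customer, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def count_with_mapreduce(records):
    """Chain the three hand-written phases together."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def count_with_sql(records):
    """Get the same answer from one declarative GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mentions (customer TEXT, body TEXT)")
        conn.executemany("INSERT INTO mentions VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT customer, COUNT(*) FROM mentions GROUP BY customer"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_with_mapreduce(MENTIONS))
    print(count_with_sql(MENTIONS))
```

Even in this toy case the hand-written version needs three functions where the query needs one statement; a real job adds cluster configuration, serialization, and failure handling on top.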
If you are thinking of using Hadoop for all your data warehousing needs, I would strongly recommend thinking again. If you are dead set against a traditional data warehouse, then try using a data virtualization tool to create a "virtual data warehouse" across existing databases -- including those for "mixed" and OLTP (online transaction processing) workloads, or in concert with a master data management solution. Keep in mind that the new database technologies cited above also make these alternatives easier to implement and more cost-effective than a pure Hadoop data warehouse.
Hadoop Shouldn't Prevent You From Doing What's Right
Please remember that Hadoop was created to handle certain specific public cloud cases where people simply could not wait for consistent and cleansed humongous Big Data. It was never intended for general cases and less-than-petabyte needs.
If you don't let the Hadoop hype get to you, it's perfectly possible to use it for this kind of enterprise need, and without Hadoop experts. This is a good thing, because the scarcity of such experts is likely to last for a while.
The main reason for high Hadoop implementation costs? Complex programs to achieve what traditional database programs achieve without effort -- and complexity means a long programmer learning curve. If it were me, I wouldn't wait.
Wayne Kernochan is the president of Infostructure Associates, an affiliate of Valley View Ventures that aims to identify ways for businesses to leverage information for innovation and competitive advantage. Wayne has been an IT industry analyst for 22 years. During that time, he has focused on analytics, databases, development tools and middleware, and ways to measure their effectiveness, such as TCO, ROI, and agility measures. He has worked for respected firms such as Yankee Group, Aberdeen Group and Illuminata, and has helped craft marketing strategies based on competitive intelligence for vendors ranging from Progress Software to IBM.