Hadoop Evolution: What You Need to Know
"Enterprises are faced with a lot of complex choices: What's the right technology option for this use case, which vendors are going to be the most viable, and will I have the skills to actually run this stuff at scale," Heudecker said. "In the short term, you're going to see a lot of companies kicking the tires on the cloud and they may be looking at platform-as-a-service vendors to bridge the skills gap."
Enterprises should also be aware that these tools can come with major limitations, according to James Dixon, founder and CTO of Pentaho.
"It's not all at the same level of maturity, sophistication, completeness," Dixon said during a recent interview. "Some of these capabilities weren't built into the design of the software in the first place, so you may find there are major limitations with these new features because you're taking a technology that just wasn't designed to do that. So I would say be very cautious of using the new whiz-bang features that suddenly arise. Be very cautious of those because they're not mature, they weren't designed in, so there may be architectural design flaws or just major limitations that you're not aware of."
Not Just a Big Cluster
There's also a key shift in how vendors and analysts see Hadoop's role in the enterprise. It's no longer about dumping all the data into one huge data lake, although data lakes still have a role as archives and sandboxes, experts say. Instead, it's about "connections, not collections," Heudecker said.
"The trend has been to collect a bunch of data together and then analyze it. That's expensive and it's hard to do," he said. "I think it's much easier to leave the data where it is and do your consolidation logically with metadata. So you're leaving data in its legacy store, and you're saying, 'Alright, let me bring in what I need, I'll build out my analysis and then do push-down processing to the relevant platform.' It's much more advanced, and very early, but I think that's a more viable strategy than saying let's consolidate everything into this big cluster."
Emerging Approaches to Hadoop
Enterprises can expect to hear a similar message from vendors as new offerings come to market. Pentaho's recent Business Analytics 6.1 release reflects Heudecker's and Russom's observations. Although Pentaho is a data integration and analytics company, the new release's headline feature is metadata injection for Big Data, in both Hadoop and traditional environments.
When discussing MapR's new Converged Data Platform, Norris pointed out that it's not just about pooling data, but about reaching data where it lives and incorporating it into business decisions.
"It is very different than what's possible with Apache Hadoop alone without that kind of converged data platform," Norris said. "The companies that are really getting the biggest payoff from their investments in data are the ones incorporating it into the business flow; so things like performing billions of transactions or billions of events a day."
Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.