Cloudera Intros Hadoop Real-Time Query Engine for Big Data

Sean Michael

Updated · Oct 24, 2012

The Apache Hadoop project is at the core of the Big Data revolution. One of Hadoop's key enablers is MapReduce, a technology that is very powerful but also slow.

Cloudera, one of the leading commercial sponsors of Hadoop, now aims to speed up Big Data queries with a new technology codenamed Impala, which is designed to support rapid, interactive queries.

Cloudera CEO Mike Olson said MapReduce was originally designed by consumer Internet companies to process large-scale, batch data workloads.

“If you want to get at your data for interactive queries, you just can’t get there with MapReduce,” he said. “That means that Hadoop just doesn’t get deployed for a whole bunch of workloads.”

How to Speed Up Hadoop

Impala doesn’t replace MapReduce, Olson noted. “What we have done is added another execution framework – another way to get at the identical data in a Hadoop cluster. Customers can transform and analyze data with MapReduce and they can query the results using Impala.”

Impala also complements SQOOP, the tool for moving data between SQL databases and Hadoop that Cloudera first released in 2009.

“SQOOP is a way to move data between a relational database and Hadoop,” Olson said. “With Impala, you can now get the same interactive query speeds that you would expect with a relational database.”

The Impala technology is being made available today as a public beta under the open source Apache license. The plan is for Impala to be part of the Cloudera Distribution of Hadoop (CDH) version 4.5 in the first quarter of 2013.

“We’re not politically committed to open source,” Olson said. “We just believe that open source is a better way to develop platform software, and it’s the way customers of ours want to consume the platform.”

The most recent major release of CDH, CDH 4.0, debuted in June of this year with enterprise-grade stability features. Olson explained that Cloudera updates the platform through quarterly point releases. The next major release, CDH 5.0, is currently scheduled for mid-2013.

“We’re not yet announcing the key features of CDH 5, but it’s mostly about more enterprise grade features for our installed customer base,” Olson said. “Impala as an addition to the platform is non-disruptive, so we can roll it into one of our point releases.”

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network. Follow him on Twitter @TechJournalist.


Sean Michael is a writer who focuses on innovation and on how science and technology intersect with industry, covering topics such as WordPress, VMware, Salesforce, and application technology. The TechCrunch Europas shortlisted her for the best tech journalist award. She enjoys finding stories that open people's eyes. She graduated from the University of California.
