Oracle’s Big Data Powered by Cloudera’s Hadoop

Sean Michael Kerner

Updated · Jan 10, 2012

Big Data startup Cloudera is getting a big vote of confidence from Oracle. Oracle today announced the availability of the Oracle Big Data Appliance, including Cloudera’s distribution of Hadoop (CDH).

Oracle first announced the Big Data engineered system at its OpenWorld conference in October 2011. At that time, Oracle did not reveal how it would deliver Hadoop, which is an open source Apache project. Oracle could have chosen to go it alone, or it could have partnered with any one of a number of Hadoop distribution vendors.

“We have been talking to Cloudera for a while, trying to iron out all the details,” Cetin Ozbutun, VP of Data Warehousing Technologies at Oracle, told InternetNews.com.

Ozbutun added that the formal deal with Cloudera was not yet in place at the time of the OpenWorld announcement. Cloudera is a startup founded in 2008 by Christophe Bisciglia, who had previously managed Google’s Hadoop cluster. The company has emerged as one of the leading Hadoop vendors in the Big Data market, raising millions in venture capital.

Neither Oracle nor Cloudera commented to InternetNews.com about the precise financial arrangement of the new partnership, although Kirk Dunn, COO of Cloudera, said there is a partnership between the product groups. Oracle’s Ozbutun noted that customers will acquire a single product with customer support from Oracle. He added that Oracle in turn has a deal with Cloudera to assist with support when needed.

As to why Oracle chose Cloudera, Ozbutun said, “Leaders like to work with leaders and Cloudera is a leader in Hadoop.”

The Oracle Big Data machine is a bit different from a typical Cloudera Hadoop deployment. Cloudera’s product is a software-based distribution of Hadoop that can be deployed on a customer’s choice of hardware, while the Big Data machine is what Oracle refers to as an engineered system. The Big Data Appliance can have as much as 864 GB of RAM, 216 CPU cores and 648 TB of storage. The system uses high-speed 40 Gb/s InfiniBand to connect the nodes inside the system, and it all runs on top of Oracle Linux.

Ozbutun said that Oracle did a lot of engineering work to optimize Hadoop for the Big Data Appliance’s hardware. He stressed that there were no code changes to Cloudera’s distribution; rather, it was a tuning exercise to ensure the best performance.

The two companies are not yet announcing any specific plans, though both say they are committed to further innovation in the Big Data space.

“You can imagine that with experience as we move forward you’ll find some interesting things that we’ll be able to do together,” Dunn said.

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @TechJournalist
