Pentaho Open Sources Big Data Capabilities with Kettle

Sean Michael Kerner

Updated · Jan 30, 2012

Open source business intelligence vendor Pentaho is bringing Big Data transformation capabilities into the open source fold. Pentaho announced today the new Kettle 4.3 release, which includes new capabilities for transforming and working with Big Data.

Kettle is an Extract, Transform and Load (ETL) technology, which enables applications to take data from outside sources, transform it into a usable format and make it available for loading into a database or business intelligence application. Pentaho has had an open source edition of Kettle for several years, but prior to the new 4.3 release, Big Data capabilities were available only to paying enterprise customers. Pentaho is now opening up its Big Data ETL capabilities as open source to capitalize on what it sees as a market opportunity.
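For readers unfamiliar with the pattern, the three ETL stages can be sketched in a few lines of Python. This is only an illustration of the general concept, not Kettle's API or Pentaho's implementation; the sample data, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for an "outside source".
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,n/a
3,carol,7
"""

def extract(text):
    """Extract: read rows from the raw source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, parse amounts, drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text):
    """Run the three stages in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn

conn = run_etl(RAW_CSV)
result = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(result)
```

The row with a non-numeric amount is dropped during the transform stage, so only the two clean rows reach the database. Scripts like this are the kind of hand-written work that a visual ETL tool aims to replace.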

“Our business model is an open core model where there are certain things that we have as value add features in an enterprise edition,” Pentaho founder Richard Daley told InternetNews.com. “What we have found over the last four quarters is that Kettle is the most popular product at Pentaho for Big Data.”

The Big Data connectors first appeared in the Enterprise edition of Kettle. Pentaho hopes to leverage the popularity of Kettle to gain an even wider audience. Kettle is now also available under an open source Apache 2.0 license, which enables it to be included and used by others. Daley described the Kettle ETL as middleware for a Big Data environment.

“It just seemed that for us to get really viral adoption that this was the right move for us,” he said. “We monetize when people look for the analytics on top with things like visualization, data discovery and predictive analytics.”

As Daley described it, Kettle goes beyond basic ETL functionality. He noted that the ability to integrate with existing data sources is a key capability. For example, Kettle can enable users to get data out of an Oracle database and prepare it for use with Hadoop's distributed file system.

“Today people have to write all kinds of scripts,” Daley said. “What Kettle brings to the table is it’s a visual design environment, and people can do job orchestration and workflow to help operationalize a Big Data environment.”

While the Big Data connectors are now freely available in the open source edition of Kettle, Pentaho still offers an enterprise edition as well.

“For those that want support and certified builds or additional functionality around ETL or analytics, we still have the enterprise edition,” Daley said. “But now people can choose to freely obtain the open source version and go their own way.”

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @TechJournalist.


