Pentaho Brings Business Intelligence to Hadoop

Paul Ferrill

Updated · Oct 15, 2010

Open source business intelligence company Pentaho unveiled BI and data integration tools for Hadoop this week, but they aren't available to users of the free community edition of Pentaho.

The new offerings, announced at Hadoop World, make Hadoop easier to use for companies trying to solve Big Data challenges, Pentaho says. Hadoop's open source distributed application framework offers promise for making sense of the vast amounts of data stored in enterprises. Pentaho claims it has addressed the biggest challenges facing Hadoop users: a steep technical learning curve, the need for specialized staff, and a lack of development and deployment tools for data integration and business intelligence.

Pentaho Data Integration (PDI) for Hadoop offers a graphical design environment that requires no programming. With it, organizations can manage how data is moved into and out of Hadoop, execute and schedule Hadoop tasks within existing ETL and BI workflows, and design and run massively scalable ETL jobs in Hadoop using more than 200 out-of-the-box ETL steps.

Pentaho also promises easy integration with Amazon Elastic MapReduce cloud deployments, as well as with Cloudera Distribution for Hadoop (CDH) and Apache Hadoop.

The Pentaho BI Suite for Hadoop includes PDI for Hadoop. According to the company, users can perform production, operational and batch reporting against the full set of data in Hadoop through Hive, Hadoop's data warehouse infrastructure, and can run ad hoc reports against data in Hadoop with no knowledge of Hadoop or SQL. Users can also spin off high-performance data marts in minutes for interactive analysis and dashboards using Pentaho Agile BI.

“Pentaho just lowered the onramp to Big Data analytics by making it easier and more affordable for companies to get up and running with Hadoop,” Shawn Rogers, research vice president for business intelligence at analyst firm Enterprise Management Associates, said in a statement. “It's an essential tool set addition for senior level architects and others at larger organizations with Big Data initiatives, or even for a DBA or ETL guy trying to get into Hadoop.”

Pentaho describes the new offerings as "a collaborative effort from both Pentaho Corporation and the Pentaho community," and said the tools were put through "an extensive beta program that involved both Pentaho community members and commercial customers." The company claims its offering is the first solution to address user needs for ETL and BI applications that make Hadoop easier to use for Big Data analytics. Pentaho also contributed a number of improvements to open source projects in the Hadoop ecosystem, including Apache Hive and VFS.

However, the Pentaho Data Integration and BI tools for Hadoop aren't available to open source Pentaho users; they are being released only in the enterprise editions, which include full functionality and technical support. The company says the Pentaho open source projects do contain some basic Hadoop functionality.

Pentaho Data Integration and BI for Hadoop are available for a 30-day free trial at


Paul Ferrill has been writing about computers and network technology for over 15 years. He holds both a BS and an MS in Electrical Engineering and specializes in complex data analysis and storage. A regular contributor to the computer trade press, he has written hundreds of articles and two books for various outlets over the years. His articles have appeared in Enterprise Apps Today, InfoWorld, Network World, PC Magazine, Forbes, and many other publications.

    Read next