Pentaho 4.5 Visualizes Big Data Analytics

Sean Michael

Updated · Apr 26, 2012

Big Data is all about finding the needle in the haystack. One way to make that data more relevant and usable is a data analytics visualization tool that helps companies see more easily what the data is saying.

Open source business intelligence vendor Pentaho is updating its namesake software platform to version 4.5, with the specific aim of enabling user-driven data visualization.

Ian Fyfe, chief technology evangelist and vice president of product marketing at Pentaho, told InternetNews.com that in Pentaho 4.5 all of the visualization types have been made interactive. As a result, users can mouse over and filter visualizations across multiple chart types.

“It’s really aimed at the end-user, so without needing hand-holding from IT, they can easily visualize data in new ways,” Fyfe said.

Helping to power the new visualization capabilities are improvements to the in-memory features that the company first introduced in November with the Pentaho Business Analytics 4.1 release. That system uses the open source Infinispan project to help accelerate data visualization. In the 4.5 release, Fyfe said, Pentaho makes more efficient use of the in-memory cache.

“The idea is you can take data that is sitting in a database, load it into the in-memory cache and it will deliver extreme split-second performance, without needing to hit the database every time you interact with the application,” he said.

Pentaho 4.5 also broadens its Big Data reach with support for the Apache Cassandra and MongoDB NoSQL databases. Existing Hadoop support goes deeper with improved handling of MapReduce, Hadoop's programming framework for parallel data processing. Pentaho has its own visual interface for MapReduce that is intended to make it easier to write and run MapReduce jobs.

“So instead of having to hand-code Java, you can use the Pentaho visual interface with a point-and-click functionality for MapReduce,” Fyfe said. “The MapReduce job can be run as part of the Pentaho Data Integration Engine across the Hadoop cluster.”

Pentaho 4.5 also supports the Hadoop distributed cache, which enables massive scalability for Big Data.

“In the past, you would have to manually install us on every node in a Hadoop cluster,” Fyfe said. “In 4.5, with support for the Hadoop distributed [cache], we can automate that process.”

As a result, Pentaho needs to be installed only once, and Hadoop then handles distribution across the cluster.

The Pentaho Business Analytics 4.5 release is available in both enterprise and community open source editions. Fyfe noted that the enterprise release is built on top of the community one. The open source edition includes all of the core Big Data functionality, though the new visualizations are found only in the commercial edition.

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @TechJournalist.


Sean Michael is a writer who focuses on innovation and on how science and technology intersect with industry, covering WordPress, VMware, Salesforce, and application technology. TechCrunch Europas shortlisted her for the best tech journalist award. She enjoys finding stories that open people's eyes. She graduated from the University of California.
