Pentaho Business Analytics Gets an In-Memory Boost
Updated · Nov 03, 2011
Open source business intelligence (BI) vendor Pentaho is advancing its capabilities with new in-memory features.
The Pentaho Business Analytics 4.1 release adds advanced in-memory features that let enterprises combine the benefits of in-memory and disk-based analysis.
“Most of the other BI vendors make you choose either in-memory or on disk, and we let you choose what you want,” Ian Fyfe, chief technology evangelist and vice president of product marketing at Pentaho, told InternetNews.com. “Ultimately we have a standard relational database on the back end, but then what we do is let you take that data and load it up into memory.”
Fyfe said modern system architectures allow enterprises to access a lot more memory than what had been available in the past. As such, more data can be loaded into memory, allowing for faster analytics performance.
Unlike a pure in-memory model, Fyfe said, the Pentaho approach is not constrained by scalability issues. For performance optimization, an enterprise can choose which data gets loaded into memory and manage the data cache.
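The hybrid model Fyfe describes resembles the classic cache-aside pattern: hot data is served from a bounded in-memory cache while everything else falls through to the disk-based store. A minimal sketch in Java, with illustrative names that are not Pentaho APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Toy hybrid store: a bounded in-memory LRU cache in front of a
 * disk-backed lookup. The diskLookup function stands in for the
 * relational back end mentioned in the article.
 */
public class HybridStore {
    private final Map<String, String> memoryCache;
    private final Function<String, String> diskLookup;

    public HybridStore(int maxEntries, Function<String, String> diskLookup) {
        this.diskLookup = diskLookup;
        // Access-ordered LinkedHashMap gives simple LRU eviction, so only
        // the most recently used data stays in memory.
        this.memoryCache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Fast path: serve from memory; slow path: load from disk and cache. */
    public String get(String key) {
        return memoryCache.computeIfAbsent(key, diskLookup);
    }

    public boolean isCached(String key) {
        return memoryCache.containsKey(key);
    }
}
```

In a real deployment the eviction policy and cache sizing would be the operator-tunable knobs Fyfe alludes to.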
The in-memory technologies that Pentaho is using in its 4.1 release were not built by Pentaho. Rather, the company is leveraging Red Hat's Infinispan and memcached, both open source projects.
“Both of them are very popular in-memory technologies that power some of the world’s largest websites,” Fyfe said. “These caching technologies are also distributed so you can have a cluster of servers providing memory.”
The clustering also enables high availability of the cached data: for example, two copies of a data set can be available at all times. Fyfe noted that the system takes care of failover as well. The cache is external to Pentaho, so a Pentaho instance can go down and the cache lives on.
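The distribution scheme Fyfe describes can be sketched as client-side sharding in the memcached style: a key hashes to a primary server, and a second copy lands on the next node for high availability. This is a toy model of the idea, not Pentaho's or Infinispan's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of spreading cached data across a cluster
 * while keeping two copies of each entry for failover.
 */
public class ShardedCache {
    private final List<String> servers;
    private static final int REPLICAS = 2; // two copies at all times, per the article

    public ShardedCache(List<String> servers) {
        this.servers = servers;
    }

    /** Owners of a key: the primary by hash, plus the next node as replica. */
    public List<String> ownersOf(String key) {
        List<String> owners = new ArrayList<>();
        int primary = Math.floorMod(key.hashCode(), servers.size());
        for (int i = 0; i < Math.min(REPLICAS, servers.size()); i++) {
            owners.add(servers.get((primary + i) % servers.size()));
        }
        return owners;
    }
}
```

Because every key always has two owners, losing a single cache node (or a Pentaho instance) leaves a live copy of the data in the cluster.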
Pentaho has two versions of its software, an open source community edition and a commercial edition. The new advanced in-memory integration is only being made available as part of the commercial release. Fyfe said the open source edition does have some in-memory capabilities, though limited in scale: the community edition is limited to a few gigabytes of memory, whereas the enterprise release can scale into the multi-terabyte range.
“That will be one of the big differentiators between our enterprise edition and our community edition,” Fyfe said. “If a company is going to scale, it’s the kind of thing where they’ll want support for their mission-critical application.”
Since both memcached and Infinispan are open source technologies, it is also possible for a developer to have enhanced in-memory support even with the open source community edition.
“A smart enough developer could probably figure out how to use Infinispan with the community version, since they have access to the source code and could probably connect it in,” Fyfe said. “But with our enterprise edition, we’ve done that work for you and it’s fully supported.”
In-memory analytics isn’t just about software. Last month, Oracle launched its Exalytics engineered in-memory analytics system, which packs 1 TB of DRAM for main memory and uses compression technology to handle up to 10 TB of data. Asked how Pentaho compares with Exalytics, Fyfe said he was not familiar with the Oracle product.
Fyfe did note, however, that Pentaho runs on commodity hardware, which puts it within reach of many enterprises.
“It’s commodity Intel hardware with Linux and memory is getting cheaper all the time,” Fyfe said. “So it’s not hard to scale up a nice machine that is not outrageously expensive.”