EMC Morphs from Storage into Big Data Analytics

Drew Robb

Updated · Jul 17, 2012

Ten years ago, EMC was a storage hardware vendor plain and simple. It offered massive disk arrays and led the market. Since then, it has widened its scope to include storage software and monitoring, and it even acquired VMware in 2004.

But can EMC become a force in business intelligence and Big Data analytics? At first blush, that seems like quite a stretch. Yet that appears to be where EMC is headed, given its acquisition of Big Data vendor Greenplum in 2010 and recent developments, including a partnership in which SAS is using the EMC Greenplum data warehousing appliance to offer high-performance analytics to its customers.

Signaling its desire to go after the Big Data analytics market, EMC earlier this year hosted its second Data Science Summit. EMC Chairman Joe Tucci recently described real-time predictive analytics as “the next killer app.”

SAS and EMC Greenplum Partnership

SAS and Greenplum have been working for some time on an integrated joint development roadmap, a partnership that continued after EMC acquired Greenplum. “Our focus is allowing customers to apply the scalability, performance, flexibility, and appliance packaging of Greenplum products to the analytical power of SAS,” said Josh Klahr, Greenplum’s vice president of Product Management.

The result is a series of products designed to work in tandem. These include SAS Access to Greenplum Database and Greenplum HD (Hadoop), SAS Scoring Accelerator and SAS Data Integration. A Greenplum appliance for SAS High-Performance Analytics (SAS HPA) aims to deliver fast and scalable analytics on very large data sets. For grid processing, the SAS Grid can be deployed alongside Greenplum’s Unified Analytics Platform (UAP). SAS Marketing Automation is also certified to run on Greenplum, as is the Banking Detailed Data Store.

“Greenplum offers technology infrastructure modernization for SAS customers by providing in-memory, in-database analytics on a massively parallel processing architecture,” said Scott Yara, a Greenplum co-founder who is now EMC’s senior vice president of Products. “This improves scalability and performance for SAS products on commodity servers.”

SAS High-Performance Analytics, for instance, can move all needed data into the memory of each segment processor of the Greenplum DCA nodes, thereby accelerating data access by a factor of 10 or more during analytical computations. Because it can run on a grid of servers, Greenplum can split a complex computation into many parts that run in parallel on 192 processor cores. This eliminates the I/O bottlenecks found in a traditional server-based infrastructure.
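The idea of splitting one computation into parts that run in parallel over in-memory slices of the data can be illustrated with a minimal Python sketch. It is a generic illustration, not Greenplum or SAS code: the function names, the round-robin distribution, and the use of threads to stand in for segment processors are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(values, num_segments):
    """Deal the rows out round-robin, one in-memory slice per hypothetical segment."""
    return [values[i::num_segments] for i in range(num_segments)]


def partial_stats(segment):
    """Work done locally on one segment: count, sum, and sum of squares."""
    return len(segment), sum(segment), sum(x * x for x in segment)


def combine(partials):
    """Merge the per-segment results into a global mean and population variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


def parallel_mean_variance(values, num_segments=4):
    """Compute mean and variance of a non-empty list by fanning out over segments."""
    segments = split_into_segments(values, num_segments)
    # Threads stand in for segment processors (an assumption of this sketch);
    # a real MPP system runs each part on separate cores or servers.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(partial_stats, segments))
    return combine(partials)


if __name__ == "__main__":
    print(parallel_mean_variance([1, 2, 3, 4, 5, 6, 7, 8], num_segments=4))
```

Each segment touches only its own slice of the data, and only three small numbers per segment travel back to be combined, which is why this style of computation scales with the number of cores.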

“The SAS partnership makes perfect sense for EMC Greenplum, extending what they have done in the past by supporting their customers using the statistical analysis software tool with their storage and who are now leveraging the Greenplum database and storage system,” said Greg Schulz, an analyst with StorageIO Group. “It makes sense for EMC to capture revenue and business as well as customers today that are using SAS for their business analytics and data science with their Greenplum platform. And while the Hadoop market continues to evolve, EMC can talk the future while generating revenue today.”

Broader Take on Big Data

The EMC Greenplum partner ecosystem reaches beyond SAS to include business intelligence vendors Information Builders, Pentaho, MicroStrategy, Jaspersoft, SAP Business Objects, Cognos, Informatica and Tableau. As Klahr explained, Greenplum has had multiple certification programs specific to Greenplum Database in place for several years and it’s been “certified by all the third-party BI and data integration tools vendors that matter in the market.” Similarly, EMC Greenplum is obtaining certifications from vendors that provide services and support for Apache Hadoop.

Further fleshing out its Big Data strategy, in March EMC acquired Pivotal Labs, the creator of Pivotal Tracker, an agile software development tool used by about a quarter of a million developers, including those working at companies like Twitter, Best Buy, Groupon and Salesforce.com. EMC is using this technology to open the doors wide to Big Data analytics.

The first product utilizing the Pivotal Labs technology is Greenplum Chorus, which is billed as a platform for collaboration and analytics. Klahr said it delivers “a Facebook-like social collaboration tool for data science teams to iterate on the development of datasets and ensure that useful insights are delivered to the business quickly.”

Greenplum has made Chorus part of its UAP. The Greenplum Database and Greenplum HD sit atop an infrastructure layer, followed by partner tools and services. Chorus is the analytic layer at the very top of the stack. The product aims to offer a single interface for all of an organization’s data, in conjunction with virtual databases and social collaboration. Klahr said you can search, explore, visualize and import structured or unstructured data from anywhere in the organization.

Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).
