VMware Open Source Project Serengeti Brings Hadoop to Virtualization

Sean Michael Kerner

Updated · Jun 13, 2012

The open source Hadoop project is getting a real boost today from virtualization vendor VMware. Getting Hadoop up and running on infrastructure, real or virtual, and then layering in traditional application availability and controls has been no easy task.

VMware today took the wraps off Project Serengeti, an open source effort that enables Hadoop to run on virtual infrastructure. VMware is also contributing open source code to the upstream Apache Hadoop project to make Hadoop more responsive and efficient when running on virtualization technologies.

“Project Serengeti is about simplifying deployment of Hadoop on VMware,” Fausto Ibarra, senior director of Product Management at VMware, told InternetNews.com. “With Serengeti you can have a fully functional Hadoop deployment in as little as 10 minutes.”

With Serengeti, Hadoop can run on a regular VMware vSphere deployment and then take advantage of all the usual VMware management and availability tools. Hadoop itself is packaged as a standard OVF (Open Virtualization Format) machine image file, just like any other VMware image.

When Serengeti is deployed on vSphere, the OVF creates two virtual machines. One is the Serengeti server; the other is the master virtual machine that will be cloned to create Hadoop nodes.

“The user connects into the Serengeti Server virtual machine, and that's where they run all the commands to create a cluster,” Ibarra said. “The master virtual machine is then cloned and configured, and after that you have a fully functional Hadoop cluster.”

Serengeti also provides configuration and management capabilities for Hadoop. While other vendors offer Hadoop management tools, including Cloudera with its Cloudera Manager, Ibarra doesn't see Serengeti as competing with them. He stressed that Serengeti is for virtual infrastructure, while Cloudera Manager is focused on physical infrastructure.

He added that VMware is also partnering with Cloudera to help enable the CDH distribution of Hadoop to run on Serengeti. Cloudera recently updated CDH to version 4, providing new performance and scalability features.

Moving forward, Ibarra noted that the overall direction for VMware with Serengeti is to bring Hadoop closer to the cloud.

Serengeti is currently available as open source on GitHub and as a free download for vSphere at https://github.com/vmware-serengeti/serengeti-ws



Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @TechJournalist.


