Apache Hadoop 3.0.0 Boosts Big Data App Ecosystem

Sean Michael

Updated · Dec 22, 2017

In the world of big data, one project has long loomed larger than all the rest: Hadoop. The open source Apache Hadoop project provides the core framework on which dozens of other big data efforts rely.

The Apache Hadoop v.3.0.0 milestone became generally available on Dec. 14, marking the first major version change for the project since Hadoop 2 debuted in 2013.

“Hadoop 3 is a major milestone for the project, and our biggest release ever,” Andrew Wang, Apache Hadoop 3 release manager, stated.

YARN

YARN, the big new feature added in Hadoop 2, is once again at the top of the list for Hadoop 3. YARN is an acronym for “Yet Another Resource Negotiator,” and it enabled Hadoop to handle distributed resource management and processing.

In Hadoop v.3.0.0, YARN now benefits from new federation capabilities which, together with federation in the HDFS storage system, will enable clusters to scale up to tens of thousands of systems. Additionally, YARN now supports more resource types, including GPUs, to help enable machine learning and artificial intelligence workloads.

The YARN timeline service is also getting a boost in Hadoop v.3.0.0, providing improved scalability.

“In many cases, users are interested in information at the level of ‘flows’ or logical groups of YARN applications,” the Hadoop project documentation on the Timeline service states. “It is much more common to launch a set or series of YARN applications to complete a logical application. Timeline Service v.2 supports the notion of flows explicitly. In addition, it supports aggregating metrics at the flow level.”
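To make the idea of flow-level aggregation concrete, here is a minimal Python sketch. It is a toy model only and does not use the Timeline Service API; the application records, flow names, metric names and the `aggregate_by_flow` helper are all hypothetical, invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-application records. The flow names, metric names and
# numbers are invented for illustration and do not come from Hadoop.
apps = [
    {"app_id": "app_1", "flow": "nightly_etl", "cpu_ms": 1200, "bytes_written": 2048},
    {"app_id": "app_2", "flow": "nightly_etl", "cpu_ms": 800, "bytes_written": 1024},
    {"app_id": "app_3", "flow": "ad_hoc_report", "cpu_ms": 300, "bytes_written": 512},
]


def aggregate_by_flow(records):
    """Sum every numeric metric across the applications that share a flow."""
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:
        for key, value in record.items():
            if key in ("app_id", "flow"):
                continue  # identifiers, not metrics
            totals[record["flow"]][key] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {flow: dict(metrics) for flow, metrics in totals.items()}


flow_totals = aggregate_by_flow(apps)
print(flow_totals)
```

In this sketch the two applications tagged `nightly_etl` are reported as a single logical unit, which is the kind of view the documentation describes for flows.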

Over the last several years, an increasing share of application workloads has been migrated to containers. It’s a trend that Hadoop 3.0.0 is prepared for and supports. Among the new features in the Hadoop 3 milestone is opportunistic container execution, a capability that aims to improve resource utilization for short-lived containers.

YARN is also getting in on the container improvements, with support for distributed scheduling of opportunistic containers.
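The intuition behind opportunistic containers can be sketched with a toy Python model. This is not YARN code: the `Node` class, its slot counting and the "rejected" outcome are simplifying assumptions made for illustration. The one behavior it tries to capture is that an opportunistic container can be queued at a busy node and started as soon as capacity frees up.

```python
from collections import deque


class Node:
    """Toy stand-in for a worker node with a fixed number of container slots."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []      # containers currently executing
        self.queued = deque()  # opportunistic containers waiting for a slot

    def submit(self, name, opportunistic):
        """Start the container if a slot is free; otherwise queue or reject it."""
        if len(self.running) < self.slots:
            self.running.append(name)
            return "running"
        if opportunistic:
            # Opportunistic work waits at the node instead of going elsewhere.
            self.queued.append(name)
            return "queued"
        # Assumption of this sketch: a regular container is simply not
        # placed on a full node.
        return "rejected"

    def finish(self, name):
        """Release a slot and hand it to the next queued container, if any."""
        self.running.remove(name)
        if self.queued:
            self.running.append(self.queued.popleft())


node = Node(slots=2)
results = [
    node.submit("g1", opportunistic=False),
    node.submit("g2", opportunistic=False),
    node.submit("o1", opportunistic=True),
]
node.finish("g1")
print(results, node.running)
```

In the sketch, the short opportunistic task `o1` starts the moment `g1` finishes, without a fresh scheduling round trip, which is how queuing at the node can raise utilization.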

“It’s tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today’s release,” Doug Cutting, original co-creator of Apache Hadoop, stated. “With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems. The open source community continues to respond to industrial demands.”

Sean Michael Kerner is a senior editor at Enterprise Apps Today and InternetNews.com. Follow him on Twitter @TechJournalist.
