Apache Beam Graduates to Help Define Streaming Data Processing

Sean Michael

Updated · Jan 13, 2017

The open-source Apache Beam project hit a major milestone on Jan. 10, graduating from the Apache Incubator and officially becoming a Top Level Project. Beam provides a unified programming model for both streaming and batch data processing.

The Apache Incubator is an entry point for new projects into the Apache Software Foundation (ASF), with graduation marking a level of maturity and adherence to established policies and processes.

“Graduation is an exciting milestone for Apache Beam,” Davor Bonaci, Vice President of Apache Beam, said in a statement. “Becoming a top-level project is a recognition of the amazing growth of the Apache Beam community, both in terms of size and diversity.”

“Together we are pushing forward the state of the art in distributed data processing and, at the same time, enhancing the ability to interconnect additional storage/messaging systems and execution engines,” Bonaci added.

The Beam project includes software development kits (SDKs) in both Python and Java that help application developers and big data analysts to define data processing pipelines that can then be executed on different engines, including Apache Spark and Google Cloud Dataflow. The most recent release of Apache Beam was the 0.4.0 update on Jan. 9, which provided support for the Apache Apex processing engine.
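To make the portability idea concrete, here is a minimal sketch in plain Python. It is a toy model and not the Apache Beam API: the names PIPELINE, run_batch and run_streaming are hypothetical and exist only for illustration. The point it tries to show is that a pipeline can be defined once and then handed to different execution strategies, one that processes a complete data set in a single pass and one that handles records as they arrive.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical toy model, NOT the Apache Beam API: a pipeline is an ordered
# list of transforms, defined independently of how it is executed.
Transform = Callable[[Iterable[str]], Iterator[str]]


def split_words(lines: Iterable[str]) -> Iterator[str]:
    """Split each input line into individual words."""
    for line in lines:
        yield from line.split()


def to_upper(words: Iterable[str]) -> Iterator[str]:
    """Upper-case each word."""
    for word in words:
        yield word.upper()


# The pipeline definition: written once, with no reference to any runner.
PIPELINE: List[Transform] = [split_words, to_upper]


def run_batch(pipeline: List[Transform], data: List[str]) -> List[str]:
    """Batch-style runner: process a complete, bounded data set in one pass."""
    stream: Iterable[str] = data
    for transform in pipeline:
        stream = transform(stream)
    return list(stream)


def run_streaming(pipeline: List[Transform], source: Iterable[str]) -> Iterator[str]:
    """Streaming-style runner: emit results as each input record arrives."""
    for record in source:
        stream: Iterable[str] = [record]
        for transform in pipeline:
            stream = transform(stream)
        yield from stream


lines = ["beam graduates", "top level project"]
batch_result = run_batch(PIPELINE, lines)
streaming_result = list(run_streaming(PIPELINE, iter(lines)))
```

In this sketch both runners produce the same words from the same pipeline definition. Real engines such as those named above also have to deal with concerns like distribution and fault tolerance, which this toy ignores.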

The original code behind Beam was actually donated to Apache by Google in early 2016, coming from Google’s Cloud Dataflow SDK.

“Though there were many motivations behind the creation of Apache Beam, the one at the heart of everything was a desire to build an open and thriving community and ecosystem around this powerful model for data processing that so many of us at Google spent years refining,” Tyler Akidau, Apache Beam PMC and Staff Software Engineer at Google, wrote in a blog post.

Among the criteria required for an incubated project to graduate is diversity of contributions. Akidau noted that since becoming part of the ASF, at least 10 Apache Beam modules were built by the community of contributors, without Google. Having multiple non-Google contributors is a good thing in that it expands the potential uses as well as the impact of the project.

“We’re ready to bring the promise of portability to programmatic data processing, much in the way SQL has done so for declarative data analysis,” Akidau wrote.

Sean Michael Kerner is a senior editor at EnterpriseAppsToday and InternetNews.com. Follow him on Twitter @TechJournalist.
