November 29, 2019


Unified stateful big data processing in Apache Beam (incubating)


Apache Beam's new State API brings scalability and consistency to fine-grained stateful processing while remaining portable to any Beam runner. Aljoscha Krettek introduces the new state and timer features in Beam and shows how to use them to express common real-world use cases in a backend-agnostic manner.

Talk Title: Unified stateful big data processing in Apache Beam (incubating)
Speakers: Aljoscha Krettek (Ververica)
Conference: Strata Data Conference
Conf Tag: Making Data Work
Location: London, United Kingdom
Date: May 23-25, 2017

Apache Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines, but not all use cases are pipelines of simple “map” and “combine” operations. Aljoscha Krettek introduces Beam’s new State API, which brings scalability and consistency to fine-grained stateful processing while interoperating with Beam’s other features such as consistent event-time windowing and windowed side inputs—all while remaining portable to any Beam runner, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Aljoscha covers the new state and timer features in Beam and shows how to use them to express common real-world use cases in a backend-agnostic manner. Examples of new use cases unlocked by Beam’s new mutable state and timers include:
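As a rough illustration of what per-key mutable state combined with event-time timers means, here is a toy model in plain Python. It is a sketch under stated assumptions, not Beam's State API and not code from the talk: the class `ToyStatefulSum`, its methods, and the `flush_delay` parameter are all hypothetical names invented for this example. Each key gets its own running sum (the state) and a single timer that, once the watermark passes it, emits the sum and clears the state.

```python
from collections import defaultdict


class ToyStatefulSum:
    """Toy model of per-key state plus an event-time timer.

    Hypothetical illustration only; this is NOT Apache Beam's API.
    """

    def __init__(self, flush_delay):
        self.flush_delay = flush_delay
        self.state = defaultdict(int)  # per-key mutable state: a running sum
        self.timers = {}               # per-key timer: event time at which to flush

    def process(self, key, value, timestamp):
        """Fold one element into the state for its key."""
        self.state[key] += value
        # Set the timer only for the first element seen since the last flush.
        self.timers.setdefault(key, timestamp + self.flush_delay)

    def advance_watermark(self, watermark):
        """Fire every timer due at or before the watermark.

        Returns the emitted (key, sum) pairs and clears the fired keys.
        """
        due = sorted(k for k, t in self.timers.items() if t <= watermark)
        emitted = []
        for key in due:
            emitted.append((key, self.state.pop(key)))
            del self.timers[key]
        return emitted


def run_demo():
    fn = ToyStatefulSum(flush_delay=10)
    fn.process("a", 1, timestamp=0)
    fn.process("b", 5, timestamp=3)
    fn.process("a", 2, timestamp=4)
    first = fn.advance_watermark(10)   # only the timer for "a" (0 + 10) is due
    second = fn.advance_watermark(20)  # the timer for "b" (3 + 10) is now due
    return first, second
```

Calling `run_demo()` flushes key `"a"` on the first watermark advance and key `"b"` on the second. In a real Beam pipeline the runner, not user code, would own the state storage and drive the watermark; the toy class only mimics that division of labour in a single process.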
