November 25, 2019

205 words 1 min read

How to build leakproof stream processing pipelines with Apache Kafka and Apache Spark

Talk Title How to build leakproof stream processing pipelines with Apache Kafka and Apache Spark
Speakers Jordan Hambleton (Cloudera), GuruDharmateja Medasani (Domino Data Lab)
Conference Strata Data Conference
Conf Tag Big Data Expo
Location San Jose, California
Date March 6-8, 2018
URL Talk Page
Slides Talk Slides
Video

Streaming data continuously from Kafka allows users to gain insights faster, but when these pipelines fail, they can leave users panicked about data loss when restarting their applications. Jordan Hambleton and Guru Medasani explain how offset management lets users restore the state of a stream throughout its lifecycle, handle unexpected failures, and improve the accuracy of results. They demonstrate how Apache Spark integrates with Apache Kafka to stream data in a distributed and scalable fashion, cover considerations and approaches for building fault-tolerant streams, and detail several offset-management strategies for easily recovering a stream and preventing data loss. Topics include:
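The core idea behind the offset-management strategies the abstract mentions is to commit the consumed offset atomically together with the processed results, so that a restarted application resumes exactly where it left off. The following is a minimal, library-free sketch of that pattern (no real Kafka or Spark APIs; all names here are illustrative assumptions):

```python
# Sketch of the offset-management pattern: persist results and the
# next-offset-to-read together, so a restart neither loses nor
# reprocesses records. Real pipelines would commit both in a single
# transaction to an external store (e.g. one database write).

def process_batch(records):
    """Stand-in for the user's stream-processing logic."""
    return [r.upper() for r in records]

class OffsetStore:
    """Durable store holding results plus the last committed offset."""
    def __init__(self):
        self.results = []
        self.offset = 0  # next offset to read

    def commit(self, results, next_offset):
        # Results and offset are updated together; if they could
        # diverge, a crash between the two writes would lose or
        # duplicate data on restart.
        self.results.extend(results)
        self.offset = next_offset

def run_stream(log, store, batch_size=2):
    """Consume `log` in batches, starting at the stored offset."""
    while store.offset < len(log):
        start = store.offset
        batch = log[start:start + batch_size]
        store.commit(process_batch(batch), start + len(batch))

log = ["a", "b", "c", "d", "e"]
store = OffsetStore()
run_stream(log, store)
# A rerun with the same store resumes at store.offset, so nothing
# is processed twice even after an unexpected failure.
run_stream(log, store)
print(store.results)  # ['A', 'B', 'C', 'D', 'E']
print(store.offset)   # 5
```

With Spark's direct Kafka integration, the same idea applies: the application records the per-partition offset ranges of each processed batch alongside its output, rather than relying solely on automatic commits.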