November 24, 2019


Kafka streaming applications with Akka Streams and Kafka Streams


Dean Wampler compares and contrasts data processing with Akka Streams and Kafka Streams, two libraries for building microservice streaming applications on Kafka. Dean discusses the strengths and weaknesses of each tool for particular design needs and contrasts them with Spark Streaming and Flink, so you'll know when to choose those instead.

Talk Title: Kafka streaming applications with Akka Streams and Kafka Streams
Speakers: Dean Wampler (Anyscale)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018
URL: Talk Page
Slides: Talk Slides

Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics. With powerful stream and table abstractions and an exactly-once capability, it supports a variety of common scenarios involving transformation, filtering, and aggregation.

Akka Streams emerged as a dataflow-centric abstraction for the Akka Actors model, designed for general-purpose microservices, especially when low latency per event is important. Most systems provide efficient processing amortized over sets of records, but usually not low end-to-end latency for each event (e.g., for complex event processing in true real-time applications). Because of its general-purpose nature, Akka Streams also supports a wider class of application problems and third-party integrations, but it is less focused on Kafka-based applications.

Both are primarily libraries that you integrate into your microservices, which means you must manage their lifecycles yourself, but you also get lots of flexibility to do this as you see fit.

In contrast, Spark Streaming and Flink run their own services. You write “jobs” or use interactive shells that tell these services what computations to do over data sources and where to send results. Spark and Flink then determine what processes to run in your cluster to implement the dataflows. Hence, there is less of a DevOps burden to bear, but also less flexibility when you might need it. Both systems are also more focused on data analytics problems, with various levels of support for SQL over streams, machine learning model training and scoring, etc.
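To make the stream and table abstractions described above for Kafka Streams a little more concrete, here is a minimal conceptual sketch in plain Python. It is not the Kafka Streams API and does not talk to Kafka; the record shape `(key, value)`, the function names, and the sample data are all assumptions made for illustration. It only shows the general idea of transforming and filtering a stream of events, then aggregating it into a table that holds the latest result per key.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (key, value). Not a Kafka type.
Event = Tuple[str, int]


def normalize_keys(events: Iterable[Event]) -> Iterator[Event]:
    """Transformation: emit one output event per input event."""
    for key, value in events:
        yield key.lower(), value


def keep_positive(events: Iterable[Event]) -> Iterator[Event]:
    """Filtering: drop events whose value is not positive."""
    for key, value in events:
        if value > 0:
            yield key, value


def sum_by_key(events: Iterable[Event]) -> Tuple[Dict[str, int], List[Event]]:
    """Aggregation: fold the stream into a table of running sums per key.

    Also returns the changelog, i.e. the sequence of table updates,
    which is itself a stream.
    """
    table: Dict[str, int] = {}
    changelog: List[Event] = []
    for key, value in events:
        table[key] = table.get(key, 0) + value
        changelog.append((key, table[key]))
    return table, changelog


# Sample input standing in for records read from a source topic (assumed data).
source = [("A", 1), ("b", -2), ("a", 3), ("B", 4)]
table, changelog = sum_by_key(keep_positive(normalize_keys(source)))
```

In this toy model the table is the current state and the changelog is the stream of updates that produced it. In a real Kafka-based application, a stream like that changelog is what would typically be written to an output topic.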
