October 26, 2019

Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka

Alex Silva outlines the implementation of a real-time analytics platform using microservices and a Scala stack that includes Kafka, Spark Streaming, Spray, and Akka. This infrastructure can process vast amounts of streaming data, ranging from video events to clickstreams and logs. The result is a powerful real-time data pipeline capable of flexible data ingestion and fast analysis.

Talk Title: Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka
Speakers: Alex Silva (Pluralsight)
Conference: Strata + Hadoop World
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 29-31, 2016
URL: Talk Page
Slides: Talk Slides

With the advent of reliable streaming technologies, real-time data pipelines have become a crucial component of any robust data initiative. Compared to a traditional Hadoop-centric data hub, these real-time stacks provide high levels of system availability and data integrity, coupled with very low-latency queries, without incurring the overhead of inflexible schemas and batch-analysis lag. Alex Silva demonstrates how to use Kafka, Spark Streaming, Akka, and Hadoop to orchestrate a real-time stack and explains how data flows through this system. This real-time data platform combines open source technologies and home-grown services to provide a full end-to-end solution, from flexible data-ingestion protocols to fast data analysis and queries.
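To make the general shape of such a pipeline concrete, here is a minimal, dependency-free sketch of events flowing from ingestion into a partitioned append-only log and then through a micro-batch aggregation. It is not the implementation presented in the talk: `Event`, `PartitionedLog`, and `MicroBatchCounter` are hypothetical names, the in-memory log only stands in for a Kafka topic, and the batch counter only stands in for a Spark Streaming job. Python is used purely so the example runs without any external libraries.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a type tag, a partition key, and a free-form payload."""
    kind: str                      # e.g. "video", "click", "log"
    key: str                       # used to choose a partition
    payload: dict = field(default_factory=dict, compare=False)


class PartitionedLog:
    """In-memory stand-in for a Kafka topic: append-only partitions read by offset."""

    def __init__(self, partitions: int) -> None:
        self._partitions: List[List[Event]] = [[] for _ in range(partitions)]

    @property
    def partition_count(self) -> int:
        return len(self._partitions)

    def append(self, event: Event) -> int:
        # Deterministic hash of the key, so one key always maps to one partition.
        index = sum(event.key.encode("utf-8")) % self.partition_count
        self._partitions[index].append(event)
        return index

    def read(self, partition: int, offset: int, limit: int) -> List[Event]:
        # Reading never removes data; consumers track their own offsets.
        return self._partitions[partition][offset:offset + limit]


class MicroBatchCounter:
    """Stand-in for a streaming job: counts events per kind, one small batch at a time."""

    def __init__(self, log: PartitionedLog, batch_size: int) -> None:
        self._log = log
        self._batch_size = batch_size
        self._offsets = [0] * log.partition_count
        self.totals: Counter = Counter()

    def run_batch(self) -> Counter:
        batch: Counter = Counter()
        for partition in range(self._log.partition_count):
            events = self._log.read(partition, self._offsets[partition], self._batch_size)
            self._offsets[partition] += len(events)   # advance past what was consumed
            batch.update(event.kind for event in events)
        self.totals.update(batch)
        return batch
```

In this sketch a producer calls `PartitionedLog.append` for each incoming event, and a consumer calls `MicroBatchCounter.run_batch` repeatedly. Each call processes only the events appended since the previous batch, because the consumer keeps its own per-partition offsets.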