December 1, 2019

264 words 2 mins read

Near-real-time ingest with Apache Flume and Apache Kafka at 1 million-events-per-second scale

Near-real-time ingest with Apache Flume and Apache Kafka at 1 million-events-per-second scale

Vodafone UKs new SIEM system relies on Apache Flume and Apache Kafka to ingest over 1 million events per second. Tristan Stevens discusses the architecture, deployment, and performance-tuning techniques that enable the system to perform at IoT-scale on modest hardware and at a very low cost.

Talk Title Near-real-time ingest with Apache Flume and Apache Kafka at 1 million-events-per-second scale
Speakers Tristan Stevens (Cloudera)
Conference Strata Data Conference
Conf Tag Making Data Work
Location London, United Kingdom
Date May 23-25, 2017
URL Talk Page
Slides Talk Slides
Video

There are two large obstacles to collecting metadata from a network as large as Vodafone’s (the UK’s second-largest telecoms provider): transporting the sheer volume of data (cumulative bandwidth) and processing it before the data no longer accurately reflects the state of the network (cumulative delay). Fortunately, combining Apache Flume and Apache Kafka using the Flafka pattern provides a means to move data into the EDH (Hadoop cluster) and readily scale the pipeline to address both transient and persistent spikes in data volume. Flume and Kafka are both capable of high-performance, low-latency event processing; however, careful tuning is required in order to achieve performance at this scale. Vodafone has deployed Flume and Kafka across the UK network in a geographically distributed architecture that achieves scale and resilience, having been tuned from around 10,000 events per second on initial deployment to 1,000,000 events per second using a three-node Kafka cluster. Tristan Stevens discusses the architecture, deployment, and performance-tuning techniques that enable the system to perform at IoT-scale on modest hardware and at a very low cost. Topics include:

comments powered by Disqus