December 23, 2019

370 words 2 mins read

Live Aggregators: A scalable, cost-effective, and reliable way of aggregating billions of messages in real time

Live Aggregators: A scalable, cost-effective, and reliable way of aggregating billions of messages in real time

Osman Sarood and Chunky Gupta discuss Mists real-time data pipeline, focusing on Live Aggregators (LA)a highly reliable and scalable in-house real-time aggregation system that can autoscale for sudden changes in load. LA is 80% cheaper than competing streaming solutions due to running over AWS Spot Instances and having 70% CPU utilization.

Talk Title Live Aggregators: A scalable, cost-effective, and reliable way of aggregating billions of messages in real time
Speakers Osman Sarood (Mist Systems), Chunky Gupta (Mist Systems)
Conference Strata Data Conference
Conf Tag Big Data Expo
Location San Francisco, California
Date March 26-28, 2019
URL Talk Page
Slides Talk Slides
Video

Osman Sarood and Chunky Gupta discuss Mist’s real-time data pipeline, focusing on Live Aggregators (LA)—a highly reliable and scalable in-house real-time aggregation system that can autoscale for sudden changes in load, ensuring fault tolerance and scalability. LA consumes billions of messages a day from Kafka with a memory footprint of over 1 TB and aggregates over 100 million time series. Since it runs entirely on top of AWS Spot Instances, it’s highly reliable. LA can recover from hours-long EC2 outages using its checkpointing mechanism, which recovers the checkpoint from S3 and replays messages from Kafka where it left off, ensuring no data loss. LA writes the aggregated data to the configured storage system (either be Cassandra, S3 or Kafka). LA does over 1.5 billion writes to Cassandra per day and maintains over 100 million concurrent state machines. The characteristic that sets LA apart is its ability to autoscale by intelligently learning about resource usage (both seasonal and long-term trends) and allocating resources accordingly. LA emits custom metrics that track resource usage for different components, such as Kafka consumers and shared memory managers and aggregators, to achieve server utilization of over 70%. LA is horizontally scalable to thousands of cores. Mist also does multilevel aggregations in LA to intelligently solve load imbalance issues among different partitions for a Kafka topic. Osman and Chunky demonstrate multilevel aggregation using an example that aggregates indoor location data coming from different organizations both spatially and temporally. You’ll learn how changing partitioning keys and writing intermediate data back to Kafka in a new topic for the next level aggregators help Mist scale its solution.

comments powered by Disqus