From rivulets to rivers: Elastic stream processing in Heron

Using Heron, an open source streaming engine tailored for large-scale environments, Twitter processes billions of events per day the instant the data is generated. Bill Graham, Avrilia Floratau, and Ashvin Agrawal explore the techniques Heron uses to elastically scale resources in order to handle highly varying loads without sacrificing real-time performance or user experience.


Talk Title	From rivulets to rivers: Elastic stream processing in Heron
Speakers	Bill Graham (Twitter), Avrilia Floratau (Microsoft), Ashvin Agrawal (Microsoft)
Conference	Strata + Hadoop World
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 14-16, 2017

Twitter is all about real time at scale. Twitter’s data centers continuously process billions of events per day the instant the data is generated. To achieve real-time performance, Twitter has developed and deployed Heron, a next-generation cloud streaming engine that provides unparalleled performance at large scale. Heron has been meeting Twitter’s strict performance requirements for a variety of streaming applications and is now an open source project with contributors from several institutions. Bill Graham, Avrilia Floratau, and Ashvin Agrawal explore how Twitter and Microsoft have collaborated to transform Heron into a truly elastic system.

The amount of data that must be processed in Twitter’s data centers changes significantly in response to both expected and unexpected global events. During the Super Bowl, for example, spikes of tweets all need to be processed in real time. Similarly, unexpected events such as natural disasters can generate very large volumes of data. Twitter’s infrastructure must be robust enough to deliver real-time performance when such events take place. Heron solves this problem by supporting dynamic scaling of Heron applications, achieved by increasing or decreasing the number of Heron instances that form a topology.

Providing truly elastic scaling in Heron poses several challenges. First, event processing should not be disrupted during scaling: Heron must be able to resize running topologies with minimal disruption. Second, Heron has a modular and extensible architecture that allows it to integrate with various schedulers and incorporate various resource management policies, so its scaling framework must be generic enough to operate efficiently on top of diverse components. Finally, Heron must allocate resources efficiently during scaling so that it makes the best use of available cluster resources.

Bill, Avrilia, and Ashvin explain how Heron addresses these challenges to gracefully handle highly varying loads without sacrificing real-time performance. They conclude by sharing future plans for scaling that current and future Heron contributors could tackle.
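
To make the idea of scaling a topology by adding or removing instances more concrete, here is a minimal sketch in Python. It is not Heron's actual packing algorithm or API. The function names, the list-of-lists "plan" representation, and the uniform per-container `capacity` limit are all assumptions made for illustration. The sketch only shows the two goals named above: instances that are already running stay where they are, and new instances fill existing containers before a new one is opened.

```python
def scale_up(plan, component, extra, capacity):
    """Add `extra` instances of `component` without moving existing ones.

    plan:     list of containers, each a list of component names
              (one entry per running instance). Hypothetical model.
    capacity: maximum instances per container (assumed uniform).
    Returns a new plan; the input plan is left unchanged.
    """
    new_plan = [list(container) for container in plan]
    for _ in range(extra):
        # Prefer the fullest container that still has room, so that
        # new containers are opened only when nothing else fits.
        candidates = [c for c in new_plan if len(c) < capacity]
        if candidates:
            target = max(candidates, key=len)
        else:
            target = []
            new_plan.append(target)
        target.append(component)
    return new_plan


def scale_down(plan, component, fewer):
    """Remove `fewer` instances of `component`, emptiest containers first.

    Containers that end up empty are dropped from the returned plan.
    """
    new_plan = [list(container) for container in plan]
    for _ in range(fewer):
        holders = [c for c in new_plan if component in c]
        if not holders:
            raise ValueError(f"no instance of {component!r} left to remove")
        # Draining the emptiest container first frees whole containers sooner.
        min(holders, key=len).remove(component)
    return [c for c in new_plan if c]


if __name__ == "__main__":
    plan = [["spout", "bolt"], ["bolt"]]
    grown = scale_up(plan, "bolt", extra=4, capacity=3)
    print(grown)
    print(scale_down(grown, "bolt", fewer=1))
```

A real scheduler would also account for CPU, memory, and the policies of the underlying resource manager. The sketch ignores all of that in order to isolate the "do not disturb running instances" constraint.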
