December 31, 2019

Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog

Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.

Talk Title: Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog
Speakers: Matteo Merli (Streamlio), Sijie Guo (StreamNative)
Conference: Strata Data Conference
Conf Tag: Make Data Work
Location: New York, New York
Date: September 26-28, 2017

Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Apache DistributedLog is a replicated log store originally developed at Twitter. It has been used in production at Twitter for more than four years, supporting several critical services such as pub/sub messaging, log replication for distributed databases, and real-time stream computing, and delivering more than 1.5 trillion events (about 17 PB) per day. Pulsar is a distributed pub/sub messaging platform that provides a flexible messaging model. It was developed at Yahoo and has been used in the Yahoo Cloud Messaging Service to deliver billions of messages per day. Apache DistributedLog and Pulsar are both built on Apache BookKeeper; they are similar in design and implementation but have different goals. Matteo Merli and Sijie Guo offer an overview of both systems and share advice on how to use them more effectively.