Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
Talk Title | Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog |
Speakers | Matteo Merli (Streamlio), Sijie Guo (StreamNative) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 26-28, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Apache DistributedLog is a replicated log store originally developed at Twitter. It has been used in production at Twitter for more than four years, supporting several critical services such as pub/sub messaging, log replication for distributed databases, and real-time stream computing, and delivering more than 1.5 trillion events (or about 17 PB) per day. Pulsar is a distributed pub/sub messaging platform that provides a flexible messaging model. Pulsar was developed at Yahoo and has been used in the Yahoo Cloud Messaging Service to deliver several billion messages per day. Apache DistributedLog and Pulsar are both built on Apache BookKeeper and are similar in design and implementation, but they have different goals. Matteo Merli and Sijie Guo offer an overview of both systems and share advice on how to better use them.