November 26, 2019

Effectively once in Apache Pulsar, the next-generation messaging system

Traditionally, messaging systems have offered at-least-once delivery semantics, leaving the task of implementing idempotent processing to the application developers. Matteo Merli explains how to add effectively once semantics to Apache Pulsar using a message deduplication layer that can ensure those stricter semantics with guaranteed accuracy and no performance penalty.

Talk Title: Effectively once in Apache Pulsar, the next-generation messaging system
Speakers: Matteo Merli (Streamlio)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018
URL: Talk Page
Slides: Talk Slides

Apache Pulsar is a distributed pub/sub messaging system that originated at Yahoo, where it has been powering critical systems and user-facing products for several years. The traditional API for Pulsar is derived from basic pub/sub concepts, such as subscriptions and consumers that receive messages and acknowledge their processing. This model is simple yet powerful: it lets you build applications without needing to understand the underlying intricacies of the messaging system. The only drawback is that previous pub/sub systems offered only at-least-once semantics, leaving the task of eliminating duplicated messages to the application. With the emergence of stream processing and its more demanding requirements, messaging systems need to offer the right primitives for implementing effectively once semantics end to end, in both the messaging layer and the processing layer. In this context, “effectively once” means that messages may be replayed multiple times when failures occur, but the effects of their processing are equivalent to processing each message exactly once. Matteo Merli explores the new APIs introduced in Pulsar to offer effectively once semantics, discusses the implementation details and performance testing results, and shares use cases that can greatly benefit from this new feature.
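
To make the idea of "replayed but applied once" concrete, here is a minimal sketch of a deduplication layer. It assumes each producer tags its messages with a monotonically increasing sequence ID and that the log remembers the highest ID it has stored per producer. The `DedupLog` class and its `publish` method are hypothetical names used for illustration only; this is a toy in-memory model, not Pulsar's API or its actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DedupLog:
    """Toy append-only log that drops replayed messages (hypothetical, not Pulsar's API)."""

    messages: list = field(default_factory=list)
    # Highest sequence ID already stored, tracked per producer name.
    last_seq: dict = field(default_factory=dict)

    def publish(self, producer: str, seq_id: int, payload: str) -> bool:
        """Append the message unless it is a replay; return True if it was stored."""
        if seq_id <= self.last_seq.get(producer, -1):
            # Duplicate caused by a retry: safe to acknowledge, but do not store it again.
            return False
        self.messages.append(payload)
        self.last_seq[producer] = seq_id
        return True


log = DedupLog()
# The producer sends messages 0 and 1, retries both after a simulated timeout, then sends 2.
attempts = [(0, "a"), (1, "b"), (0, "a"), (1, "b"), (2, "c")]
results = [log.publish("producer-1", seq, body) for seq, body in attempts]
```

In this sketch the retried messages are accepted by `publish` without being stored a second time, so the log ends up with one copy of each payload even though two of them were sent twice. A real system would also have to persist the per-producer state so that it survives restarts, which this example does not attempt.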
