Building DistributedLog, a high-performance replicated log service
DistributedLog is a high-performance replicated log service built on top of Apache BookKeeper. It is the foundation of publish-subscribe at Twitter, serving traffic that ranges from transactional databases to real-time data analytics pipelines. Sijie Guo offers an overview of DistributedLog, detailing the technical decisions and challenges behind its creation and how it is used at Twitter.
|Talk Title||Building DistributedLog, a high-performance replicated log service|
|Speakers||Sijie Guo (StreamNative)|
|Conference||Strata + Hadoop World|
|Conf Tag||Big Data Expo|
|Location||San Jose, California|
|Date||March 29-31, 2016|
Systems such as databases and messaging systems require durability. One common way to provide durability while keeping performance high is to persist updates to system state in a log, which is then used to reconstruct that state after a crash. Logs are also powerful data structures for addressing challenging distributed-systems problems.

DistributedLog is a replicated log service built on top of Apache BookKeeper. It provides infinite, ordered, append-only streams that can be used to build robust real-time systems. It is the foundation of Twitter’s publish-subscribe system and is widely used elsewhere at Twitter, in applications ranging from the transactional database system to the search ingestion pipeline and the real-time data analytics platform.

Sijie Guo offers an overview of DistributedLog, detailing why Twitter built it, the technical decisions and challenges behind its creation, and how Twitter uses it to support workloads with different characteristics, from a strongly consistent distributed database to a real-time data analytics pipeline. Sijie also discusses how Twitter runs the same software stack in multiple data centers to achieve global consistency.
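The idea of using a log for durability, mentioned at the start of the abstract, can be illustrated with a small single-machine sketch. This is not DistributedLog or BookKeeper code and involves no replication; the class `LoggedKVStore`, its methods, and the JSON record layout are all hypothetical names chosen for illustration. It only shows the basic pattern: append each update to a log before applying it, and replay the log to rebuild state after a restart.

```python
import json
import os
import tempfile


class LoggedKVStore:
    """Toy key-value store (hypothetical) that logs every update before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()  # rebuild in-memory state from any existing log
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply every logged record in order to reconstruct the state.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "put":
            self.state[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.state.pop(record["key"], None)

    def _append(self, record):
        # Persist the record first, then apply it to the in-memory state.
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # ask the OS to push the write to disk
        self._apply(record)

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "delete", "key": key})

    def close(self):
        self._log.close()


def demo():
    """Write some updates, discard the in-memory state, then recover from the log."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "updates.log")
        store = LoggedKVStore(path)
        store.put("a", 1)
        store.put("b", 2)
        store.delete("a")
        store.close()  # stands in for a crash: the in-memory dict is gone

        recovered = LoggedKVStore(path)  # replays the log on startup
        result = dict(recovered.state)
        recovered.close()
        return result


if __name__ == "__main__":
    print(demo())
```

The sketch leaves out what a real log service has to handle, such as partially written records at the tail of the log, log truncation, and replication across machines.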