December 26, 2019

Cruise Control: Effortless management of Kafka clusters

Adem Efe Gencer explains how LinkedIn alleviated the management overhead of large-scale Kafka clusters using Cruise Control.

Talk Title: Cruise Control: Effortless management of Kafka clusters
Speakers: Adem Efe Gencer (LinkedIn)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Francisco, California
Date: March 26-28, 2019

Kafka incurs significant management overhead. Growing cluster sizes, the increasing volume and diversity of user traffic, and aging network and server components all add to this overhead. Hardware failures and load imbalance become more frequent as a result, causing frequent service interruptions and a poor user experience. In particular, reactive mitigation becomes insufficient because of its impact on other services that depend on Kafka. Getting near-optimal performance from such an infrastructure service, maintaining its availability in the face of cascading failures, and achieving these objectives with minimal management overhead are critical but nontrivial tasks.

Adem Efe Gencer explains how LinkedIn alleviated the management overhead of large-scale Kafka clusters using Cruise Control. He begins by outlining Cruise Control’s approach to monitoring load distribution in clusters, identifying imbalances, and fixing them using replica and leadership movements. He then explains how Cruise Control detects fail-stop broker failures and SLO violations without human intervention and examines a more aggressive scenario, in which Cruise Control proactively identifies and mitigates potential service disruptions.
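
To make the monitor, detect, and fix loop described in the abstract more concrete, here is a minimal Python sketch of load-based rebalancing through replica movements. It is a toy illustration, not Cruise Control's actual algorithm or API: the `Broker` class, the `is_imbalanced` and `propose_moves` functions, the single scalar load per replica, and the 1.1 threshold are all assumptions made for this example. It also ignores real Kafka constraints such as rack awareness, leadership, and keeping replicas of the same partition on different brokers.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical broker model: maps replica name -> load (e.g. bytes-in rate)."""
    broker_id: int
    replicas: dict = field(default_factory=dict)

    @property
    def load(self):
        return sum(self.replicas.values())


def is_imbalanced(brokers, threshold=1.1):
    """Return True if any broker's load exceeds threshold * mean broker load."""
    mean = sum(b.load for b in brokers) / len(brokers)
    return any(b.load > threshold * mean for b in brokers)


def propose_moves(brokers, threshold=1.1, max_moves=100):
    """Greedily move replicas from the most loaded to the least loaded broker.

    Mutates `brokers` and returns a list of (replica, source_id, dest_id).
    """
    moves = []
    while len(moves) < max_moves and is_imbalanced(brokers, threshold):
        src = max(brokers, key=lambda b: b.load)
        dst = min(brokers, key=lambda b: b.load)
        gap = src.load - dst.load
        # Only a replica lighter than the gap narrows the difference.
        candidates = {r: l for r, l in src.replicas.items() if 0 < l < gap}
        if not candidates:
            break  # no single movement helps; stop rather than oscillate
        # Pick the replica whose load is closest to half the gap.
        replica = min(candidates, key=lambda r: abs(candidates[r] - gap / 2))
        dst.replicas[replica] = src.replicas.pop(replica)
        moves.append((replica, src.broker_id, dst.broker_id))
    return moves


if __name__ == "__main__":
    cluster = [
        Broker(0, {"t-0": 40, "t-1": 30, "t-2": 30}),
        Broker(1, {"t-3": 10}),
        Broker(2, {"t-4": 10}),
    ]
    for replica, src_id, dst_id in propose_moves(cluster):
        print(f"move {replica}: broker {src_id} -> broker {dst_id}")
```

The greedy choice keeps the sketch short. A production system would weigh several resources (CPU, disk, network) and multiple goals at once, which this example does not attempt.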