Kafka in jail: Running Kafka in container-orchestrated clusters

Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters.


Talk Title	Kafka in jail: Running Kafka in container-orchestrated clusters
Speakers	Sean Glover (Lightbend)
Conference	Strata Data Conference
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	May 22-24, 2018

Kafka is best suited to run close to the metal on dedicated machines in statically defined clusters, but these fixed clusters are quickly becoming extinct. Companies want to create mixed-use clusters that take advantage of every resource available. Stateless, transient services fit well into this model, but stateful services each have their own particular needs. Disk is one of Kafka’s most important resource requirements because message durability depends on it, but what is the best way to provide disk resources to stateful technologies in a mixed-use cluster?

Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters and detail the pros and cons of containerizing Kafka brokers relative to installing directly on the host platform. Static clusters require greater operational know-how for common Kafka tasks, such as applying broker configuration updates, upgrading to a new version, and adding or decommissioning brokers. By using Kafka implementations on DC/OS (Apache Mesos) and Kubernetes, you can reduce the overhead of a number of common operational tasks with standard cluster resource manager features.

You’ll learn how to accommodate Kafka-specific operational logic in the form of a Kafka cluster helper application, known as a scheduler in Mesos and a controller in Kubernetes, and discover some of the pitfalls of such an approach, including how to manage broker storage effectively and the additional burden of monitoring scheduler- or controller-based Kafka cluster helper applications. You’ll also learn how to use modern container orchestration tooling to find the right balance between statically defined clusters and elasticity within a larger mixed-use cluster.

In mixed-use clusters, best practice often dictates that stateful applications are sticky to the host they’re running on because that state exists on the local disk. However, there may be scenarios where a distributed block storage solution is acceptable, which would give brokers some mobility when it is needed. Sean outlines the implications of using distributed block storage devices and the performance trade-offs in common failure or operational scenarios, such as when a broker needs to be replaced and topic partitions must be rebalanced.

Kafka is an integral part of the Lightbend Fast Data Platform, the next-generation stream processing system. Join in to see how best to implement operational Kafka with container orchestration tools on public cloud services.

Scalable Monitoring Using Prometheus with Apache Spark

Apache Kafka + Apache Mesos = Highly scalable streaming microservices

Distributed deep learning with containers on heterogeneous GPU clusters

The Path to GPU as a Service in Kubernetes

Container Storage Interface: Present and Future

Keynote: Shaping the Cloud Native Future