Kubernetes the very hard way

Laurent Bernaille examines the lessons he learned operating large Kubernetes clusters.


Talk Title	Kubernetes the very hard way
Speakers	Laurent Bernaille (Datadog)
Conference	O’Reilly Velocity Conference
Conf Tag	Build systems that drive business
Location	Berlin, Germany
Date	November 5-7, 2019

Running large Kubernetes clusters is difficult. Datadog has been running large-scale Kubernetes clusters (thousands of nodes) for more than a year and has learned several lessons the hard way. Laurent Bernaille examines the challenges Datadog faced during this journey. He dives into the problems that arise when you run large clusters and, crucially, how to address them, with detailed examples based on Datadog’s experience across different cloud providers. You’ll explore complex runtime and networking issues: bugs in low-level components that are very rare on any single node but surface regularly once you have a large number of nodes.

Laurent also shows how to improve the architecture of clusters to increase scalability and performance, both on the control plane and on the data plane (communication between pods and ingress traffic). If scale is hard on the control plane, it’s even harder on tools from the ecosystem, which have rarely been tested on very large clusters. He walks through several of the tools Datadog uses and how it had to improve them to handle its scale. You’ll leave with practical advice on how to build a good relationship with the community and start contributing back.

Low Latency Multi-cluster Kubernetes Networking in AWS

Liberating Kubernetes From Kube-proxy and Iptables

Kubernetes Networking at Scale

Connecting Kubernetes Clusters Across Clouds With Kilo

VPP Accelerated High Performance & Scalable L3DSR L4 Load Balancer on Top Clos

Five Things You Didn’t Know You Could Do with SPIFFE and SPIRE