Kubernetes the very hard way
Laurent Bernaille examines the lessons he learned operating large Kubernetes clusters.
Talk Title | Kubernetes the very hard way |
Speakers | Laurent Bernaille (Datadog) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Build systems that drive business |
Location | Berlin, Germany |
Date | November 5-7, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Running large Kubernetes clusters is difficult. Datadog has been running Kubernetes clusters with thousands of nodes for more than a year and has learned several lessons the hard way. Laurent Bernaille examines the challenges Datadog faced along the way. He dives into the problems that arise when you run large clusters and, crucially, how to address them, with detailed examples from Datadog's experience across different cloud providers.

You'll explore complex runtime and networking issues: failures in low-level components that are very rare on any single node still occur regularly once a cluster has a large number of nodes. Laurent also shows how to improve cluster architecture to increase scalability and performance, both on the control plane and on the data plane (communication between pods and ingress traffic). Scale can be hard on the control plane, but it is even harder on tools from the ecosystem, which have rarely been tested on very large clusters. He walks through several of the tools Datadog uses and how the team had to improve them to handle its scale. You'll leave with practical advice on how to build a good relationship with the community and start contributing back.