Kubernetes Failure Stories and How to Crash Your Clusters
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisio …
Talk Title | Kubernetes Failure Stories and How to Crash Your Clusters |
Speakers | Henning Jacobs (Head of Developer Productivity, Zalando SE) |
Conference | KubeCon + CloudNativeCon Europe |
Conf Tag | |
Location | Barcelona, Spain |
Date | May 19-23, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences. Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.