November 13, 2019

195 words 1 min read

Kubernetes Failure Stories and How to Crash Your Clusters

Kubernetes Failure Stories and How to Crash Your Clusters

Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisio …

Talk Title Kubernetes Failure Stories and How to Crash Your Clusters
Speakers Henning Jacobs (Head of Developer Productivity, Zalando SE)
Conference KubeCon + CloudNativeCon Europe
Conf Tag
Location Barcelona, Spain
Date May 19-23, 2019
URL Talk Page
Slides Talk Slides
Video

Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences. Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.

comments powered by Disqus