November 8, 2019

Keynote: How Spotify Accidentally Deleted All its Kube Clusters with No User Impact

Talk Title Keynote: How Spotify Accidentally Deleted All its Kube Clusters with No User Impact
Speakers David Xia (Infrastructure Engineer, Spotify)
Conference KubeCon + CloudNativeCon Europe
Location Barcelona, Spain
Date May 19-23, 2019
Slides [Talk Slides](https://static.sched.com/hosted_files/kccnceu19/4b/KubeCon%20Europe%202019%20Keynote%20-%20David%20Xia%20-%20How%20Spotify%20Accidentally%20Deleted%20All%20Its%20Kube%20Clusters%20with%20No%20User%20Impact%20slides%20with%20notes.pdf)

During Spotify’s Kubernetes migration, David’s team deleted most of their production Kubernetes clusters. Accidentally. Twice. With little to no user impact. David shares how they recovered and learned to operate many clusters automatically and safely. In 2017, Spotify planned the migration of hundreds of teams, thousands of services, and tens of thousands of hosts to Google Kubernetes Engine (GKE). In the second half of 2018, Spotify migrated 50 teams and hundreds of services, including critical ones, onto multiple production clusters. David describes what led to the cluster deletions and why they barely affected users. Since the postmortem, Spotify has minimized downtime and human error by declaratively defining clusters in code with Terraform, backing up and restoring clusters with Ark, and running many more clusters to increase scalability and availability.
