Stories from the Playbook
Talk Title | Stories from the Playbook |
Speakers | Fred van den Driessche (Site Reliability Engineer, Google), Tina Zhang (Site Reliability Engineer, Google) |
Conference | KubeCon + CloudNativeCon Europe |
Conf Tag | |
Location | Copenhagen, Denmark |
Date | Apr 30-May 4, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Have you ever wondered how GKE Site Reliability Engineers (SREs) manage an entire fleet of GKE clusters in 15 regions around the world? This talk provides an overview of how the SRE team approaches this challenge, which tools it uses, the problems it has encountered, and the war stories and lessons learned along the way. The talk introduces the most frequently used parts of our playbook and shows how SRE endeavours to save your cluster while on call in an effort to meet our SLOs.