February 1, 2020

Building successful site reliability engineering in large enterprises

Implementing site reliability engineering (SRE) doesn't have to be intimidating, and it isn't only for cloud-native organizations. Liz Fong-Jones and Dave Rensin share eight key lessons that Google's customer reliability engineering team learned while helping large enterprises adopt SRE as an operations engineering model.

Talk Title: Building successful site reliability engineering in large enterprises
Speakers: Liz Fong-Jones (Honeycomb), Dave Rensin (Google)
Conference: O’Reilly Velocity Conference
Conf Tag: Building and maintaining complex distributed systems
Location: New York, New York
Date: October 1-3, 2018

Google’s customer reliability engineering team is a specialized group of SREs who go into the world and teach enterprise customers of public cloud infrastructure—via their actual production systems—how to “do SRE” in their orgs. In the team’s two years of existence, its members have found that some things they thought would be hard weren’t, while others were nigh on impossible. The team has written many postmortems and learned a bunch of lessons you can only learn the hard way. Liz Fong-Jones and Dave Rensin share eight of these key lessons. Topics include:
