January 21, 2020

How I failed to build a runbook automation system and what I learned

How I failed to build a runbook automation system and what I learned

You're going to automate all the things, reduce toil, and make your systems smarter and able to recover automatically... except sometimes you're automating a house of cards built on the backs of individual people, and a well-meaning solution can fail to address the true problems in the system. Tim Bonci offers a postmortem of a solution that was designed to solve a common operational problem but failed.


Talk Title	How I failed to build a runbook automation system and what I learned
Speakers	Tim Bonci (Vistaprint)
Conference	O’Reilly Velocity Conference
Conf Tag	Building and maintaining complex distributed systems
Location	San Jose, California
Date	June 11-13, 2019
URL	Talk Page
Slides	Talk Slides
Video

Our intentions can be good, the technical ability and time may be there, and we’re going to build the thing that makes work easier and more productive, letting everyone apply their labor to only the most valuable tasks. Yet sometimes that’s still not enough. This is a postmortem of a solution that was designed to solve a common operational problem but failed. Tim Bonci examines the scars and, hopefully, provides insights into finding and addressing the right problems in the right places, insights that should be broadly useful when building and deploying your own transformational processes and tools. This is particularly relevant to brownfield teams looking for ways to modernize their processes and to anyone who struggles with getting humans to change how they work. Tim explains why shifting human processes to computer automation does not always produce the expected results and why treating non-urgent alerts as a work queue is an anti-pattern.

Lifecycle of a kubectl Command: Harden Kubernetes Setup with Automation

November 2, 2019

We at Booking.com run tens of on-premises multi-tenant Kubernetes clusters at scale. To automate integration with our existing bare-metal infrastructure and to run a kubectl auth pipeline, we run an …

Security precognition: A look at chaos engineering in security incident response

January 19, 2020

Chaos engineering allows security incident response teams to proactively experiment on recurring incident patterns to derive new information about underlying factors that were previously unknown. Join Aaron Rinehart to explore the hidden costs of security incidents, learn a new technique for uncovering system weaknesses in systems security, and more.

Home Multimedia and Automation Systems with GStreamer

January 14, 2020

For quite a few years, Jan has been using GStreamer's network synchronisation features at home to build multimedia systems for distributed media playback. This talk, however, will focus on his progress …

Data science transformation: Transforming a traditional wealth manager to a cutting-edge data-driven company

January 13, 2020

Charlotte Werger outlines the components necessary to transform a traditional wealth manager into a data-driven business, paying special attention to devising and executing a transformation strategy by identifying key business subunits where automation and improved predictive modeling can result in significant gains and synergies.

The Journey of Leading Open Source Engineering Team in China

January 13, 2020

Seven years ago, Jocelyn started her journey in the open source development world as a software engineering team manager. At the beginning, Jocelyn simply assumed that the only difference with open source developm …

DigitalOcean's Use of OSS in a Fully Routed Datacenter

January 12, 2020

DigitalOcean is undergoing a major overhaul of its Droplet network infrastructure to carry traffic over a layer 3 (routed) network. This allows a more scalable, versatile, and fault-tolerant cloud network with …