January 17, 2020

269 words 2 mins read

Practical, team-focused operability techniques for distributed systems

Practical, team-focused operability techniques for distributed systems

Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT.

Talk Title Practical, team-focused operability techniques for distributed systems
Speakers Matthew Skelton (Skelton Thatcher Consulting)
Conference O’Reilly Velocity Conference
Conf Tag Build Resilient Distributed Systems
Location London, United Kingdom
Date October 18-20, 2017
URL Talk Page
Slides Talk Slides
Video

Modern software systems now increasingly span cloud and on-premises deployments and remote embedded devices and sensors. These distributed systems bring challenges with data, connectivity, performance, and systems management; to ensure success, you must design and build with operability as a first-class property. Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT: logging as a live diagnostics vector with sparse event IDs; operational checklists and runbook dialog sheets as a discovery mechanism for teams; endpoint health checks as a way to assess runtime dependencies and complexity; correlation IDs beyond simple HTTP calls; and lightweight user personas as drivers for operational dashboards. These techniques work very differently with different technologies. For instance, an IoT device has limited storage, processing, and I/O, so generating and shipping of logs and metrics looks very different from cloud or serverless cases. However, the principles—logging as a live diagnostics vector, event IDs for discovery, etc.—work remarkably well across very different technologies. Drawing from his experience helping teams improve the operability of their software systems, Matthew explains what works (and what doesn’t) and how teams can expand their understanding and awareness of operability through these straightforward, team-friendly techniques.

comments powered by Disqus