January 7, 2020

Instrumenting systems for arbitrary observability

Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? Baron Schwartz outlines the most useful things to know about observability in systems in production.

Talk Title: Instrumenting systems for arbitrary observability
Speakers: Baron Schwartz (VividCortex)
Conference: O’Reilly Velocity Conference
Conf Tag: Build resilient systems at scale
Location: New York, New York
Date: October 2-4, 2017

Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can’t foresee? And how can you make your systems observable?

Facing this question at coding time, many programmers try to guess what will be needed later. You can see evidence of this in systems like databases, which expose a lot of “status counters” and the like, most of which were probably included because some developer thought they might be useful. As a result, most end up being vanity metrics: real production problems never seem to be measured by the things developers thought to include. When you run most systems, the least common denominator for figuring out how they are working is usually the log, and the log is usually a deluge of useless “I got here” signals.

So what’s a developer to do? Fortunately, experience operating complex systems suggests that there is a universal set of best practices. Baron Schwartz outlines the most useful things to know about observability in systems in production, helping you instrument your systems in ways that support troubleshooting and let you operate them in production under failure modes you can’t imagine when you’re writing the code.
