Customer-centric observability
With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Mark McBride outlines three key metrics (request rate, success rate, and the latency histogram) that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.
| Talk Title | Customer-centric observability |
|---|---|
| Speakers | Mark McBride (Turbine Labs) |
| Conference | O’Reilly Velocity Conference |
| Conf Tag | Build resilient systems at scale |
| Location | New York, New York |
| Date | October 2-4, 2017 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
The proliferation of good metrics collection and visualization toolkits over the past five years has been a huge benefit to developers. But with so many metrics available, a massive proliferation of services, and limited cognitive capacity, which ones should we focus on? Mark McBride outlines three key metrics (request rate, success rate, and the latency histogram) that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.

Using concrete examples from a multiyear journey to improve service reliability while dramatically scaling a consumer site, Mark walks you through a customer-centric monitoring approach that fosters better teamwork and faster incident resolution.

As your service is refactored into smaller services, internal teams become customers as well. These three key metrics serve as a common frame of reference for talking about service behavior across teams: each team can quickly evaluate how its service is behaving for its customers, and how its dependencies are serving it. This makes communication about performance and reliability issues crisper and dramatically improves incident troubleshooting and resolution.
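As a rough illustration of what the three metrics look like in practice, here is a minimal Python sketch, not taken from the talk. It assumes a hypothetical `Request` record (caller, callee, HTTP status, latency in milliseconds) collected over a fixed time window; the `summarize` and `bucket_label` helpers and the bucket bounds are likewise hypothetical. Treating only 5xx responses as failures is also an assumption. Grouping by `(caller, callee)` reflects the idea that each team can see both how its service serves its callers and how its dependencies serve it.

```python
from bisect import bisect_left
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one request between two services."""
    caller: str        # service (or client) that sent the request
    callee: str        # service that handled it
    status: int        # HTTP status code returned
    latency_ms: float  # end-to-end latency seen by the caller


@dataclass
class Summary:
    request_rate: float                # requests per second over the window
    success_rate: float                # fraction of requests that succeeded
    latency_histogram: Dict[str, int]  # bucket label -> request count


# Hypothetical bucket upper bounds in milliseconds; anything above the
# last bound falls into an open-ended overflow bucket.
BUCKETS_MS = (10, 50, 100, 500, 1000)


def bucket_label(latency_ms: float) -> str:
    """Return the label of the smallest bucket whose bound is >= latency_ms."""
    i = bisect_left(BUCKETS_MS, latency_ms)
    if i < len(BUCKETS_MS):
        return f"<={BUCKETS_MS[i]}ms"
    return f">{BUCKETS_MS[-1]}ms"


def is_success(status: int) -> bool:
    # Assumption: only server errors (5xx) count against the success rate.
    return status < 500


def summarize(
    requests: Iterable[Request], window_s: float
) -> Dict[Tuple[str, str], Summary]:
    """Compute rate, success rate, and latency histogram per (caller, callee)."""
    groups: Dict[Tuple[str, str], list] = defaultdict(list)
    for r in requests:
        groups[(r.caller, r.callee)].append(r)

    out: Dict[Tuple[str, str], Summary] = {}
    for key, rs in groups.items():  # every group is non-empty
        ok = sum(1 for r in rs if is_success(r.status))
        hist = Counter(bucket_label(r.latency_ms) for r in rs)
        out[key] = Summary(
            request_rate=len(rs) / window_s,
            success_rate=ok / len(rs),
            latency_histogram=dict(hist),
        )
    return out
```

For example, `summarize([Request("web", "users", 200, 12.0), Request("web", "users", 503, 950.0)], window_s=2.0)` would report one request per second and a 50% success rate for the `web -> users` edge. Keeping a histogram rather than a single average keeps slow tail requests visible, which an average can hide.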