Increasing visibility of distributed systems in production
Understanding the state of a running application is the key to efficiently troubleshooting production issues and ultimately anticipating outages. Pierre Vincent demonstrates how to make monitoring an integral part of development, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production.
Talk Title | Increasing visibility of distributed systems in production |
Speakers | Pierre Vincent (Poppulo) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Build Resilient Distributed Systems |
Location | London, United Kingdom |
Date | October 18-20, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Understanding the running state of an application is the key to efficiently troubleshooting production issues and ultimately anticipating outages. When systems grow larger and become distributed, the visibility of application health needs to become a first-class concern; as the likelihood of something going wrong increases, the focus shifts from increasing mean time between failures to reducing mean time to recovery. The best way to achieve this consistently is to make monitoring an integral part of product development rather than an afterthought.

Pierre Vincent demonstrates how to build in monitoring, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production. Monitoring can start simple, with basic telemetry such as health checks, which increase visibility into the system's status. Exposing more advanced metrics gives more detail on how the system is behaving at the system level (e.g., resource usage), the application level (e.g., response times), and the business level (e.g., completed sales). These health checks and metrics can then be used to trigger alerts when observed values fall outside expected thresholds. Pierre offers an overview of monitoring patterns and tools that will help you build a fuller picture of a running application.
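As an illustration of the ideas in the abstract (not code from the talk), here is a minimal Python sketch of two of the patterns it mentions: an aggregated health check and threshold-based alerting on metrics. The names `run_health_checks`, `Threshold`, and `evaluate_alerts`, as well as the metric names and threshold values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each named dependency check and report per-check and overall status."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = "down"
    overall = "up" if all(v == "up" for v in results.values()) else "down"
    return {"status": overall, "checks": results}


@dataclass
class Threshold:
    """Expected range for one metric (hypothetical structure for this sketch)."""
    metric: str
    low: float
    high: float


def evaluate_alerts(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per metric that is missing or out of range."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # Missing telemetry is itself worth alerting on.
            alerts.append(f"{t.metric}: no data")
        elif not (t.low <= value <= t.high):
            alerts.append(f"{t.metric}: {value} outside [{t.low}, {t.high}]")
    return alerts


if __name__ == "__main__":
    # Hypothetical checks: in a real service these would ping actual dependencies.
    print(run_health_checks({"database": lambda: True, "queue": lambda: False}))

    # One metric per level named in the abstract: system, application, business.
    observed = {"cpu_percent": 42.0, "response_time_ms": 750.0, "completed_sales": 12.0}
    limits = [
        Threshold("cpu_percent", 0.0, 90.0),
        Threshold("response_time_ms", 0.0, 500.0),
        Threshold("completed_sales", 1.0, float("inf")),
    ]
    print(evaluate_alerts(observed, limits))
```

In a real deployment the health result would typically be served over an HTTP endpoint and the metrics exported to a monitoring system that evaluates the thresholds; this sketch keeps everything in-process so the logic is easy to follow.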