November 13, 2019

Detecting outliers and anomalies in real-time at Datadog

Datadog provides outlier and anomaly detection functionality to automatically alert on metrics that are difficult to monitor using thresholds alone. Homin Lee discusses the algorithms and open source tools Datadog uses, the lessons learned from using these alerts on its own systems, and real-life examples of how to avoid false positives and false negatives.

Talk Title: Detecting outliers and anomalies in real-time at Datadog
Speakers: Homin Lee (Datadog)
Conference: O’Reilly Open Source Convention
Location: Austin, Texas
Date: May 16-19, 2016
URL: Talk Page
Slides: Talk Slides

Monitoring even a modestly sized systems infrastructure quickly becomes untenable without automated alerting. For many metrics, it is nontrivial to define ahead of time what constitutes “normal” versus “abnormal” values. This is especially true for metrics whose baseline value fluctuates over time. To make this problem more tractable, Datadog provides two features: outlier detection, which automatically identifies any host (or group of hosts) behaving abnormally compared to its peers, and anomaly detection, which alerts when a single metric behaves differently than its past history would suggest. Homin Lee discusses the algorithms and open source tools Datadog uses for outlier and anomaly detection, the lessons learned from using these alerts on its own systems, and real-life examples of how to avoid false positives and false negatives.
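The difference between comparing a host against its peers and comparing a metric against its own history can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Datadog's implementation: `mad_outliers` compares each host's latest value with the group using the median absolute deviation (MAD), and `band_anomalies` compares each point of a single series with a trailing window of its own past values. The function names, the `tolerance`, `window`, and `k` parameters, and the sample data are all hypothetical choices made for this example.

```python
from statistics import mean, median, pstdev

# Hypothetical helpers for illustration only; not Datadog's API or algorithms.


def mad_outliers(latest_by_host: dict[str, float], tolerance: float = 3.0) -> set[str]:
    """Outlier detection: flag hosts whose current value differs from their peers."""
    values = list(latest_by_host.values())
    center = median(values)
    # Median absolute deviation, scaled by 1.4826 so it approximates the
    # standard deviation when the data are normally distributed.
    spread = 1.4826 * median(abs(v - center) for v in values)
    if spread == 0:
        # At least half the peers report exactly the median: any host that differs stands out.
        return {h for h, v in latest_by_host.items() if v != center}
    return {h for h, v in latest_by_host.items() if abs(v - center) / spread > tolerance}


def band_anomalies(series: list[float], window: int = 12, k: float = 3.0) -> list[int]:
    """Anomaly detection: flag points that differ from the metric's own recent past."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # trailing window, excluding point i itself
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma so that a perfectly flat history still flags any change.
        if abs(series[i] - mu) > k * max(sigma, 1e-9):
            anomalies.append(i)
    return anomalies


hosts = {"web-1": 101.0, "web-2": 98.0, "web-3": 100.0, "web-4": 102.0, "web-5": 180.0}
print(mad_outliers(hosts))  # {'web-5'}

latency = [10, 11, 10, 12, 11, 10, 11, 10, 12, 11, 10, 11, 40, 11]
print(band_anomalies(latency))  # [12]
```

This rolling band ignores trend and seasonality, so a metric with a daily cycle would produce false positives at every peak; a production detector would need to model those patterns. The band also widens for `window` points after a spike, which can hide a second anomaly and produce a false negative.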
