December 12, 2019


The problem with preaggregated metrics



Talk Title: The problem with preaggregated metrics
Speakers: Christine Yen (Honeycomb)
Conference: O’Reilly Velocity Conference
Conf Tag: Build Resilient Distributed Systems
Location: San Jose, California
Date: June 20-22, 2017
URL: Talk Page
Slides: Talk Slides

Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities, but they simply aren’t sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Preaggregating a small set of metrics is a perfectly reasonable technique for top-level KPIs, but not for the day-to-day operations and debugging work done by engineers on the front lines: it forces them to predict which metrics will be interesting at some point in the future and hobbles their ability to react quickly to unexpected factors.

Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics. She covers the implementation details of building an RRD (round-robin database, the basis of many preaggregated metrics systems), highlighting another axis along which data is constrained. Contiguous time series stored on disk are fast to read and easy to conceptualize, but they are at risk of a combinatorial explosion of inputs blowing up the underlying storage, because every new combination of attribute values needs its own series.

Along the way, Christine stresses the importance of context. Relying on individual metrics and segments is like trying to extrapolate a 3D model of a room from hundreds of one-dimensional data points. When exploring a dataset, it’s crucial to be able to easily understand and visualize the interplay between the various attributes and measurements of a system event, but these one-dimensional metrics rob engineers of that ability.
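To make the RRD and storage-explosion points a little more concrete, here is a toy Python sketch. It assumes a simplified round-robin archive that keeps one averaged value per fixed-width time step in a fixed number of slots; real tools such as RRDtool support multiple archives and several consolidation functions, so treat this as an illustration of the idea, not of how any particular system works. The class `RoundRobinArchive`, the helper `series_count`, the event fields, and the tag cardinalities are all hypothetical.

```python
from math import prod
from statistics import mean
from typing import Dict, List, Optional, Tuple


class RoundRobinArchive:
    """Hypothetical toy RRD-style archive: a fixed number of slots, each holding
    the sum and count of samples for one `step`-second interval."""

    def __init__(self, step: int, slots: int) -> None:
        self.step = step
        self.slots = slots
        self.sums: List[float] = [0.0] * slots
        self.counts: List[int] = [0] * slots
        # Start time of the interval each slot currently holds (None = empty).
        self.starts: List[Optional[int]] = [None] * slots

    def _slot(self, timestamp: int) -> Tuple[int, int]:
        start = timestamp - timestamp % self.step
        return start, (start // self.step) % self.slots

    def update(self, timestamp: int, value: float) -> None:
        start, i = self._slot(timestamp)
        if self.starts[i] != start:
            # The slot held a different interval: reset it, discarding that data.
            self.starts[i] = start
            self.sums[i] = 0.0
            self.counts[i] = 0
        self.sums[i] += value
        self.counts[i] += 1

    def average(self, timestamp: int) -> Optional[float]:
        start, i = self._slot(timestamp)
        if self.starts[i] != start or self.counts[i] == 0:
            return None  # interval never recorded, or already overwritten
        return self.sums[i] / self.counts[i]


def series_count(cardinalities: Dict[str, int]) -> int:
    """Worst-case number of preaggregated series when every combination of
    tag values gets its own series: the product of the cardinalities."""
    return prod(cardinalities.values())


# Hypothetical raw events: every attribute is kept alongside the measurement.
events = [
    {"ts": 0, "endpoint": "/api", "customer": "a", "latency_ms": 20.0},
    {"ts": 3, "endpoint": "/api", "customer": "b", "latency_ms": 900.0},
    {"ts": 7, "endpoint": "/api", "customer": "a", "latency_ms": 30.0},
]

# Preaggregated: one series for /api latency, 10 s steps, 6 slots (one minute).
api_latency = RoundRobinArchive(step=10, slots=6)
for e in events:
    api_latency.update(e["ts"], e["latency_ms"])

blended = api_latency.average(0)  # about 316.7: customer b's 900 ms is diluted

# A question nobody predicted ("what did customer b see?") can only be
# answered from the raw events.
customer_b = mean(e["latency_ms"] for e in events if e["customer"] == "b")  # 900.0

api_latency.update(60, 5.0)  # t=60 maps back to slot 0, evicting the 0-10 s interval
evicted = api_latency.average(0)  # None

# 20 endpoints x 5 status codes x 10,000 customers
explosion = series_count({"endpoint": 20, "status": 5, "customer": 10_000})  # 1000000
```

The averaged slot reports roughly 317 ms for the interval, while the raw events show that one customer saw 900 ms; once the slot is reused, even that average is gone. And with one series per tag combination, the number of series grows with the product of the cardinalities rather than their sum.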
