February 3, 2020

337 words 2 mins read

Talking to the machines: Monitoring production machine learning systems

Ting-Fang Yen details a monitor for production machine learning systems that handle billions of requests daily. The approach discovers detection anomalies, such as spurious false positives, as well as gradual concept drift, where the model no longer captures the target concept. You'll see new tools for detecting undesirable model behaviors early in large-scale online ML systems.

Talk Title: Talking to the machines: Monitoring production machine learning systems
Speakers: Ting-Fang Yen (DataVisor)
Conference: O’Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: San Jose, California
Date: September 10-12, 2019
URL: Talk Page
Slides: Talk Slides

Production machine learning systems require constant monitoring, not just to keep the system online but also to ensure that model inference results are correct. Monitoring is far more straightforward when user feedback or labels are available: model performance can then be tracked and periodically reevaluated using standard metrics such as precision, recall, or area under the curve (AUC). But labeled data is often lacking. In many applications, labels are expensive to obtain (requiring manual review by human analysts) or cannot be obtained in a timely manner (e.g., not available until weeks or months later).

Ting-Fang Yen describes the design and implementation of a real-time system for monitoring production machine learning models. It discovers detection anomalies, such as volume spikes caused by spurious false positives, as well as gradual concept drift, where the model is no longer able to capture the target concept. In either case, undesirable model behaviors can be detected automatically and early.

Part of the approach borrows from signal processing techniques for time series decomposition, where a time series represents a sequence of model decisions on different types of input data, or the amount of deviation between consecutive model runs. Calculating the cross-correlation among the identified anomalies then facilitates root cause analysis of the model's behavior. This work is a step toward automated deployment of machine learning in production, as well as new tools for interpreting model inference results.
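The talk doesn't spell out the implementation, but the decomposition step can be sketched roughly: treat, say, hourly counts of one model decision as a time series, strip out the trend and the daily cycle, and flag residuals that sit far outside the typical spread. Below is a minimal sketch in plain NumPy; the window sizes, the MAD-based threshold, and the `flag_anomalies` helper are illustrative assumptions, not DataVisor's actual system.

```python
import numpy as np

def flag_anomalies(counts, period=24, z=4.0):
    """Flag anomalous points in a seasonal series of model decisions.

    A rough stand-in for the decomposition step described in the talk:
    remove a moving-average trend and a periodic (e.g., daily) seasonal
    component, then score the residuals with a robust z-score based on
    the median absolute deviation (MAD).
    """
    n = len(counts)
    # Trend: centered moving average over one full period.
    trend = np.convolve(counts, np.ones(period) / period, mode="same")
    detrended = counts - trend
    # Seasonal component: per-phase median (robust to isolated spikes).
    seasonal = np.array([np.median(detrended[i::period]) for i in range(period)])
    resid = detrended - np.tile(seasonal, n // period + 1)[:n]
    # Robust z-score of the residuals.
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-9
    scores = 0.6745 * np.abs(resid - np.median(resid)) / mad
    flags = scores > z
    # The moving average is unreliable near the edges; don't flag there.
    flags[:period] = flags[-period:] = False
    return flags

# Hourly counts of one model decision over two weeks: a daily cycle,
# noise, and one injected spike (e.g., a burst of spurious false positives).
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
counts = 1000 + 300 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 20, hours.size)
counts[200] += 800  # injected false-positive spike
print(np.flatnonzero(flag_anomalies(counts)))  # expect [200]
```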
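The cross-correlation step can be sketched in the same hedged way: once each monitored series (decisions per input segment, deviations between model runs, and so on) produces an anomaly indicator, correlating the indicators pairwise, allowing small lags, surfaces series whose anomalies co-occur and are therefore candidates for a shared root cause. The segment names and lag range below are again illustrative, not from the talk.

```python
import numpy as np

def anomaly_correlation(a, b, max_lag=3):
    """Peak normalized cross-correlation between two 0/1 anomaly series.

    Returns the highest Pearson correlation of `a` against `b` shifted by
    up to `max_lag` steps in either direction, so anomalies that fire
    slightly out of phase still register as related.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = a[:lag], b[-lag:]
        elif lag > 0:
            x, y = a[lag:], b[:-lag]
        else:
            x, y = a, b
        # Skip degenerate slices with no variance (all zeros or all ones).
        if x.std() > 0 and y.std() > 0:
            best = max(best, np.corrcoef(x, y)[0, 1])
    return best

# Anomaly indicators from three monitored series over the same window.
model_spikes   = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
segment_mobile = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0])  # same incidents
segment_web    = np.array([0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0])  # unrelated

for name, seg in [("mobile", segment_mobile), ("web", segment_web)]:
    print(name, round(anomaly_correlation(model_spikes, seg), 2))
# "mobile" tracks the model anomalies exactly -> strongest root-cause candidate.
```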
