January 27, 2020

411 words 2 mins read

Machine learning vital signs: Metrics and monitoring of AI in production

Production artificial intelligence systems are interacting with the real world, and it's terrifying that oftentimes nobody has any idea how they're performing on live data. Donald Miner details why you should track your models in production over time, explains how you can implement proper logging and metrics for models, and details metrics you should probably be capturing.

Talk Title Machine learning vital signs: Metrics and monitoring of AI in production
Speakers Donald Miner (Miner & Kasch)
Conference O’Reilly Open Source Software Conference
Conf Tag Fueling innovative software
Location Portland, Oregon
Date July 15-18, 2019
URL Talk Page
Slides Talk Slides
Video

Production models are interacting with the real world, and it's terrifying that oftentimes nobody has any idea how they're performing on live data. Bias and variance can creep into your models over time, and you should know when that happens. Many data scientists and their organizations are not keeping track of how their models perform over time. The world changes, often slowly, and most models perform worse as time goes on. Nuances in a changing environment—new language usage, differing shopping habits, a changing political landscape, and many other factors—can unravel models that were once finely tuned. With the AI and ML explosion, some organizations have hundreds of models running in production every day. Ensuring everything is working well is a huge undertaking, and unfortunately, many organizations simply ignore the problem.

Donald Miner details how to track machine learning models in production to ensure model reliability, consistency, and performance into the future. You'll come away with insights on three major topics.

First, he covers why you should invest time in monitoring your machine learning models, sharing several anecdotes about the dangers of not paying attention to how a model's performance can change over time.

Second, you'll learn which metrics you should gather for each model and what they tell you, with a list of "vitals," the value each provides, and how to measure them. The vitals include classification label distribution over time, distribution of regression results, measurement of bias, measurement of variance, change in output from previous models, and changes in accuracy over time.

Finally, you'll get implementation strategies for keeping watch on model drift over time. Many organizations already have data scientists on their teams; Donald explains how common data science approaches apply to model monitoring, how to determine whether a model requires attention, and how to productionalize these strategies.
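As a concrete illustration of one such vital, here is a minimal sketch of tracking classification label distribution over time using the population stability index (PSI), a common drift statistic. This is not from the talk itself; the function names, the baseline/live example data, and the alert thresholds are illustrative assumptions.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Convert a list of predicted labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def population_stability_index(baseline, live, eps=1e-6):
    """PSI between two label distributions; higher means more drift.

    A common rule of thumb (an assumption here, not a universal standard):
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift worth alerting on.
    `eps` guards against log(0) when a label is missing from one distribution.
    """
    psi = 0.0
    for label in set(baseline) | set(live):
        p = baseline.get(label, 0.0) + eps
        q = live.get(label, 0.0) + eps
        psi += (q - p) * math.log(q / p)
    return psi


# Hypothetical spam classifier: 20% spam at training time, 40% in production.
baseline = label_distribution(["spam"] * 200 + ["ham"] * 800)
live = label_distribution(["spam"] * 400 + ["ham"] * 600)
print(round(population_stability_index(baseline, live), 3))  # ~0.196: moderate drift
```

In a production setting you would compute the live distribution over a rolling window (say, each day's predictions), log the PSI as a time series, and alert when it crosses your chosen threshold.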
