December 7, 2019

Machine learning platform lifecycle management

A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. Hope Wang explains how to manage various artifacts and their associations, automate deployment to support the lifecycle of a model, and build a cohesive machine learning platform.

Talk Title Machine learning platform lifecycle management
Speakers Hope Wang (Intuit)
Conference Strata Data Conference
Conf Tag Making Data Work
Location London, United Kingdom
Date May 22-24, 2018
URL Talk Page
Slides Talk Slides
Video

Data science and machine learning are critical enabling capabilities for data-driven organizations, and expectations on engineering organizations to develop and scale machine learning capabilities have risen sharply. A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end: data discovery, feature engineering, iterative model development, model training, and model scoring (batch and online). Managing artifacts, their associations, and their deployment across the platform's components is vital. While a number of mature technologies support each phase of this lifecycle, few solutions tie these components together into a cohesive machine learning platform.

To support the lifecycle of a model, you must be able to manage the various ML-related artifacts and their associations and automate deployment. A lifecycle management service built for this purpose should handle storage, versioning, visualization (including associations), and deployment of artifacts. The platform should support model development in different programming languages, with language and package versions configured per model. The custom environment must follow the model through the lifecycle to guarantee the model always runs in the same environment; the environment should therefore be externalized, associated with the model, and deployed together with it.

Other considerations include the connections between various artifacts and platforms:

- the data and datasets (source data and feature data, training datasets, and scoring result sets)
- the code (notebook code, model code, deployment code, etc.)
- model-specific environments
- the platforms (development and training platforms, batch and online scoring platforms)
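The artifact management described above can be sketched as a minimal registry that versions artifacts and records their associations, so that deploying a model also pulls in its pinned environment. This is a hypothetical illustration, not Intuit's actual service; all class and method names (`ArtifactRegistry`, `publish`, `associate`, `deployment_bundle`) are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Artifact:
    name: str
    kind: str      # e.g. "model", "environment", "dataset", "code"
    version: int
    payload: bytes = b""


class ArtifactRegistry:
    """Toy lifecycle-management store: versioned artifacts plus associations."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Artifact]] = {}
        self._links: List[Tuple[str, str]] = []  # (from_name, to_name)

    def publish(self, name: str, kind: str, payload: bytes = b"") -> Artifact:
        """Store a new immutable version of an artifact."""
        versions = self._versions.setdefault(name, [])
        artifact = Artifact(name, kind, version=len(versions) + 1, payload=payload)
        versions.append(artifact)
        return artifact

    def latest(self, name: str) -> Artifact:
        return self._versions[name][-1]

    def associate(self, from_name: str, to_name: str) -> None:
        """Record a dependency between artifacts, e.g. model -> environment."""
        self._links.append((from_name, to_name))

    def deployment_bundle(self, name: str) -> List[Artifact]:
        """Resolve a model and everything associated with it, so the model
        is always deployed with the environment it was developed in."""
        bundle = [self.latest(name)]
        for src, dst in self._links:
            if src == name:
                bundle.extend(self.deployment_bundle(dst))
        return bundle
```

For example, publishing a model and its environment and associating the two means `deployment_bundle("fraud-model")` returns both artifacts, which is the property the abstract argues for: the environment travels with the model through the lifecycle.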
Hope Wang explains how her team at Intuit is managing the machine learning lifecycle, how different components associate and interact with each other, and how to execute in a production environment. Hope then shares an example of how an integrated process was developed for data engineers and data scientists to manage the entire lifecycle of a model from ideation through development, training, and ultimately, scoring.
