February 5, 2020

Machine Learning Models and Datasets Versioning Practices and Tools

Talk Title Machine Learning Models and Datasets Versioning Practices and Tools
Speakers Dmitry Petrov (Co-Founder & CEO, DVC), Ruslan Kuprieiev (Software Engineer, Iterative AI)
Conference Open Source Summit + ELC Europe
Conf Tag
Location Lyon, France
Date Oct 27-Nov 1, 2019
URL Talk Page
Slides Talk Slides
Video

The rise of AI and ML is changing the development workflow and calls for new development tools: data versioning, ML pipeline versioning, experiment metrics tracking, and others that have not been formalized or even named yet.

The machine learning workflow is data-centric, in contrast to the source-code-centric workflow of software engineering, and the traditional software engineering toolset does not fully cover an ML team’s needs. We will discuss current practices for organizing ML projects with traditional open-source tools such as Git and Git-LFS, along with their limitations, and use these to explain the motivation for developing new ML-specific data management systems.

Data Version Control, or DVC.ORG, is an open-source command-line tool. We will show how to version datasets containing dozens of gigabytes of data, how to version ML models, how to use your favorite cloud storage (S3, GCS, or a bare-metal SSH server) as a data file backend, and how to embrace the best engineering practices in your ML projects.
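The abstract does not show how a data versioning tool works internally, so the following is only a minimal sketch of the general idea, not DVC's actual implementation or file format. It assumes a content-addressed cache: the large file is stored under its hash, and a small pointer file (which is what would be committed to Git) records that hash. The function names `track` and `restore`, the `.ptr.json` suffix, and the cache layout are all hypothetical choices made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large dataset never has to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Hypothetical helper: only the small pointer file would be committed to Git,
    while the cache directory could be synced to remote storage.
    """
    digest = file_digest(data_file)
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"sha256": digest, "path": data_file.name}, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    digest = meta["sha256"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / digest[:2] / digest[2:], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_bytes(b"a,b\n1,2\n")
        ptr = track(data, root / "cache")
        old_pointer = ptr.read_text()      # stands in for an earlier Git commit

        data.write_bytes(b"a,b\n1,2\n3,4\n")
        track(data, root / "cache")        # new version, new pointer contents

        ptr.write_text(old_pointer)        # stands in for checking out the old commit
        restore(ptr, root / "cache")
        print(data.read_bytes())
```

The design choice worth noting is that the version history lives in the small pointer files, so an ordinary Git repository stays lightweight while the bulky data sits in a separate store.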
