December 29, 2019


Open source tools for machine learning model and dataset versioning



Talk Title: Open source tools for machine learning model and dataset versioning
Speakers: Dmitry Petrov (Iterative AI), Ivan Shcheklein (Iterative AI)
Conference: O’Reilly Artificial Intelligence Conference
Conference Tag: Put AI to Work
Location: New York, New York
Date: April 16-18, 2019
URL: Talk Page
Slides: Talk Slides

Today, many companies use machine learning, and ML teams are growing, along with the complexity of ML projects. In this environment, establishing a well-defined and manageable process has become a central issue, and versioning ML models and datasets is an essential first step toward it. Although source code versioning tools are mature and software engineering best practices are well defined, these tools and practices don’t fit well into the ML workflow. ML requires managing models and large dataset files and tying them to the code that produced them for reproducibility, a task where traditional tools like Git work poorly. Dmitry Petrov and Ivan Shcheklein explore open source tools for ML model and dataset versioning, from traditional Git to tools like Git-LFS and git-annex, and the ML-specific tool Data Version Control (DVC, dvc.org).
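Git-LFS, git-annex, and DVC broadly share one idea for keeping large files versioned alongside code. The large file stays out of Git and goes into a content-addressed cache, and Git tracks only a small pointer that records the file's hash. The Python sketch below illustrates that idea under our own assumptions. It does not reproduce the on-disk format or API of any of these tools: the `.ptr` JSON pointer, the cache layout, and the function names `add_to_cache` and `checkout` are all hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small pointer.

    Hypothetical format: the pointer is a JSON file named '<data>.ptr'. It is
    the only thing you would commit to Git; the large file stays out of Git.
    """
    digest = file_digest(data_file)
    cached = cache_dir / digest[:2] / digest[2:]  # same content -> same path
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps(
        {"path": data_file.name, "sha256": digest,
         "size": data_file.stat().st_size},
        indent=2,
    ))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the exact data file version recorded in a pointer."""
    meta = json.loads(pointer.read_text())
    digest = meta["sha256"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / digest[:2] / digest[2:], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"  # hypothetical dataset file
        data.write_text("x,y\n1,2\n")
        ptr = add_to_cache(data, root / ".cache")
        data.write_text("x,y\n3,4\n")  # the dataset changes later
        checkout(ptr, root / ".cache")  # roll back to the recorded version
        print(data.read_text())
```

Because the pointer holds only a hash, checking out an older Git commit restores the old pointer. `checkout` can then fetch the matching dataset from the cache, which ties each data version to the code version it was committed with.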
