February 5, 2020

Machine Learning Models and Datasets Versioning Practices and Tools

Talk Title	Machine Learning Models and Datasets Versioning Practices and Tools
Speakers	Dmitry Petrov (Co-Founder & CEO, DVC), Ruslan Kuprieiev (Software Engineer, Iterative AI)
Conference	Open Source Summit + ELC Europe
Conf Tag
Location	Lyon, France
Date	Oct 27-Nov 1, 2019
URL	Talk Page
Slides	Talk Slides
Video

The rise of AI and ML changes the development workflow and requires new development tools: data versioning, ML pipeline versioning, experiment metrics tracking, and others that have not been formalized or even named yet.

The machine learning workflow is data-centric, in contrast to the source code-centric software engineering workflow, and the traditional software engineering toolset does not fully cover an ML team's needs. We will discuss current practices for organizing ML projects with traditional open source tools like Git and Git-LFS, as well as their limitations. These limitations motivate the development of new ML-specific data management systems.

Data Version Control (DVC.org) is an open source command-line tool. We will show how to version datasets with dozens of gigabytes of data, how to version ML models, how to use your favorite cloud storage (S3, GCS, or a bare-metal SSH server) as a data file backend, and how to embrace the best engineering practices in your ML projects.
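
The general idea behind versioning large data files next to Git can be sketched in a few lines of Python. This is a simplified illustration, not the actual implementation of DVC or Git-LFS: the large file is copied into a content-addressed cache, and only a small pointer file is meant to be committed to Git. The function names `track` and `restore`, the `.ptr` suffix, the JSON pointer layout, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large dataset never has to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer file (hypothetical '.ptr' suffix) is small enough to commit
    to Git; the cache directory could be synced to remote storage instead.
    """
    digest = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": digest, "path": data_file.name}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / meta["sha256"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("a,b\n1,2\n")          # version 1 of the dataset
        pointer = track(data, cache)
        v1_pointer = pointer.read_text()       # stands in for a Git commit

        data.write_text("a,b\n1,2\n3,4\n")     # version 2 of the dataset
        track(data, cache)

        pointer.write_text(v1_pointer)         # stands in for a Git checkout
        restore(pointer, cache)
        print(data.read_text())
```

Because the pointer file records only a hash, switching dataset versions reduces to checking out an older pointer and copying the matching cache entry back into the workspace.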

open-source metrics code management dataset ml git open source ai tracking machine learning cloud pipeline

From inception to insight: Accelerating AI productivity with GPUs (sponsored by Dell Technologies)

February 5, 2020

Data scientists and machine learning engineers need the flexibility to work in multiple environments without wasting precious time configuring hardware and software and modifying code. Ramesh Radhakrishnan and John Zedlewski walk you through deploying a simple set of technologies for executing end-to-end pipelines entirely on GPUs.

TFX: Production ML pipelines with TensorFlow

February 2, 2020

Putting together an ML production pipeline for training, deploying, and maintaining ML and deep learning applications involves much more than just training a model. Robert Crowe explores TensorFlow Extended (TFX), an open source version of the tools and libraries that Google uses internally, built on its years of experience developing production ML pipelines.

Overview of Data Governance

January 26, 2020

Paco Nathan offers an overview of data governance, including its history, themes, tools, process, standards, and more, partly based on interviews with experts in the field about issues and best practices. Join in to learn what impact machine learning has on data governance and vice versa, along with an overview of open source projects and open standards in this space.

Executive Briefing: Overview of data governance

January 11, 2020

Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.

Executive Briefing: Overview of data governance

December 31, 2019

Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.

Open source tools for machine learning model and dataset versioning

December 29, 2019

ML model and dataset versioning is an essential first step toward establishing a good process. Dmitry Petrov and Ivan Shcheklein explore open source tools for ML model and dataset versioning, from traditional Git to tools like Git-LFS and Git-annex and the ML project-specific tool Data Version Control (DVC.org).