Kubeflow: Portable machine learning on Kubernetes (sponsored by Google)
Michelle Casbon offers an overview of Kubeflow. By providing a platform that reduces variability between services and environments, Kubeflow enables applications that are more robust and resilient, reducing downtime, quality issues, and customer impact. It also supports specialized hardware such as GPUs, which can reduce operational costs and improve model performance.
| Talk Title | Kubeflow: Portable machine learning on Kubernetes (sponsored by Google) |
| Speakers | Michelle Casbon (Google) |
| Conference | Artificial Intelligence Conference |
| Conf Tag | Put AI to Work |
| Location | San Francisco, California |
| Date | September 5-7, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Practically speaking, some of the biggest challenges facing ML applications are composability, portability, and scalability. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML products. Michelle Casbon offers an overview of Kubeflow, which is designed to take advantage of these benefits by providing a sustainable, repeatable platform that supports the full lifecycle of an ML application. Kubeflow removes the need for expertise in a large number of areas, lowering the barrier to entry for developing and maintaining ML products.

The composability problem is addressed by providing a single, unified tool for running common processes such as data ingestion, transformation, and analysis; model training, evaluation, and serving; and monitoring, logging, and other operational tasks. The portability problem is resolved by supporting use of the entire stack locally, on-premises, or on the cloud platform of your choice. Scalability is native to the Kubernetes platform and is leveraged by Kubeflow to run all aspects of the product, including resource-intensive model training tasks.

By providing a platform that reduces variability between services and environments, Kubeflow enables applications that are more robust and resilient, reducing downtime, quality issues, and customer impact. It also supports specialized hardware such as GPUs, which can reduce operational costs and improve model performance. This session is sponsored by Google.
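To make the scalability and GPU points concrete: Kubeflow expresses training workloads as Kubernetes custom resources, so scaling out workers or requesting accelerators is a matter of declaring them in a manifest. The sketch below shows a minimal TFJob; the job name, container image, replica count, and GPU count are all hypothetical, and the exact `apiVersion` varies by Kubeflow release:

```yaml
# Minimal sketch of a Kubeflow TFJob custom resource.
# Names, image, and resource values are illustrative only.
apiVersion: kubeflow.org/v1        # version string depends on the Kubeflow release
kind: TFJob
metadata:
  name: mnist-train                # hypothetical job name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                  # scale out training by adding workers
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/example/mnist-train:latest   # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1    # request specialized hardware (GPU)
```

Because the same manifest can be applied to a local cluster, an on-premises deployment, or a managed cloud Kubernetes service, the declaration itself is what makes the workload portable.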