September 27, 2019


Multi-Cloud Machine Learning Data and Workflow with Kubernetes



Talk Title Multi-Cloud Machine Learning Data and Workflow with Kubernetes
Speakers Fei Xue (Product Manager, Ant Financial), Lei Xue (Infrastructure Tech Lead, Momenta)
Conference KubeCon + CloudNativeCon
Location Shanghai, China
Date Jun 23-26, 2019
URL Talk Page
Slides Talk Slides

Autonomous vehicles require hardware-accelerated machine learning for critical problems such as tracking and classification. Momenta trains ML models both in on-prem regions and in public clouds, each of which comes with different GPUs and network interfaces (InfiniBand, RoCE). In this talk we discuss how we use Kubernetes to build a multi-cloud ML platform: in particular, how we manage training data across different environments, how we address multi-user and gang scheduling, and how we support heterogeneous hardware.
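Gang scheduling means that all workers of a distributed training job are started together or not at all, so a job never sits holding some GPUs while it waits for the rest. The minimal Python sketch below illustrates that all-or-nothing idea on a heterogeneous pool where nodes differ by GPU type. It is not the speakers' implementation: `Node`, `gang_schedule`, the first-fit placement policy, and the GPU type labels are hypothetical and chosen only for illustration. The sketch also assumes that node names are unique.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # assumed unique across the cluster
    gpu_type: str    # illustrative label, e.g. "V100" or "T4"
    free_gpus: int


def gang_schedule(nodes, workers, gpus_per_worker, gpu_type):
    """Place every worker of a job on nodes of `gpu_type`, or place none.

    Returns a list with one node name per worker, or None if the whole
    gang does not fit. Node state is only changed on full success.
    """
    # Scratch copy of free capacity so a partial placement never leaks out.
    free = {n.name: n.free_gpus for n in nodes if n.gpu_type == gpu_type}
    placement = []
    for _ in range(workers):
        # First-fit: the first matching node with enough free GPUs.
        target = next(
            (name for name, f in free.items() if f >= gpus_per_worker), None
        )
        if target is None:
            return None  # gang cannot start as a whole; leave cluster untouched
        free[target] -= gpus_per_worker
        placement.append(target)
    # Commit the reservation only after every worker has been placed.
    for n in nodes:
        if n.name in free:
            n.free_gpus = free[n.name]
    return placement


if __name__ == "__main__":
    cluster = [Node("a", "V100", 8), Node("b", "V100", 4), Node("c", "T4", 8)]
    print(gang_schedule(cluster, 4, 4, "V100"))  # needs 16 V100 GPUs, only 12 free
    print(gang_schedule(cluster, 3, 4, "V100"))
```

With the example cluster, a four-worker job needing 16 V100 GPUs is rejected outright and no capacity is consumed, while a three-worker job is placed as `['a', 'a', 'b']`. The `T4` node is never considered for a `V100` request, which is one simple way to model heterogeneous hardware. A production scheduler would add queueing, fairness across users, and network-topology awareness, all of which are outside this sketch.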
