GPU Sharing for Machine Learning Workload on Kubernetes
| Field | Value |
| --- | --- |
| Talk Title | GPU Sharing for Machine Learning Workload on Kubernetes |
| Speakers | (Haining Henry) Zhang (Chief Architect, VMware), Yang Yu (Software Engineer, VMware) |
| Conference | KubeCon + CloudNativeCon Europe |
| Conf Tag | |
| Location | Barcelona, Spain |
| Date | May 19-23, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Machine learning is becoming increasingly popular in the technology world, and the community has begun to leverage Kubernetes to deploy and manage machine learning workloads. One of the key challenges is scheduling GPU-intensive workloads. Kubernetes includes GPU support for applications, but GPU usage has two notable limitations:

1. GPU assignment is exclusive: containers cannot share GPU resources.
2. A container can request one or more GPUs, but it is not possible to request a fraction of a GPU.

This session introduces how to run GPU workloads in Kubernetes. In addition, it demonstrates an approach that uses virtual GPU (vGPU) technology to enable multiple pods to concurrently access the same physical GPU. This approach not only increases the utilization of GPU resources but also allows more GPU workloads to be scheduled on the same physical GPU.
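The exclusivity limitation shows up directly in a Pod spec. A minimal sketch, assuming the NVIDIA device plugin is installed (which exposes GPUs as the `nvidia.com/gpu` extended resource); the image name is a hypothetical choice for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu  # hypothetical GPU-enabled image
    resources:
      limits:
        nvidia.com/gpu: 1  # extended resources accept whole numbers only;
                           # a fractional value such as 0.5 is rejected
```

Because extended resource quantities must be integers, this is exactly the gap the vGPU approach in the talk aims to fill: carving one physical GPU into multiple schedulable units.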