A Method for the Cost Optimization of Kubernetes-based Deep Learning Training and Inference

| Field | Value |
|---|---|
| Talk Title | A Method for the Cost Optimization of Kubernetes-based Deep Learning Training and Inference |
| Speakers | Lei Wang (Senior Engineer, Tencent Cloud), Pavee Han (Senior Product Manager, Tencent Cloud) |
| Conference | KubeCon + CloudNativeCon |
| Conf Tag | |
| Location | Shanghai, China |
| Date | Jun 23-26, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
To improve the throughput capacity of training or inference applications without adding extra GPU cores, we share one GPU core between multiple deep learning workloads in a Kubernetes cluster using container-level virtual GPU technology. This technology has better prospects in production environments because its performance loss is lower than that of virtual-machine-level GPU virtualization.
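As a rough illustration of how such container-level GPU sharing is typically consumed from the workload side, the sketch below creates a Pod that requests only a fraction of a physical GPU through extended resources advertised by a GPU-sharing device plugin. The resource names `tencent.com/vcuda-core` and `tencent.com/vcuda-memory` and their units are assumptions (loosely based on Tencent's open-source GPU Manager, not confirmed by the talk abstract); substitute whatever names your device plugin exposes.

```python
# Minimal sketch: request a fraction of a shared GPU for a training Pod
# via the Kubernetes Python client. Resource names and units below are
# illustrative assumptions, not part of the original talk.
from kubernetes import client, config


def create_shared_gpu_pod():
    config.load_kube_config()  # use load_incluster_config() inside a cluster

    container = client.V1Container(
        name="trainer",
        image="tensorflow/tensorflow:latest-gpu",
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            # Ask the device plugin for roughly half of one GPU core plus a
            # slice of GPU memory; the exact unit convention depends on the
            # plugin implementation.
            limits={
                "tencent.com/vcuda-core": "50",
                "tencent.com/vcuda-memory": "16",
            },
        ),
    )

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="shared-gpu-trainer"),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)


if __name__ == "__main__":
    create_shared_gpu_pod()
```

Because the sharing is enforced at the container level rather than by a hypervisor, several such Pods can be scheduled onto the same physical GPU, which is the source of the throughput gain described in the talk.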