September 28, 2019


Minimizing GPU Cost for Your Deep Learning on Kubernetes



Talk Title: Minimizing GPU Cost for Your Deep Learning on Kubernetes
Speakers: Kai Zhang (Staff Engineer, Alibaba), Yang Che (Senior Engineer, Alibaba)
Conference: KubeCon + CloudNativeCon
Location: Shanghai, China
Date: Jun 23-26, 2019
URL: Talk Page
Slides: Talk Slides

More and more data scientists run their Nvidia GPU-based deep learning tasks on Kubernetes. Meanwhile, it has been found that over 40% of the cost is wasted on idle GPUs in the cluster, so one important challenge is how Kubernetes can help improve GPU usage efficiency. In this talk we will introduce a GPU sharing solution on native Kubernetes and discuss its design and implementation details. Key topics include:

- How to define a GPU sharing API
- How to schedule shared GPUs in a Kubernetes cluster without changing the core scheduler code
- How to integrate a GPU isolation solution with Kubernetes

A demo will show how TensorFlow users can run different jobs on the same GPU device in a Kubernetes cluster. In practice, the solution yields a remarkable improvement in overall GPU usage, especially for AI model development, debugging, and inference services.
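To make the idea concrete, here is a minimal sketch of what requesting a GPU slice could look like from a user's point of view. It assumes a GPU-sharing device plugin that advertises GPU memory as an extended resource (the resource name `aliyun.com/gpu-mem` follows Alibaba's open-source gpushare plugin and is an assumption here, since the abstract does not name it) together with a matching scheduler extender installed in the cluster; the pod name, image, and command are placeholders.

```python
# Sketch: ask the scheduler for a slice of GPU memory instead of a whole device.
# Assumes a GPU-sharing device plugin and scheduler extender are installed and
# that GPU memory is exposed as the extended resource "aliyun.com/gpu-mem".
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="tf-training-shared-gpu"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="tensorflow/tensorflow:1.14.0-gpu",
                command=["python", "/workspace/train.py"],
                resources=client.V1ResourceRequirements(
                    # Request 4 GiB of GPU memory; several such pods can then
                    # be packed onto the same physical GPU by the extender.
                    limits={"aliyun.com/gpu-mem": "4"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

On the isolation side, a TensorFlow job can at least cooperatively stay within the slice it asked for by capping its own GPU memory. The snippet below uses the standard TensorFlow 1.x option for this; it is an illustration only, and the actual isolation mechanism discussed in the talk may differ.

```python
# Sketch: a TF 1.x job limits its own GPU memory so multiple jobs can
# coexist on one device (cooperative limiting, not hard isolation).
import tensorflow as tf

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.25,
                            allow_growth=False)
session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```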
