November 27, 2019


Improving Performance of Deep Learning Workloads With Volcano


Talk Title: Improving Performance of Deep Learning Workloads With Volcano
Speakers: Ti Zhou (Architect, Baidu)
Conference: KubeCon + CloudNativeCon North America
Location: San Diego, CA, USA
Date: Nov 15-21, 2019
URL: Talk Page
Slides: Talk Slides

Baidu has internally improved the performance of its large-scale deep learning workloads by using the Volcano project. Volcano's CRD-based computing resource model makes it possible to use resources more efficiently and to configure computing models more flexibly. The project provides a unified abstraction over underlying capabilities such as gang scheduling, fair share, priority queues, and job suspend/resume, which makes up for the missing functionality of the native Job-based training operators. After adopting Volcano, Baidu's internal resource utilization increased by 15%, and training tasks completed 10% faster. This talk will introduce the overall functionality of Volcano, the transformation of the old operator to support Volcano, and a comparison of the performance of deep learning training tasks before and after using Volcano.
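
To make the CRD-based model concrete, here is a minimal sketch (not from the talk) of submitting a Volcano Job from Python with the official kubernetes client. The group/version (batch.volcano.sh/v1alpha1) and the gang-scheduling field minAvailable come from Volcano's Job CRD; the job name, container image, and replica counts are hypothetical placeholders.

```python
# Sketch: create a Volcano Job custom resource for a distributed training task.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

# A two-task job (parameter server + workers). With schedulerName "volcano",
# no pod starts until all minAvailable pods can be placed (gang scheduling).
volcano_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "dl-training-demo"},  # hypothetical name
    "spec": {
        "minAvailable": 3,             # all-or-nothing start for the gang
        "schedulerName": "volcano",
        "queue": "default",            # queue used for fair share / priority
        "tasks": [
            {
                "replicas": 1,
                "name": "ps",
                "template": {"spec": {
                    "containers": [{"name": "ps", "image": "training-image:latest"}],
                    "restartPolicy": "Never",
                }},
            },
            {
                "replicas": 2,
                "name": "worker",
                "template": {"spec": {
                    "containers": [{"name": "worker", "image": "training-image:latest"}],
                    "restartPolicy": "Never",
                }},
            },
        ],
    },
}

# Volcano Jobs are CRDs, so they are created through the CustomObjectsApi.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="batch.volcano.sh", version="v1alpha1",
    namespace="default", plural="jobs", body=volcano_job,
)
```

Because the whole training job is one custom resource, the scheduler can reason about it as a unit, which is what enables the gang scheduling and queue-level fair-share behavior described above.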
