Improving Performance of Deep Learning Workloads With Volcano
| Talk Title | Improving Performance of Deep Learning Workloads With Volcano |
|------------|---------------------------------------------------------------|
| Speakers | Ti Zhou (Architect, Baidu) |
| Conference | KubeCon + CloudNativeCon North America |
| Conf Tag | |
| Location | San Diego, CA, USA |
| Date | Nov 15-21, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Baidu has internally improved the performance of large-scale deep learning workloads by using the Volcano project. The CRD-based computing resource model makes it possible to use resources more efficiently and to configure computing models more flexibly. Volcano provides a unified abstraction over underlying capabilities such as gang scheduling, fair share, priority queues, and job suspend/resume, which makes up for the missing functionality of the native Job-based training operators.

After adopting Volcano, Baidu's internal resource utilization increased by 15% and training task completion speed increased by 10%. This talk will introduce the overall functionality of Volcano, the transformation of the old operator to support Volcano, and a comparison of the performance of deep learning training tasks before and after using Volcano.
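To make the gang-scheduling and queue concepts concrete, here is a minimal sketch in Go of what a Volcano Job for a parameter-server/worker training task looks like. It assumes the `volcano.sh/apis` Go module (package `batch/v1alpha1`); the job name, queue name, PriorityClass, and container images are hypothetical, and the talk itself does not show this code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"

	batchv1alpha1 "volcano.sh/apis/pkg/apis/batch/v1alpha1"
)

// trainerPod returns a single-container pod template for one training role.
func trainerPod(name, image string) corev1.PodTemplateSpec {
	return corev1.PodTemplateSpec{
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyOnFailure,
			Containers:    []corev1.Container{{Name: name, Image: image}},
		},
	}
}

func main() {
	// A PS/worker training job handed to the Volcano scheduler.
	// MinAvailable: 5 enables gang scheduling: no pod starts until all
	// five pods (1 ps + 4 workers) can be placed at once, so a job can
	// never deadlock holding half of its resources.
	job := &batchv1alpha1.Job{
		TypeMeta:   metav1.TypeMeta{APIVersion: "batch.volcano.sh/v1alpha1", Kind: "Job"},
		ObjectMeta: metav1.ObjectMeta{Name: "dl-training", Namespace: "default"},
		Spec: batchv1alpha1.JobSpec{
			SchedulerName:     "volcano",
			Queue:             "training",      // hypothetical fair-share queue
			PriorityClassName: "high-priority", // hypothetical PriorityClass
			MinAvailable:      5,
			Tasks: []batchv1alpha1.TaskSpec{
				{Name: "ps", Replicas: 1, Template: trainerPod("ps", "training-image:latest")},
				{Name: "worker", Replicas: 4, Template: trainerPod("worker", "training-image:latest")},
			},
		},
	}

	// Emit the manifest for inspection; a real operator would instead
	// create this object through the generated Volcano clientset.
	out, err := yaml.Marshal(job)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The `Queue` field is what ties a job into fair-share and priority scheduling: capacity is divided across Volcano `Queue` resources, so jobs submitted to the same queue share its weight rather than competing cluster-wide.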