Automating GPU Infrastructure for Kubernetes
| Field | Value |
| --- | --- |
| Talk Title | Automating GPU Infrastructure for Kubernetes |
| Speakers | Lucas Servén Marín (Senior Software Engineer, Red Hat) |
| Conference | KubeCon + CloudNativeCon Europe |
| Conf Tag | |
| Location | Copenhagen, Denmark |
| Date | Apr 30-May 4, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Kubernetes has seen broad interest from the machine learning community, and many users are bringing GPUs to their clusters. However, compiling, installing, and updating the NVIDIA kernel modules needed to run workloads on those GPUs remains a cumbersome and largely manual process. Furthermore, distributions like Container Linux, which update frequently, can require new kernel modules every other week. In this presentation, Lucas Servén explains how to automate all of these operations for Kubernetes deployed on Container Linux and describes his experience running GPU Kubernetes clusters on both AWS and bare metal.
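The talk covers the specifics of the automation; as a rough illustration of the general pattern (running a per-node builder that compiles and loads kernel modules), the sketch below uses the official Kubernetes Python client to deploy a privileged DaemonSet onto GPU nodes. The `nvidia-module-builder` image, the `gpu: "true"` node label, and the namespace are hypothetical placeholders, not details taken from the talk.

```python
# Sketch: schedule a privileged module-builder pod on every GPU node via a DaemonSet.
# Image name, node label, and namespace are illustrative placeholders.
from kubernetes import client, config


def deploy_module_builder(namespace: str = "kube-system") -> None:
    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster

    container = client.V1Container(
        name="nvidia-module-builder",
        image="example.com/nvidia-module-builder:latest",  # hypothetical builder image
        # A real builder needs host access to compile against the running kernel
        # and to insert the resulting modules.
        security_context=client.V1SecurityContext(privileged=True),
        volume_mounts=[
            client.V1VolumeMount(name="dev", mount_path="/dev"),
            client.V1VolumeMount(name="modules", mount_path="/lib/modules"),
        ],
    )

    pod_spec = client.V1PodSpec(
        containers=[container],
        node_selector={"gpu": "true"},  # only target nodes labeled as having GPUs
        host_pid=True,
        volumes=[
            client.V1Volume(
                name="dev",
                host_path=client.V1HostPathVolumeSource(path="/dev"),
            ),
            client.V1Volume(
                name="modules",
                host_path=client.V1HostPathVolumeSource(path="/lib/modules"),
            ),
        ],
    )

    daemon_set = client.V1DaemonSet(
        api_version="apps/v1",
        kind="DaemonSet",
        metadata=client.V1ObjectMeta(name="nvidia-module-builder"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(
                match_labels={"app": "nvidia-module-builder"}
            ),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(
                    labels={"app": "nvidia-module-builder"}
                ),
                spec=pod_spec,
            ),
        ),
    )

    client.AppsV1Api().create_namespaced_daemon_set(namespace, daemon_set)


if __name__ == "__main__":
    deploy_module_builder()
```

Because the DaemonSet reschedules the builder whenever a node reboots into a new OS image, a frequently updating distribution like Container Linux can pick up freshly built modules without manual intervention.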