November 16, 2019


Automating GPU Infrastructure for Kubernetes

Talk Title: Automating GPU Infrastructure for Kubernetes
Speakers: Lucas Servén Marín (Senior Software Engineer, Red Hat)
Conference: KubeCon + CloudNativeCon Europe
Location: Copenhagen, Denmark
Date: April 30 - May 4, 2018
URL: Talk Page
Slides: Talk Slides

Kubernetes has seen broad interest from the machine learning community, and many users are bringing GPUs to their clusters. However, compiling, installing, and updating the NVIDIA kernel modules needed to run workloads on those GPUs continues to be a cumbersome and largely manual process. Furthermore, distributions like Container Linux, which update frequently, can require new kernel modules every other week. In this presentation, Lucas Servén explains how to automate all of these operations for Kubernetes deployed on Container Linux and describes his experience running GPU Kubernetes clusters on both AWS and bare metal.
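The abstract does not include a manifest, but the usual pattern for automating driver installation across a cluster is a privileged DaemonSet that builds and loads the kernel modules on every GPU node. The Go sketch below, using client-go's API types, is illustrative only and is not taken from the talk: the installer image, node label, and namespace are assumptions.

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

// driverDaemonSet returns a DaemonSet that runs a privileged driver-installer
// container on every node labeled as having a GPU. The image name and node
// label are hypothetical placeholders, not details from the talk.
func driverDaemonSet() *appsv1.DaemonSet {
	privileged := true
	labels := map[string]string{"app": "nvidia-driver-installer"}
	return &appsv1.DaemonSet{
		TypeMeta:   metav1.TypeMeta{APIVersion: "apps/v1", Kind: "DaemonSet"},
		ObjectMeta: metav1.ObjectMeta{Name: "nvidia-driver-installer", Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Hypothetical label marking GPU nodes.
					NodeSelector: map[string]string{"gpu": "true"},
					HostPID:      true,
					Containers: []corev1.Container{{
						Name: "installer",
						// Placeholder image that compiles and loads the modules.
						Image:           "example.com/nvidia-driver-installer:latest",
						SecurityContext: &corev1.SecurityContext{Privileged: &privileged},
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "lib-modules",
							MountPath: "/lib/modules",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "lib-modules",
						VolumeSource: corev1.VolumeSource{
							HostPath: &corev1.HostPathVolumeSource{Path: "/lib/modules"},
						},
					}},
				},
			},
		},
	}
}

func main() {
	// Print the manifest as YAML so it can be applied with kubectl.
	out, err := yaml.Marshal(driverDaemonSet())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}

Because the DaemonSet re-runs whenever a node is replaced or relabeled, this pattern also covers the frequent Container Linux updates mentioned above: each new kernel gets fresh modules without manual intervention.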
