January 7, 2020

Scaling AI Inference Workloads with GPUs and Kubernetes

Talk Title: Scaling AI Inference Workloads with GPUs and Kubernetes
Speakers: Renaud Gaubert (Software Engineer, NVIDIA), Ryan Olson (Solutions Architect, NVIDIA)
Conference: KubeCon + CloudNativeCon North America
Location: Seattle, WA, USA
Date: Dec 9-14, 2018
URL: Talk Page
Slides: Talk Slides

Deep Learning (DL) is a computationally intensive form of machine learning that has revolutionized many fields, including computer vision, automated speech recognition, natural language processing, and artificial intelligence (AI). DL impacts every vertical market, from automotive to healthcare to cloud; as a result, the training and deployment of Deep Neural Networks (DNNs) has shifted datacenter workloads from traditional CPUs to AI-specific accelerators like NVIDIA GPUs. Leveraging several popular CNCF projects, such as Prometheus, Envoy, and gRPC, we will demonstrate an implementation of NVIDIA's reference scale-out inference architecture, capable of delivering petaops per second of performance. This is a new and challenging problem in the datacenter, and we will discuss these challenges and ways to optimize for service delivery metrics (latency/throughput), cost, and redundancy.
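The talk describes NVIDIA's reference architecture rather than publishing manifests, but a minimal sketch of the Kubernetes side helps ground it: the NVIDIA device plugin exposes GPUs to the scheduler as the extended resource nvidia.com/gpu, so an inference Deployment simply requests them in its resource limits. Everything below (image name, ports, replica count, Prometheus annotations) is illustrative, not taken from the talk.

```yaml
# Hypothetical Deployment sketch: schedules inference-server replicas onto
# GPU nodes via the NVIDIA device plugin's extended resource.
# Image name, labels, and ports are placeholders, not from the talk.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  labels:
    app: inference-server
spec:
  replicas: 4                       # scale out across GPU nodes
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
      annotations:
        prometheus.io/scrape: "true"   # common convention: let Prometheus find the metrics port
        prometheus.io/port: "8002"
    spec:
      containers:
      - name: server
        image: example.com/inference-server:latest   # placeholder image
        ports:
        - containerPort: 8001       # gRPC inference endpoint
        - containerPort: 8002       # Prometheus metrics
        resources:
          limits:
            nvidia.com/gpu: 1       # one GPU per replica, via the device plugin
```

A Service fronting these pods could then sit behind Envoy for gRPC-aware load balancing, while Prometheus scrapes per-pod latency and throughput, the service delivery metrics the talk proposes optimizing.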
