February 24, 2020


Accelerating training, inference, and ML applications on NVIDIA GPUs


Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio give you a sneak peek at components from NVIDIA's software stack so you can get the most out of your end-to-end AI applications on modern NVIDIA GPUs. They also cover features, tips, and tricks for optimizing workloads across data loading, processing, training, inference, and deployment.

Talk Title: Accelerating training, inference, and ML applications on NVIDIA GPUs
Speakers: Maggie Zhang (NVIDIA), Nathan Luehr (NVIDIA), Josh Romero (NVIDIA), Pooya Davoodi (NVIDIA), Davide Onofrio (NVIDIA)
Conference: O’Reilly TensorFlow World
Location: Santa Clara, California
Date: October 28-31, 2019
URL: Talk Page
Slides: Talk Slides

Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio dive into techniques to accelerate training and inference for common deep learning and machine learning workloads. You’ll learn how DALI can eliminate I/O and data-processing bottlenecks in real-world applications and how automatic mixed precision (AMP) can easily give you up to a 3x training performance improvement on Volta GPUs. You’ll see best practices for multi-GPU and multi-node scaling using Horovod, and how a deep learning profiler can visualize TensorFlow operations and identify optimization opportunities. Finally, you’ll learn to deploy trained models using INT8 quantization in TensorRT (TRT), all through convenient new APIs within the TensorFlow framework.
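To give a feel for the INT8 quantization idea behind TensorRT deployment, here is a minimal, illustrative sketch of symmetric INT8 quantization in plain Python. This is not TensorRT's actual API (TRT handles calibration, per-channel scales, and kernel selection internally); it only shows the core mapping of float values to 8-bit integers with a single scale factor.

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: map floats to [-127, 127] with one scale.

    scale is chosen so the largest-magnitude value maps to +/-127,
    mirroring the simplest form of TensorRT-style calibration.
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]

# Example: the largest-magnitude value (-1.0) maps to -127.
vals = [0.5, -1.0, 0.25]
q, s = quantize_int8(vals)
approx = dequantize(q, s)
```

The key trade-off this illustrates is that INT8 halves (versus FP16) or quarters (versus FP32) the memory traffic per value at the cost of a small, bounded rounding error, which is why a calibration step is needed in practice to pick scales that preserve accuracy.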
