Accelerating training, inference, and ML applications on NVIDIA GPUs
Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio give you a sneak peek at components of NVIDIA's software stack so you can get the best out of your end-to-end AI applications on modern NVIDIA GPUs. They also share features, tips, and tricks for optimizing each stage of your workload, from data loading and processing through training, inference, and deployment.
Talk Title | Accelerating training, inference, and ML applications on NVIDIA GPUs |
Speakers | Maggie Zhang (NVIDIA), Nathan Luehr (NVIDIA), Josh Romero (NVIDIA), Pooya Davoodi (NVIDIA), Davide Onofrio (NVIDIA) |
Conference | O’Reilly TensorFlow World |
Location | Santa Clara, California |
Date | October 28-31, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio dive into techniques for accelerating training and inference in common deep learning and machine learning workloads. You'll learn how DALI can eliminate I/O and data processing bottlenecks in real-world applications and how automatic mixed precision (AMP) can deliver up to a 3x training speedup on Volta GPUs. You'll see best practices for multi-GPU and multi-node scaling using Horovod. The speakers use a deep learning profiler to visualize TensorFlow operations and identify optimization opportunities. And you'll learn to deploy the trained models using INT8 quantization in TensorRT (TRT), all through convenient new APIs in the TensorFlow framework.
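To make the AMP claim a little more concrete: mixed precision training is usually paired with loss scaling, because small gradient values can underflow to zero in half precision. The sketch below is not taken from the talk and does not use TensorFlow. It uses only Python's standard library to emulate float16 rounding, and the gradient value and the scale factor `LOSS_SCALE` are hypothetical numbers chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values, chosen only to illustrate the effect.
LOSS_SCALE = 2.0 ** 15
true_grad = 1e-8  # smaller than the smallest positive float16 value (about 6e-8)

# Without scaling, the gradient underflows to zero when stored in float16.
unscaled_fp16 = to_fp16(true_grad)

# With scaling, the gradient is multiplied up into float16's representable
# range, then divided by the same factor in higher precision afterwards.
scaled_fp16 = to_fp16(true_grad * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE

print("float16 without scaling:", unscaled_fp16)
print("recovered after scaling:", recovered)
```

In a real AMP setup the framework applies the scale to the loss before backpropagation and unscales the gradients before the weight update, and it typically adjusts the scale factor dynamically. This sketch only shows why the scaling step matters.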