February 24, 2020

Accelerating training, inference, and ML applications on NVIDIA GPUs

Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio give you a sneak peek at components of NVIDIA's software stack so you can get the best out of your end-to-end AI applications on modern NVIDIA GPUs. They also cover features, tips, and tricks for optimizing your workloads at every stage, from data loading and processing through training, inference, and deployment.


Talk Title	Accelerating training, inference, and ML applications on NVIDIA GPUs
Speakers	Maggie Zhang (NVIDIA), Nathan Luehr (NVIDIA), Josh Romero (NVIDIA), Pooya Davoodi (NVIDIA), Davide Onofrio (NVIDIA)
Conference	O’Reilly TensorFlow World
Conf Tag
Location	Santa Clara, California
Date	October 28-31, 2019
URL	Talk Page
Slides	Talk Slides
Video

Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio dive into techniques to accelerate training and inference for common deep learning and machine learning workloads. You’ll learn how DALI can eliminate I/O and data processing bottlenecks in real-world applications and how automatic mixed precision (AMP) can give you up to a 3x training performance improvement on Volta GPUs. You’ll see best practices for multi-GPU and multinode scaling using Horovod. The speakers use a deep learning profiler to visualize TensorFlow operations and identify optimization opportunities. And you’ll learn to deploy the trained models using INT8 quantization in TensorRT (TRT), all within new, convenient APIs of the TensorFlow framework.

api framework performance gpu tensorflow ml deep learning machine learning nvidia optimization

HARP: An efficient and elastic GPU-sharing system

February 23, 2020

Pengfei Fan and Lingling Jin offer an overview of an efficient and elastic GPU-sharing system for users who do research and development with TensorFlow.

Creating smaller, faster, production-worthy mobile machine learning models

February 20, 2020

Getting machine learning models ready for use on device is a major challenge. Drag-and-drop training tools can get you started, but the models they produce aren't small enough or fast enough to ship. Jameson Toole walks you through optimization, pruning, and compression techniques to keep app sizes small and inference speeds high.

Architecting a data analytics service both in the public cloud and in the on-premise private cloud: ETL, BI, and machine learning (sponsored by SK Holdings)

February 16, 2020

Jungwook Seo walks you through AccuInsight+, a cloud data analytics platform that SK Holdings announced in January 2019. It offers eight data analytics services on CloudZ, one of the biggest cloud service providers in Korea.

Deep learning for recommender systems

January 12, 2020

The success of deep learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. Oliver Gindele offers a brief overview of such deep recommender systems and explains how they can be implemented in TensorFlow.

Deploying deep learning models on GPU-enabled Kubernetes clusters

January 1, 2020

Interested in deep learning models and how to deploy them on Kubernetes at production scale? Not sure if you need to use GPUs or CPUs? Mathew Salvaris and Fidan Boylu Uz help you out by providing a step-by-step guide to creating a pretrained deep learning model, packaging it in a Docker container, and deploying it as a web service on a Kubernetes cluster.

Large Scale Distributed Deep Learning on Kubernetes Clusters

October 2, 2019

The focus of this talk is the deployment of large-scale distributed deep learning with Kubernetes. The use of operators to manage and automate training processes for machine learning is discussed. …