Distributed TensorFlow on Hops
Fabio Buso offers demonstrations of frameworks for building distributed TensorFlow applications on the Hops platform and walks you through the whole model lifecycle, from debugging and visualizing models on TensorBoard to parallel experimentation and distributed training (with the help of Spark) to model deployment and inferencing using TensorFlow Serving and Kubernetes.
Talk Title | Distributed TensorFlow on Hops |
Speakers | Fabio Buso (Logical Clocks AB) |
Conference | O’Reilly Open Source Convention |
Conf Tag | Put open source to work |
Location | Portland, Oregon |
Date | July 16-19, 2018 |
Methods that scale with computation are the future of AI. Hyperscale AI companies produce the most accurate models and train them faster with distributed deep learning. Fabio Buso shares the latest developments in distributed TensorFlow and shows how distribution can both massively reduce training time and enable parallel experimentation for hyperparameter optimization. You’ll explore different distributed architectures for TensorFlow, including the parameter server and “ring allreduce” models, with a focus on open source TensorFlow frameworks that leverage Apache Spark to manage distributed training, such as Yahoo’s TensorFlowOnSpark, Uber’s Horovod, and the Hops model. Fabio also covers the programming models these frameworks support and highlights the importance of cluster support for managing GPUs as a resource. To this end, he demonstrates how Hops, an open source distribution of Hadoop with support for GPUs as a resource, can run TensorFlow applications from a Jupyter notebook using Apache Spark for distribution. He then walks you through an end-to-end demo of distributed TensorFlow, from training to model deployment and inferencing with TensorFlow Serving, on a well-known large machine learning dataset (9M images, a 1 TB extended version of ImageNet). The demo covers how to debug, monitor, and visualize training with TensorBoard and how to deploy trained models on Kubernetes and use them for inferencing.
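To make the “ring allreduce” model mentioned above more concrete, here is a minimal pure-Python simulation of its communication pattern. It is an illustrative sketch only, not Horovod’s API or code from Hops; the function name `ring_allreduce` is hypothetical, and it assumes every worker holds a gradient buffer of the same length, evenly divisible by the number of workers. Each worker only ever talks to its right-hand neighbor: a reduce-scatter phase leaves each worker owning one fully summed chunk, and an allgather phase circulates those chunks so every worker ends with the complete sum.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring allreduce (sum) across len(buffers) workers.

    Sketch under assumptions: all buffers have equal length and that
    length is divisible by the number of workers. Returns new lists;
    the inputs are not modified.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must match"
    assert length % n == 0, "buffer length must divide evenly into chunks"
    size = length // n
    data = [list(b) for b in buffers]  # each worker's local copy

    def chunk(c: int) -> range:
        # Index range of chunk c within a buffer.
        return range(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: in step s, worker i sends chunk (i - s) to
    # worker i + 1, which adds it to its own copy. After n - 1 steps,
    # worker i holds the fully reduced chunk (i + 1) mod n.
    for step in range(n - 1):
        messages = []  # collect all sends first so a step is simultaneous
        for i in range(n):
            c = (i - step) % n
            messages.append(((i + 1) % n, c, [data[i][k] for k in chunk(c)]))
        for dst, c, values in messages:
            for k, v in zip(chunk(c), values):
                data[dst][k] += v

    # Phase 2, allgather: in step s, worker i forwards chunk (i + 1 - s),
    # which is already fully reduced, and the receiver overwrites its copy.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            messages.append(((i + 1) % n, c, [data[i][k] for k in chunk(c)]))
        for dst, c, values in messages:
            for k, v in zip(chunk(c), values):
                data[dst][k] = v

    return data


if __name__ == "__main__":
    workers = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(workers))
    # prints [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

The design choice this illustrates: each worker sends 2(N − 1) chunks of size L/N, so per-worker traffic stays close to 2L regardless of how many workers join, whereas a single parameter server must receive and send updates for every worker.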