December 30, 2019


Distributed TensorFlow on Hops

Fabio Buso demonstrates frameworks for building distributed TensorFlow applications on the Hops platform and walks you through the whole model lifecycle: debugging and visualizing models with TensorBoard, parallel experimentation and distributed training (with the help of Spark), and model deployment and inference using TensorFlow Serving and Kubernetes.

Talk Title: Distributed TensorFlow on Hops
Speakers: Fabio Buso (Logical Clocks AB)
Conference: O’Reilly Open Source Convention
Conf Tag: Put open source to work
Location: Portland, Oregon
Date: July 16-19, 2018
URL: Talk Page
Slides: Talk Slides

Methods that scale with computation are the future of AI. Hyperscale AI companies produce the most accurate models, and train them faster, with distributed deep learning. Fabio Buso shares the latest developments in distributed TensorFlow and shows how distribution can both massively reduce training time and enable parallel experimentation for hyperparameter optimization.

You’ll explore different distributed architectures for TensorFlow, including the parameter server and “ring allreduce” models, with a focus on open source TensorFlow frameworks that leverage Apache Spark to manage distributed training, such as Yahoo’s TensorFlowOnSpark, Uber’s Horovod, and the Hops model. Fabio also covers the different programming models supported and highlights the importance of cluster support for managing GPUs as a resource.

To this end, he demonstrates how Hops, an open source distribution of Hadoop with support for GPUs as a resource, can run TensorFlow applications from a Jupyter notebook using Apache Spark for distribution. He then walks you through an end-to-end demo of distributed TensorFlow, from training to model deployment and inference with TensorFlow Serving, using a well-known large machine learning dataset (9M images, a 1 TB extended version of ImageNet). The demo covers how to debug, monitor, and visualize training with TensorBoard and how to deploy and use trained models for inference on Kubernetes.
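To make the “ring allreduce” architecture named in the abstract more concrete, here is a minimal single-process sketch in plain Python. It is not taken from the talk and does not use Horovod, TensorFlow, or Hops. The function name `ring_allreduce` and the list-of-lists representation of per-worker gradients are assumptions made purely for illustration. Each simulated worker only ever sends data to its right-hand neighbour, first in a scatter-reduce phase and then in an allgather phase, and every worker ends up with the element-wise sum of all gradients.

```python
def ring_allreduce(worker_grads):
    """Simulate ring allreduce (sum) over equally sized gradient vectors.

    worker_grads: list with one list of numbers per simulated worker.
    Returns a new list where every worker holds the element-wise sum.
    Illustrative only: real frameworks do this over the network or NCCL.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold vectors of the same length")

    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [c * length // n for c in range(n + 1)]
    bufs = [list(g) for g in worker_grads]  # do not mutate the input

    # Phase 1, scatter-reduce: after n - 1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all sends look simultaneous.
        messages = []
        for i in range(n):
            c = (i - step) % n
            messages.append((c, bufs[i][bounds[c]:bounds[c + 1]]))
        for i, (c, payload) in enumerate(messages):
            dst = (i + 1) % n  # right-hand neighbour in the ring
            for k, value in enumerate(payload):
                bufs[dst][bounds[c] + k] += value

    # Phase 2, allgather: circulate the finished chunks around the ring,
    # overwriting stale partial sums.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            messages.append((c, bufs[i][bounds[c]:bounds[c + 1]]))
        for i, (c, payload) in enumerate(messages):
            dst = (i + 1) % n
            bufs[dst][bounds[c]:bounds[c] + len(payload)] = payload

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
    for worker_id, summed in enumerate(ring_allreduce(grads)):
        print(worker_id, summed)
```

In this scheme each worker sends roughly 2 × (n − 1) / n times the vector size in total, regardless of how many workers join the ring, whereas a single parameter server has to receive and send every worker's gradients. A trainer would typically divide the summed result by the number of workers to obtain an averaged gradient.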
