November 24, 2019


Building Distributed TensorFlow Using Both GPU and CPU on Kubernetes [I]


Talk Title Building Distributed TensorFlow Using Both GPU and CPU on Kubernetes [I]
Speakers Huizhi Zhao (Software Engineer, Caicloud), Zeyu Zheng (Chief Data Scientist, Caicloud)
Conference CloudNativeCon + KubeCon Europe
Location Berlin Congress Center
Date Mar 28-30, 2017
URL Talk Page
Slides Talk Slides

Big Data and Machine Learning have become extremely hot topics in recent years. Google has announced its AI-centric strategy and released the deep learning toolkit TensorFlow, which soon became the most popular open source toolkit for deep learning applications. However, training a large deep learning model on a single machine without a GPU can take years. To accelerate training, we built a distributed TensorFlow system on Kubernetes that supports both CPUs and GPUs. In this presentation, I'd like to share our experience building this distributed TensorFlow system on Kubernetes. First, I'll briefly introduce TensorFlow and how it supports distributed model training. The original distribution mechanism, however, lacks many components needed for production use, such as scheduling, monitoring, and life-cycle management. In the rest of the presentation, I'll focus on how to leverage Kubernetes to solve those problems. The solution involves three components. First, I'll introduce how to schedule TensorFlow jobs in a cluster with both CPUs and GPUs. Then I'll share our experience managing the life cycle of a distributed TensorFlow job. Finally, I'll describe our efforts to lower the bar for using distributed TensorFlow.
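For context, the "original distribution mechanism" the abstract refers to is TensorFlow's parameter-server architecture, in which the cluster layout is declared with a `ClusterSpec` and each process runs a server for one role. A minimal sketch of that layout is below; the host names follow a hypothetical Kubernetes service naming scheme and are not from the talk:

```python
# Sketch of a TensorFlow parameter-server cluster definition, the kind of
# manual setup that Kubernetes-based scheduling aims to automate.
# Host names are hypothetical (e.g. pods behind a "tfjob" service).
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps-0.tfjob.svc:2222"],          # parameter server, typically on CPU
    "worker": ["worker-0.tfjob.svc:2222",   # workers, typically on GPU nodes
               "worker-1.tfjob.svc:2222"],
})

# Each pod would then start a server for its own role and index, e.g.:
#   server = tf.distribute.Server(cluster, job_name="worker", task_index=0)

print(cluster.num_tasks("worker"))  # -> 2
```

Without an orchestrator, every process must be given this spec plus its own role and index by hand; the talk's point is that Kubernetes can generate and manage this wiring, along with scheduling and life-cycle handling, across CPU and GPU nodes.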
