December 18, 2019

214 words 2 mins read

Integrating deep learning libraries with Apache Spark

Joseph Bradley and Xiangrui Meng share best practices for integrating popular deep learning libraries with Apache Spark, covering cluster setup, data ingest, and job monitoring. Joseph and Xiangrui then demonstrate these techniques using Google's TensorFlow library.

Talk Title Integrating deep learning libraries with Apache Spark
Speakers Joseph Bradley (Databricks), Xiangrui Meng (Databricks)
Conference O’Reilly Artificial Intelligence Conference
Conf Tag Put AI to Work
Location New York, New York
Date June 27-29, 2017
URL Talk Page
Slides Talk Slides

The combination of deep learning with Apache Spark has the potential to make a huge impact. Joseph Bradley and Xiangrui Meng share best practices for integrating popular deep learning libraries with Apache Spark. Rather than comparing deep learning systems or specific optimizations, Joseph and Xiangrui focus on issues common to many deep learning frameworks when running on a Spark cluster: configuring the cluster (clusters can be set up to avoid task conflicts on GPUs and to allow multiple GPUs per worker), optimizing data ingest (pipelines for efficient data ingest improve job throughput), and monitoring long-running jobs (interactive monitoring aids both configuration work and checking the stability of deep learning jobs). Joseph and Xiangrui then demonstrate these techniques using Google's popular TensorFlow library.
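As a concrete illustration of the cluster-setup point, the sketch below shows one common way to avoid GPU task conflicts: setting `spark.task.cpus` equal to `spark.executor.cores` so each executor runs only one task at a time, giving a TensorFlow task exclusive access to that worker's GPU. This is not the speakers' code; the core counts, the trivial stand-in model, and the partition count are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# Assumption: a GPU-equipped cluster. With spark.task.cpus equal to
# spark.executor.cores, each executor runs at most one concurrent task,
# so two tasks never contend for the same worker's GPU.
spark = (
    SparkSession.builder
    .appName("tf-on-spark-sketch")
    .config("spark.executor.cores", "4")
    .config("spark.task.cpus", "4")  # 4 / 4 => one task per executor
    .getOrCreate()
)

def score_partition(rows):
    # Do per-task setup once per partition, on the worker where the GPU
    # lives, rather than once per record -- the data-ingest throughput
    # point made in the abstract.
    import tensorflow as tf
    weight = tf.constant(2.0)  # stand-in for a real TensorFlow model
    for x in rows:
        yield float(weight * float(x))

# A partition count matching the number of executors keeps every GPU busy.
predictions = (
    spark.sparkContext
    .parallelize(range(100), numSlices=4)
    .mapPartitions(score_partition)
    .collect()
)
```

The same one-task-per-executor configuration also makes interactive monitoring simpler, since each executor's resource usage maps directly to a single deep learning task.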
