Integrating deep learning libraries with Apache Spark
Joseph Bradley and Xiangrui Meng share best practices for integrating popular deep learning libraries with Apache Spark, covering cluster setup, data ingest, and monitoring of long-running jobs. Joseph and Xiangrui then demonstrate these techniques using Google's TensorFlow library.
| Talk Title | Integrating deep learning libraries with Apache Spark |
|------------|--------------------------------------------------------|
| Speakers | Joseph Bradley (Databricks), Xiangrui Meng (Databricks) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | Put AI to Work |
| Location | New York, New York |
| Date | June 27-29, 2017 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
The combination of deep learning with Apache Spark has the potential to make a huge impact. Joseph Bradley and Xiangrui Meng share best practices for integrating popular deep learning libraries with Apache Spark. Rather than comparing deep learning systems or specific optimizations, Joseph and Xiangrui focus on issues that are common to many deep learning frameworks when running on a Spark cluster: optimizing cluster setup (clusters can be configured to avoid task conflicts on GPUs and to allow the use of multiple GPUs per worker), configuring data ingest (pipelines set up for efficient data ingest improve job throughput), and monitoring long-running jobs (interactive monitoring aids both configuration work and checks on the stability of deep learning jobs). Joseph and Xiangrui then demonstrate these techniques using Google's popular TensorFlow library.
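The talk itself is code-light, but the cluster-setup advice translates into a short PySpark sketch. The snippet below illustrates the GPU-pinning idea under stated assumptions: `spark.task.cpus` is set equal to the executor's cores so concurrent tasks cannot contend for a GPU, and each task sets `CUDA_VISIBLE_DEVICES` before TensorFlow initializes CUDA. The two-GPUs-per-worker figure, the partition-id assignment scheme, and the `score_partition` helper are illustrative, not from the talk.

```python
import os

from pyspark import TaskContext
from pyspark.sql import SparkSession

# Assumption: 4 cores per executor. Setting spark.task.cpus to the same
# value allows only one task per executor at a time, so concurrent Spark
# tasks cannot contend for the same GPU (a common 2017-era workaround).
spark = (
    SparkSession.builder
    .appName("tf-on-spark-sketch")
    .config("spark.executor.cores", "4")
    .config("spark.task.cpus", "4")
    .getOrCreate()
)

GPUS_PER_WORKER = 2  # illustrative assumption, not from the talk


def score_partition(rows):
    """Run TensorFlow inference over one partition, pinned to one GPU."""
    # Pin the task to a device *before* TensorFlow initializes CUDA;
    # partition id modulo GPU count spreads tasks across a worker's GPUs.
    gpu_id = TaskContext.get().partitionId() % GPUS_PER_WORKER
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    import tensorflow as tf  # deferred import so the pin takes effect

    # Ingest-side point from the talk: feed the model in large batches
    # rather than row by row to keep the GPU busy.
    batch = list(rows)
    # ... build or load a graph here and run inference on `batch` ...
    return iter(batch)  # placeholder: replace with model outputs


rdd = spark.sparkContext.parallelize(range(1000), numSlices=8)
results = rdd.mapPartitions(score_partition).collect()
```

Batching each partition before handing it to the model, as in `score_partition`, is the ingest-side counterpart of the cluster-setup tuning: it trades a little driver-side simplicity for much higher GPU utilization.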