November 30, 2019


Accelerating deep learning on Apache Spark using BigDL with coarse-grained scheduling


The BigDL framework scales deep learning to large datasets using Apache Spark. However, Spark introduces significant scheduling overhead when running BigDL at large scale. Shivaram Venkataraman and Sergey Ermolin outline a new parameter manager implementation that, along with coarse-grained scheduling, can provide significant speedups for deep learning models like Inception and VGG.

Talk Title: Accelerating deep learning on Apache Spark using BigDL with coarse-grained scheduling
Speakers: Sergey Ermolin (Intel), Shivaram Venkataraman (Microsoft Research)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018
URL: Talk Page
Slides: Talk Slides

In recent years, deep learning has significantly improved several AI applications, including recommendation engines, voice/speech recognition, and image/video recognition. BigDL, an open source distributed deep learning framework developed by Intel and built for big data platforms using Apache Spark, brings native support for deep learning functionality to Spark.

To scale across a cluster, gradient aggregation has to span servers on the network. With the regular reduce or aggregate functions in Apache Spark (and in the original MapReduce), all partitions send their computed local gradients to the driver machine, which spends time linear in the number of partitions merging them, due to the CPU cost of merging partial results and the network bandwidth limit. This process becomes a bottleneck when there are many partitions. treeReduce and treeAggregate in Apache Spark are newer aggregation communication patterns based on multilevel aggregation trees; they reduce the load on the driver but still don't allow near-linear scaling (the first sketch below shows treeAggregate in action). Moreover, as the number of partitions grows, there is additional overhead from having a centralized driver that schedules every task in the system. With a large number of partitions, this overhead cannot be ignored: it decreases throughput and increases training time.

Shivaram Venkataraman and Sergey Ermolin outline a new parameter manager implementation that, along with coarse-grained scheduling, can provide significant speedups for deep learning models like Inception and VGG. To allow near-linear scaling, they use a new AllReduce operation, part of the parameter manager in BigDL. The AllReduce operation works in two phases. First, gradients from all the partitions within a single worker are aggregated locally. The aggregated gradient is then sliced into chunks, and the chunks are exchanged among all the nodes in the cluster, so each node owns one partition of the aggregated gradients and computes its own partition of the weights, ensuring scalability. In the end, every node holds the same updated model weights, but the driver overhead is eliminated: even as the number of partitions grows, the data transferred by each worker remains proportional to the size of the gradients (or weights), and the driver is not involved in the communication (the second sketch below illustrates the two phases).

To address the scheduling overhead, they use Drizzle, a recently proposed scheduling framework for Apache Spark. Currently, Spark uses a BSP computation model and notifies the scheduler at the end of each task, which adds overhead and results in decreased throughput and increased latency. Drizzle uses group scheduling, where multiple iterations (a group) of computation are scheduled at once, decoupling the granularity of task execution from scheduling and amortizing the costs of task serialization and launch (the last sketch below works out the arithmetic).

Shivaram and Sergey share results from using the AllReduce operation and Drizzle on a number of common deep learning models, including VGG, GoogLeNet, and Inception, with benchmarks run on Amazon EC2 and Google Cloud Dataproc.
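As a concrete illustration of the tree-based aggregation discussed above, here is a minimal Spark (Scala) sketch that sums toy per-partition gradient vectors with treeAggregate. The gradient data and dimensions are made up for the demo, but treeAggregate and its depth parameter are Spark's actual RDD API:

```scala
import org.apache.spark.sql.SparkSession

object TreeAggregateDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("tree-aggregate-demo")
      .master("local[4]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy "local gradients": one small Array[Double] per record, spread
    // over many partitions to mimic a large cluster.
    val dim = 4
    val localGradients =
      sc.parallelize(Seq.fill(64)(Array.fill(dim)(0.01)), numSlices = 64)

    // Element-wise in-place vector sum, used both to fold records within a
    // partition (seqOp) and to merge partial sums across partitions (combOp).
    def add(a: Array[Double], b: Array[Double]): Array[Double] = {
      var i = 0
      while (i < a.length) { a(i) += b(i); i += 1 }
      a
    }

    // depth = 2 builds a two-level aggregation tree: executors pre-merge
    // partial results, so the driver no longer merges one result per partition
    // the way a flat reduce/aggregate would.
    val summed =
      localGradients.treeAggregate(Array.fill(dim)(0.0))(add, add, depth = 2)

    println(summed.mkString(", "))
    spark.stop()
  }
}
```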
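The chunked exchange behind the AllReduce operation can also be sketched in plain Scala. To be clear, this is not BigDL's parameter manager API; it is a single-process model of the two phases described above, with all names illustrative:

```scala
object AllReduceSketch {
  // Single-process model of the two-phase AllReduce described above.
  // nodeGradients(n)(p) is the local gradient of partition p on node n;
  // the return value is the aggregated gradient as seen by each node.
  def allReduce(nodeGradients: Array[Array[Array[Double]]]): Array[Array[Double]] = {
    val numNodes = nodeGradients.length
    val dim = nodeGradients.head.head.length

    // Phase 1: each node sums the gradients of its own partitions locally,
    // so only one gradient-sized buffer per node enters the exchange.
    val localSums: Array[Array[Double]] =
      nodeGradients.map(_.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }))

    // Phase 2: every local sum is sliced into numNodes chunks; node i
    // collects and sums chunk i from all nodes. In the real system this is
    // the only network exchange, and each node then computes its own slice
    // of the updated weights from the slice of gradients it owns.
    val chunkSize = math.max(1, math.ceil(dim.toDouble / numNodes).toInt)
    val ownedChunks: Array[Array[Double]] = Array.tabulate(numNodes) { i =>
      val lo = math.min(i * chunkSize, dim)
      val hi = math.min(lo + chunkSize, dim)
      val chunk = new Array[Double](hi - lo)
      for (sum <- localSums; j <- lo until hi) chunk(j - lo) += sum(j)
      chunk
    }

    // The owned slices are reassembled on every node, so all nodes end up
    // with identical results and the driver is never involved.
    val full = ownedChunks.flatten
    Array.fill(numNodes)(full.clone())
  }

  def main(args: Array[String]): Unit = {
    // 3 nodes, 2 partitions each, 6-dimensional gradients of all ones:
    // every node should see 6.0 in each position (3 nodes * 2 partitions).
    val grads = Array.fill(3, 2, 6)(1.0)
    allReduce(grads).foreach(g => println(g.mkString(" ")))
  }
}
```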
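Finally, the benefit of group scheduling is easy to see with a toy cost model. The sketch below is not Drizzle's API; it only works out the arithmetic of paying the scheduling cost once per group instead of once per iteration, with made-up timings:

```scala
object GroupSchedulingSketch {
  // Toy cost model (not Drizzle's API): BSP-style execution pays the
  // scheduler overhead on every iteration, while group scheduling pays it
  // once per group of `groupSize` iterations.
  def bspMs(iterations: Int, computeMs: Double, scheduleMs: Double): Double =
    iterations * (computeMs + scheduleMs)

  def groupScheduledMs(iterations: Int, groupSize: Int,
                       computeMs: Double, scheduleMs: Double): Double =
    iterations * computeMs +
      math.ceil(iterations.toDouble / groupSize) * scheduleMs

  def main(args: Array[String]): Unit = {
    // Made-up numbers: 10,000 iterations, 5 ms of compute and 2 ms of
    // scheduling overhead per iteration, groups of 100 iterations.
    println(f"BSP:             ${bspMs(10000, 5.0, 2.0)}%.0f ms")          // 70000 ms
    println(f"Group scheduled: ${groupScheduledMs(10000, 100, 5.0, 2.0)}%.0f ms") // 50200 ms
  }
}
```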
