December 21, 2019

263 words 2 mins read

Spark adaptive execution: Unleash the power of Spark SQL

Spark SQL is widely used, but it still suffers from stability and performance challenges in highly dynamic environments with large-scale data. Haifeng Chen shares a Spark adaptive execution engine built to address these challenges. It adjusts task parallelism, converts join strategies, and handles data skew dynamically at runtime, choosing execution plans based on runtime statistics.

Talk Title: Spark adaptive execution: Unleash the power of Spark SQL
Speakers: Haifeng Chen (Intel)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Francisco, California
Date: March 26-28, 2019
URL: Talk Page
Slides: Talk Slides
Video:

Spark SQL, the most popular component of Apache Spark, is widely used to process large-scale structured data in the data center. However, it still suffers from stability and performance challenges in highly dynamic environments with ultra-large-scale data. Haifeng Chen shares a Spark adaptive execution engine built to address these challenges. It adjusts task parallelism, converts join strategies, and handles data skew dynamically at runtime, choosing execution plans based on runtime statistics, and it has delivered significant performance improvements on standard SQL benchmarks such as TPC-DS. The approach has also proven itself in production through adoption at a number of Chinese internet companies. Haifeng details the three major challenges the industry faces when using Spark SQL in real-world environments, outlines the technical architecture of the adaptive execution approach along with the design of each solution to those challenges, and shares benchmark results and experiences from industrial adoption. He concludes by discussing planned work to further optimize the Spark adaptive execution engine.
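
The talk's own code and Intel's original fork are not shown here. As a rough illustration only, the sketch below uses the configuration knobs that later shipped in upstream Apache Spark 3.x as Adaptive Query Execution (AQE), which covers the same three mechanisms the talk describes: runtime adjustment of task parallelism (shuffle partition coalescing), join-strategy conversion, and skew-join handling. The configuration names and values are the upstream Spark 3.x ones and may differ from those in the engine presented in the talk.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming upstream Spark 3.x AQE config names.
object AdaptiveExecutionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("adaptive-execution-demo")
      .master("local[*]")
      // Re-optimize query plans at runtime using statistics collected
      // from completed shuffle stages.
      .config("spark.sql.adaptive.enabled", "true")
      // Task parallelism: coalesce many small shuffle partitions at runtime.
      .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
      // Join conversion: a sort-merge join whose build side turns out to be
      // smaller than this threshold can be converted to a broadcast join.
      .config("spark.sql.autoBroadcastJoinThreshold", "10MB")
      // Data skew: split oversized partitions of a skewed join into subtasks.
      .config("spark.sql.adaptive.skewJoin.enabled", "true")
      .getOrCreate()

    // Example query whose join strategy and shuffle parallelism are decided
    // at runtime rather than fixed at planning time.
    spark.range(1000000L).createOrReplaceTempView("facts")
    spark.range(100L).createOrReplaceTempView("dims")
    spark.sql("SELECT f.id FROM facts f JOIN dims d ON f.id = d.id").collect()

    spark.stop()
  }
}
```

With these settings, the physical plan is revised between shuffle stages instead of being fixed up front, which is the core idea behind the adaptive execution approach discussed in the talk.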
