November 22, 2019

Metrics-driven tuning of Apache Spark at scale

Spark applications need to be well tuned so that individual applications run quickly and reliably and cluster resources are efficiently utilized. Edwina Lu, Ye Zhou, and Min Shen outline a fast, reliable, and automated process used at LinkedIn for tuning Spark applications, enabling users to quickly identify and fix problems.

Talk Title: Metrics-driven tuning of Apache Spark at scale
Speakers: Edwina Lu (LinkedIn), Ye Zhou (LinkedIn), Min Shen (LinkedIn)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018
URL: Talk Page
Slides: Talk Slides

Tuning Spark can be complex and difficult because there are many configuration parameters and metrics to consider. As the Spark applications running on LinkedIn’s clusters become more diverse and numerous, it’s no longer feasible for a small Spark team to help individual users debug and tune their applications. Users need to get advice quickly so they can iterate on their development, and problems need to be caught promptly to keep the cluster healthy.

LinkedIn uses the Spark History Server (SHS) to gather application metrics, but as the number of Spark applications and the size of individual applications have increased, the SHS has not been able to keep up and can fall hours behind during peak usage. Edwina, Ye, and Min discuss changes to the SHS that improve its efficiency, performance, and stability, enabling it to analyze a large volume of logs.

Another challenge is the lack of proper metrics for Spark application performance. Edwina, Ye, and Min share new metrics added to Spark that precisely report resource usage at runtime and explain how these metrics feed heuristics that identify problems. Based on this analysis, custom recommendations help users tune their applications. The speakers conclude by detailing the impact of these tuning recommendations, including improvements in both application performance and overall cluster utilization.
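To make the idea of a metrics-driven heuristic concrete, here is a minimal Python sketch of one such rule: compare the peak memory an application's executors actually used against what was allocated, and suggest a smaller setting when most of it went unused. This is not the heuristic described in the talk. The `ExecutorMemoryStats` record, its field names, and the `headroom` and `min_unused_ratio` thresholds are all assumptions made for illustration. Only `spark.executor.memory` is a real Spark configuration key.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutorMemoryStats:
    """Hypothetical per-application summary; field names are illustrative."""
    app_id: str
    allocated_mb: int   # executor memory requested by the application, in MB
    peak_used_mb: int   # highest peak memory observed on any executor, in MB


def recommend_executor_memory(
    stats: ExecutorMemoryStats,
    headroom: float = 0.25,          # assumed safety margin above the observed peak
    min_unused_ratio: float = 0.30,  # assumed threshold below which we stay silent
) -> Optional[str]:
    """Return a tuning suggestion, or None if the allocation looks reasonable."""
    if stats.allocated_mb <= 0:
        raise ValueError("allocated_mb must be positive")

    # Fraction of the allocated memory that was never used at peak.
    unused_ratio = 1.0 - stats.peak_used_mb / stats.allocated_mb
    if unused_ratio < min_unused_ratio:
        return None

    # Suggest the observed peak plus a safety margin, rounded up to whole MB.
    suggested_mb = math.ceil(stats.peak_used_mb * (1.0 + headroom))
    return (
        f"{stats.app_id}: {unused_ratio:.0%} of executor memory was unused at peak; "
        f"consider spark.executor.memory={suggested_mb}m"
    )


if __name__ == "__main__":
    over_provisioned = ExecutorMemoryStats("app-1", allocated_mb=8192, peak_used_mb=2048)
    print(recommend_executor_memory(over_provisioned))
```

A production rule would presumably look at more than one signal (for example, garbage collection time or spills) before recommending a change. The sketch only shows the general shape: a metric goes in, a threshold is applied, and a concrete configuration suggestion comes out.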

comments powered by Disqus