December 5, 2019

351 words 2 mins read

Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am

Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am

Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure deamons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using systems like Apache BEAM, Mahout, and internal Spark ML jobs as workloads.

Talk Title Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am
Speakers Holden Karau (Independent), Rachel Warren (Salesforce Einstein)
Conference Strata Data Conference
Conf Tag Making Data Work
Location London, United Kingdom
Date May 22-24, 2018
URL Talk Page
Slides Talk Slides
Video

Apache Spark is an amazing distributed system, but part of the bargain we’ve made with the infrastructure deamons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. Tuning Apache Spark is somewhat of a dark art, although thankfully, when it goes wrong, all we tend to lose is several hours of our day and our employer’s money. Holden Karau, Rachel Warren, and Anya Bida explore auto-tuning jobs using both historical and live job information, using systems like Apache BEAM, Mahout, and internal Spark ML jobs as workloads. Much of the data required to effectively tune jobs is already collected inside of Spark. You just need to understand it. Holden, Rachel, and Anya outline sample auto-tuners and discuss the options for improving them and applying similar techniques in your own work. They also discuss what kind of tuning can be done statically (e.g., without depending on historic information) and look at Spark’s own built-in components for auto-tuning (currently dynamically scaling cluster size) and how you can improve them. Even if the idea of building an auto-tuner sounds as appealing as using a rusty spoon to debug the JVM on a haunted supercomputer, this talk will give you a better understanding of the knobs available to you to tune your Apache Spark jobs. Also, to be clear, Holden, Rachel, and Anya don’t promise to stop your pager going off at 2:00am, but hopefully this helps.

comments powered by Disqus