January 17, 2020


TuneIn: How to get your jobs tuned while you are sleeping

Have you ever tuned a Spark or MapReduce job? If the answer is yes, you already know how difficult it is to tune more than a hundred parameters to optimize the resources a job uses. Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning tool developed to minimize the resource usage of jobs. Experiments have shown up to a 50% reduction in resource usage.

Talk Title TuneIn: How to get your jobs tuned while you are sleeping
Speakers Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018

Have you ever tuned a Spark, Hive, or Pig job? If the answer is yes, you already know that it is a never-ending cycle: execute the job, observe it while it runs, make sense of hundreds of metrics, and rerun it with better parameters. Now imagine doing this for tens of thousands of jobs. Manual performance optimization at this scale is tedious and costly, requires significant domain expertise, and wastes a lot of resources.

LinkedIn first addressed this problem with Dr. Elephant, an open source self-serve performance monitoring and tuning tool for Hadoop and Spark. While Dr. Elephant has proven very successful at LinkedIn as well as at other companies, it relies on developers to check and apply its recommendations manually, and it expects some expertise from them to arrive at the optimal configuration from those recommendations.

Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning framework built on top of Dr. Elephant. You'll learn how LinkedIn uses an iterative optimization approach to find optimal parameter values, which optimization algorithms the team tried and why particle swarm optimization gave the best results, and how they avoided any extra executions by tuning jobs during their regularly scheduled runs. Manoj, Pralabh, and Arpan also share techniques that ensure faster convergence and zero failed executions while tuning, explain how LinkedIn achieved a more than 50% reduction in resource usage by tuning a small set of parameters, and outline lessons learned and a future roadmap for the tool.
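To make the iterative approach concrete, here is a minimal sketch of particle swarm optimization searching over a handful of job parameters. The parameter names, bounds, and cost function below are illustrative assumptions for this post, not TuneIn's actual code; in the real system each candidate configuration would be evaluated by observing a regularly scheduled execution of the job rather than a synthetic formula.

```python
# Hypothetical PSO sketch: search a small Spark parameter space for the
# configuration that minimizes a resource-usage cost. Illustrative only.
import random

# Assumed search space: (min, max) per parameter being tuned.
BOUNDS = {
    "spark.executor.memory_gb": (1.0, 8.0),
    "spark.executor.cores": (1.0, 5.0),
    "spark.sql.shuffle.partitions": (50.0, 1000.0),
}

def resource_cost(params):
    """Stand-in for the real signal (e.g., memory-seconds measured from a
    scheduled run). This toy surface has a minimum inside the bounds."""
    mem = params["spark.executor.memory_gb"]
    cores = params["spark.executor.cores"]
    parts = params["spark.sql.shuffle.partitions"]
    return (mem - 3.0) ** 2 + (cores - 2.0) ** 2 + ((parts - 400.0) / 100.0) ** 2

def pso(n_particles=10, iterations=30, w=0.7, c1=1.5, c2=1.5):
    keys = list(BOUNDS)
    # Start each particle at a random point in the search space.
    pos = [{k: random.uniform(*BOUNDS[k]) for k in keys} for _ in range(n_particles)]
    vel = [{k: 0.0 for k in keys} for _ in range(n_particles)]
    pbest = [dict(p) for p in pos]                      # best position per particle
    pbest_cost = [resource_cost(p) for p in pos]
    gbest = dict(pbest[min(range(n_particles), key=lambda i: pbest_cost[i])])

    for _ in range(iterations):
        for i in range(n_particles):
            for k in keys:
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                lo, hi = BOUNDS[k]
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            cost = resource_cost(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = dict(pos[i]), cost
                if cost < resource_cost(gbest):
                    gbest = dict(pos[i])
    return gbest

if __name__ == "__main__":
    print(pso())
```

In the production setting described in the talk, each iteration's candidate configurations are tried on the job's normal scheduled runs, so convergence speed and avoiding failed executions matter as much as the final optimum.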
