November 28, 2019

Cuttlefish: Lightweight primitives for online tuning

Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuning primitives that use reinforcement learning to adaptively modify execution as they observe application performance over time.

Talk Title: Cuttlefish: Lightweight primitives for online tuning
Speakers: Tomer Kaftan (University of Washington)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018

Modern data processing applications execute many diverse operators. For example, a complete machine learning pipeline from label generation to model training may involve regular expressions, relational joins, and image convolutions, each of which has many known implementations of the same functionality. However, the performance of each implementation can be dramatically affected by factors such as the characteristics of the input data and the hardware setting, so it can be difficult for developers to choose among the implementations when writing their applications. Traditional database query optimizers and offline autotuners attempt to solve this problem by automatically picking the best operator variants, but they require developers to build optimization rules and cost models or to collect representative workloads and profile applications offline.
Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuners that automatically pick operator implementations online and use multi-armed bandit reinforcement learning techniques to quickly learn which operator variants are best for each application. The tuners cyclically try out operator variants during execution so as to balance exploration and exploitation, observe the resulting application performance, and use those observations to influence later decisions.
Cuttlefish tuners can incorporate contextual features about the input data when they are available, such as the dimensions of each input image to a convolution operator. They can effectively tune applications in shared-nothing distributed environments even as clusters grow in size, and they can adaptively react to changes in an application’s workload. Cuttlefish was prototyped in Apache Spark, but it can easily be added to other big data systems.
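
The explore/exploit loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Cuttlefish's actual API or algorithm: the `EpsilonGreedyTuner` class, its `choose` and `observe` methods, the variant names, and the simulated runtimes are all hypothetical, and epsilon-greedy is used here only as one simple multi-armed bandit strategy.

```python
import random


class EpsilonGreedyTuner:
    """Hypothetical tuner that picks among operator variants (not Cuttlefish's API)."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon              # probability of exploring a random variant
        self.rng = random.Random(seed)      # seeded so runs are repeatable
        self.counts = [0] * len(self.variants)
        self.mean_reward = [0.0] * len(self.variants)

    def choose(self):
        """Return (index, variant) to use for the next round of work."""
        # Try every variant once before trusting any estimate.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i, self.variants[i]
        if self.rng.random() < self.epsilon:
            i = self.rng.randrange(len(self.variants))  # explore
        else:
            # Exploit: pick the variant with the best average reward so far.
            i = max(range(len(self.variants)), key=lambda j: self.mean_reward[j])
        return i, self.variants[i]

    def observe(self, index, reward):
        """Fold one observed reward into the running mean for that variant."""
        self.counts[index] += 1
        self.mean_reward[index] += (reward - self.mean_reward[index]) / self.counts[index]


def run(tuner, rounds, measure):
    """Choose a variant, measure it, and feed the result back, `rounds` times."""
    for _ in range(rounds):
        index, variant = tuner.choose()
        elapsed = measure(variant)       # e.g. seconds spent on one batch
        tuner.observe(index, -elapsed)   # less time means a higher reward
    return tuner


# Assumption: fixed, made-up runtimes stand in for real measurements.
SIMULATED_SECONDS = {"slow_variant": 3.0, "fast_variant": 1.0}

tuner = run(EpsilonGreedyTuner(SIMULATED_SECONDS), rounds=200,
            measure=SIMULATED_SECONDS.get)
best = max(range(len(tuner.variants)), key=lambda j: tuner.counts[j])
print(tuner.variants[best], tuner.counts)
```

In a real application the `measure` callback would run one batch of work with the chosen variant and time it, and the reward could be any throughput metric. The fixed numbers here only keep the example deterministic.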
To evaluate the prototype, Cuttlefish tuners were used to optimize a wide range of large-scale data processing applications that involve image convolution, regular expression matching, and relational joins. Compared to the original unoptimized applications, the tuners achieved 3–6x higher convolution throughput and up to 75x higher regular expression throughput. They also outperformed Spark SQL’s default query optimizer, speeding up relational joins by up to 2.6x.
