February 14, 2020

414 words 2 mins read

Downscaling: The Achilles heel of autoscaling Spark clusters

Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs. Upscaling a cluster in the cloud is fairly easy compared to downscaling nodes, so the overall total cost of ownership (TCO) goes up. Prakhar Jain and Sourabh Goyal examine a new design for efficient downscaling that helps achieve better resource utilization and lower TCO.

Talk Title Downscaling: The Achilles heel of autoscaling Spark clusters
Speakers Prakhar Jain (Microsoft), Sourabh Goyal (Qubole)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 24-26, 2019
URL Talk Page
Slides Talk Slides
Video

Adding nodes at runtime (upscaling) to an already running Spark on YARN cluster is fairly easy, but taking those nodes away (downscaling) when the workload drops later is difficult. To remove a node from a running cluster, you need to make sure it is being used neither for compute nor for storage. On production workloads, however, many nodes can't be taken away even though they are not fully utilized, because containers end up fragmented across different nodes: each node runs one or two containers or executors even though it has the resources to run more, and long-running Spark executors make this even harder. Other nodes hold shuffle data on local disk that a Spark application running on the cluster will consume later; the resource manager will never reclaim those nodes, because losing shuffle data could lead to costly recomputation of stages or tasks.

Prakhar Jain and Sourabh Goyal explore how to improve downscaling in Spark on YARN clusters in the presence of such constraints. They cover changes to the container-allocation strategy in YARN and to the Spark task scheduler that together achieve better packing of containers, ensuring that containers are consolidated onto a smaller set of nodes and that some nodes run no compute at all. By being careful about how containers are assigned in the first place, you reduce the chance of an application's containers ending up spread over many different nodes. They also examine enhancements to the Spark driver and external shuffle service (ESS) that proactively delete shuffle data already known to have been consumed. This ensures nodes are not holding unnecessary shuffle data, freeing them from storage duties and making them available for reclamation and faster downscaling.
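To make the packing idea concrete, here is a minimal sketch (not the speakers' actual scheduler code) of a best-fit placement policy: among the nodes that can still host a requested container, prefer the node that is already most utilized, so lightly used nodes drain and become candidates for removal. `NodeState`, `PackingAllocator`, and the slot counts are illustrative names and assumptions, not YARN or Spark APIs.

```scala
// Sketch of a "pack onto fewer nodes" placement choice for new containers.
case class NodeState(id: String, usedSlots: Int, totalSlots: Int) {
  def freeSlots: Int = totalSlots - usedSlots
}

object PackingAllocator {
  /** Best-fit choice: among nodes with enough free slots, pick the one that is
    * already most utilized, so empty or lightly loaded nodes stay reclaimable. */
  def chooseNode(nodes: Seq[NodeState], slotsNeeded: Int): Option[NodeState] =
    nodes
      .filter(_.freeSlots >= slotsNeeded)
      .sortBy(n => (-n.usedSlots, n.id)) // most utilized first; id breaks ties
      .headOption
}

// Example: n2 is chosen even though n3 is empty, keeping n3 free for downscaling.
// PackingAllocator.chooseNode(Seq(NodeState("n1", 1, 8), NodeState("n2", 6, 8),
//   NodeState("n3", 0, 8)), slotsNeeded = 2)  // => Some(NodeState("n2", 6, 8))
```

The proactive shuffle cleanup can be sketched in a similar spirit: track which stages still need each shuffle's output, and once no consumer remains, ask the shuffle service to delete the files so the node no longer holds shuffle data. The hooks below (`ShuffleCleanupTracker`, `ExternalShuffleClientLike`, `deleteShuffle`) are hypothetical stand-ins for logic that, per the talk, lives in the Spark driver and a modified ESS.

```scala
// Hypothetical interface standing in for a client of the external shuffle service.
trait ExternalShuffleClientLike { def deleteShuffle(shuffleId: Int): Unit }

class ShuffleCleanupTracker(ess: ExternalShuffleClientLike) {
  import scala.collection.mutable

  // shuffleId -> ids of stages that still need to read this shuffle's output
  private val pendingConsumers = mutable.Map.empty[Int, mutable.Set[Int]]

  def registerConsumer(shuffleId: Int, stageId: Int): Unit =
    pendingConsumers.getOrElseUpdate(shuffleId, mutable.Set.empty[Int]) += stageId

  /** Called when a stage completes: once a shuffle has no remaining consumers,
    * delete its files so the hosting node is free of storage and can be reclaimed. */
  def onStageCompleted(stageId: Int): Unit = {
    pendingConsumers.values.foreach(_ -= stageId)
    val finished = pendingConsumers.collect { case (id, c) if c.isEmpty => id }.toSeq
    finished.foreach { id =>
      ess.deleteShuffle(id)   // hypothetical RPC to the shuffle service
      pendingConsumers -= id
    }
  }
}
```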
