Downscaling: The Achilles heel of autoscaling Spark clusters
Autoscaling aims to give a big data application low latency while keeping resource costs down. Upscaling a cluster in the cloud is fairly easy compared to downscaling nodes, so idle capacity lingers and the overall total cost of ownership (TCO) goes up. Prakhar Jain and Sourabh Goyal examine a new design for efficient downscaling, which helps achieve better resource utilization and a lower TCO.
| Talk Title | Downscaling: The Achilles heel of autoscaling Spark clusters |
|---|---|
| Speakers | Prakhar Jain (Microsoft), Sourabh Goyal (Qubole) |
| Conference | Strata Data Conference |
| Conf Tag | Make Data Work |
| Location | New York, New York |
| Date | September 24-26, 2019 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Adding nodes at runtime (upscaling) to an already running Spark-on-YARN cluster is fairly easy. Taking those nodes away (downscaling) later, when the workload drops, is much harder. To remove a node from a running cluster, you need to make sure it isn't being used for either compute or storage. On production workloads, many nodes can't be taken away because they're still running a few containers, even though they're far from fully utilized: containers end up fragmented across the cluster, with each node running only one or two containers or executors despite having room for more. Long-running Spark executors make this even worse. Other nodes hold shuffle data on local disk that a Spark application running on the cluster may consume later; the resource manager will never reclaim those nodes, because losing shuffle data could force costly recomputation of stages or tasks.

Prakhar Jain and Sourabh Goyal explore how to improve downscaling in Spark on YARN clusters under these constraints. They cover changes to the container-allocation strategy in the YARN scheduler and to the Spark task scheduler that together achieve better packing of containers. This defragments work onto a smaller set of nodes, so that some nodes end up running no compute at all. By being careful about how containers are assigned in the first place, you reduce the chance of one application's containers being spread across many different nodes (see the first sketch below).

They also examine enhancements to the Spark driver and the external shuffle service (ESS) that proactively delete shuffle data the driver already knows has been consumed. This ensures nodes aren't holding unnecessary shuffle data, freeing them from storage responsibilities and making them available for reclamation and faster downscaling (see the second sketch below).
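The talk doesn't publish its scheduler changes, but the packing idea can be illustrated with a small, self-contained sketch. Everything here (the `Node` class, `PackingAllocator`, the best-fit rule) is hypothetical and only meant to show why preferring already-busy nodes leaves other nodes empty and therefore safe to remove.

```scala
// Hypothetical sketch: choose a node for a new container by packing onto
// already-busy nodes first, instead of spreading containers evenly.
case class Node(id: String, capacity: Int, running: Int) {
  def free: Int = capacity - running
}

object PackingAllocator {
  // "Best fit": prefer the eligible node with the FEWEST free slots,
  // so lightly used nodes drain, become idle, and can be downscaled.
  def pickNode(nodes: Seq[Node], slotsNeeded: Int = 1): Option[Node] =
    nodes
      .filter(_.free >= slotsNeeded)
      .sortBy(_.free)          // most-packed eligible node first
      .headOption
}

object PackingDemo extends App {
  val cluster = Seq(
    Node("node-1", capacity = 4, running = 3),
    Node("node-2", capacity = 4, running = 1),
    Node("node-3", capacity = 4, running = 0)
  )
  // Best-fit packing sends the new container to node-1, leaving node-3
  // completely idle and therefore a candidate for reclamation.
  println(PackingAllocator.pickNode(cluster).map(_.id)) // Some(node-1)
}
```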
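The proactive shuffle-cleanup idea can be sketched the same way. The trait and class names below are illustrative, not Spark or ESS APIs: the sketch only models the contract described in the talk, in which the driver deletes a shuffle's files as soon as every stage that reads it has finished.

```scala
// Hypothetical sketch: driver-side bookkeeping that asks the external
// shuffle service (modeled as a trait) to drop shuffle files once no
// remaining stage needs them, so nodes stop holding stale shuffle data.
trait ExternalShuffleService {
  def removeShuffleFiles(shuffleId: Int): Unit
}

class ShuffleCleanupTracker(ess: ExternalShuffleService) {
  // shuffleId -> stages that still need to read it
  private var pendingReaders: Map[Int, Set[Int]] = Map.empty

  def registerShuffle(shuffleId: Int, readerStages: Set[Int]): Unit =
    pendingReaders += shuffleId -> readerStages

  def onStageCompleted(stageId: Int): Unit = {
    // The completed stage no longer counts as a reader of any shuffle.
    pendingReaders = pendingReaders.map { case (shuffleId, readers) =>
      shuffleId -> (readers - stageId)
    }
    // Any shuffle with no remaining readers has been fully consumed:
    // delete its files now instead of waiting for the application to end.
    val consumed = pendingReaders.collect {
      case (shuffleId, readers) if readers.isEmpty => shuffleId
    }
    consumed.foreach(ess.removeShuffleFiles)
    pendingReaders --= consumed
  }
}
```

In the real system this bookkeeping would live in the Spark driver and the deletion requests would go to the ESS instances on each node; the sketch only captures the "delete as soon as consumed" behavior that frees nodes from storage and speeds up downscaling.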