Kubernetizing Big Data and ML Workloads at Uber
Talk Title | Kubernetizing Big Data and ML Workloads at Uber |
Speakers | Min Cai (Sr. Staff Engineer, Uber), Mayank Bansal (Staff Engineer, Uber) |
Conference | KubeCon + CloudNativeCon North America |
Location | San Diego, CA, USA |
Date | Nov 15-21, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Uber relies on Big Data and ML to make business-critical decisions such as pricing and trip ETA. Today, workloads such as Hive and Spark run on YARN. To save millions of dollars through more efficient use of cluster resources, Uber is planning to use Kubernetes to co-locate Big Data/ML and micro-service workloads.

Kubernetes is the de facto standard for running micro-services. However, in comparison to YARN, it still lacks many features such as hierarchical resource pools, elastic resource sharing, and gang scheduling. To bridge this gap, we have re-architected Peloton as a set of Kubernetes scheduler and controller plugins so that we can provide feature parity with YARN.

This talk will cover:
- Lessons from running large-scale Big Data/ML on Kubernetes with Peloton
- Colocation of mixed workloads
- Federation across zones
- Feature and API parity with YARN
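For readers unfamiliar with gang scheduling, one of the YARN features the abstract says Kubernetes lacks, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Peloton's or Kubernetes' actual implementation: the `Node` and `Gang` types, the `place_gang` function, and the CPU-only resource model are all hypothetical names chosen for the example. The point it shows is the all-or-nothing rule: either every task in a job is placed, or none is.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical cluster node with a single resource dimension."""
    name: str
    free_cpus: int


@dataclass
class Gang:
    """Hypothetical job whose tasks must all start together."""
    name: str
    task_cpus: List[int]  # CPU demand of each task in the gang


def place_gang(gang: Gang, nodes: List[Node]) -> Optional[Dict[int, str]]:
    """Return {task_index: node_name} if every task fits, else None.

    Nodes are mutated only when the whole gang can be placed, so a
    rejected gang never holds partial resources.
    """
    # Work on a tentative copy of the free capacity.
    free = {n.name: n.free_cpus for n in nodes}
    placement: Dict[int, str] = {}

    # Place the largest tasks first (a simple first-fit heuristic,
    # chosen for illustration only).
    order = sorted(range(len(gang.task_cpus)), key=lambda i: -gang.task_cpus[i])
    for idx in order:
        need = gang.task_cpus[idx]
        target = next((name for name, cpus in free.items() if cpus >= need), None)
        if target is None:
            return None  # one task does not fit: reject the whole gang
        free[target] -= need
        placement[idx] = target

    # Commit the tentative allocation now that every task has a node.
    for n in nodes:
        n.free_cpus = free[n.name]
    return placement


if __name__ == "__main__":
    cluster = [Node("a", 4), Node("b", 2)]
    print(place_gang(Gang("too-big", [3, 3]), cluster))  # rejected as a unit
    print(place_gang(Gang("fits", [2, 2, 2]), cluster))  # placed as a unit
```

A scheduler that places tasks one at a time could start the first 3-CPU task of `too-big` and then leave it waiting for a partner that never fits; the tentative-copy-then-commit structure above avoids that kind of partial allocation.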