November 25, 2019


Kubernetizing Big Data and ML Workloads at Uber



Talk Title: Kubernetizing Big Data and ML Workloads at Uber
Speakers: Min Cai (Sr. Staff Engineer, Uber), Mayank Bansal (Staff Engineer, Uber)
Conference: KubeCon + CloudNativeCon North America
Location: San Diego, CA, USA
Date: Nov 15-21, 2019
URL: Talk Page
Slides: Talk Slides

Uber relies on Big Data and ML to make business-critical decisions such as pricing and trip ETAs. Today, these workloads, including Hive and Spark jobs, run on YARN. To save millions of dollars through more efficient use of cluster resources, Uber plans to use Kubernetes to co-locate Big Data/ML and microservice workloads. Kubernetes is the de facto standard for running microservices. However, compared with YARN, it still lacks many features, such as hierarchical resource pools, elastic resource sharing, and gang scheduling. To bridge this gap, we have re-architected Peloton as a set of Kubernetes scheduler and controller plugins so that we can provide feature parity with YARN.

This talk will cover:

- Learnings from running large-scale Big Data/ML on Kubernetes with Peloton
- Colocation of mixed workloads
- Federation across zones
- Feature and API parity with YARN
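Gang scheduling and elastic resource sharing, two of the YARN features mentioned above, can be illustrated with a small model. The Python sketch below is a hypothetical illustration, not Peloton's implementation or API: the `Pool` and `Cluster` classes, the pool names, and the core counts are all assumptions. It admits a job's tasks all together or not at all, and it lets a pool grow past its reservation, up to a limit, by borrowing idle cluster capacity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical resource pool: a guaranteed reservation plus an elastic limit."""
    name: str
    reservation: int  # cores guaranteed to this pool (informational in this sketch)
    limit: int        # hard cap, including cores borrowed from idle capacity
    allocated: int = 0


class Cluster:
    """Hypothetical flat cluster model; not Peloton's actual data structures."""

    def __init__(self, capacity: int, pools: list[Pool]) -> None:
        self.capacity = capacity
        self.pools = {p.name: p for p in pools}

    def free(self) -> int:
        # Idle cores anywhere in the cluster, including other pools' unused reservations.
        return self.capacity - sum(p.allocated for p in self.pools.values())

    def gang_admit(self, pool_name: str, task_demands: list[int]) -> bool:
        """Place every task of the gang, or none of them (all-or-nothing)."""
        pool = self.pools[pool_name]
        total = sum(task_demands)
        # Elastic sharing: the pool may exceed its reservation, but never its limit.
        if pool.allocated + total > pool.limit:
            return False
        # Gang semantics: if the whole gang does not fit right now, place nothing.
        if total > self.free():
            return False
        pool.allocated += total
        return True


cluster = Cluster(100, [Pool("batch", reservation=40, limit=80),
                        Pool("ml", reservation=60, limit=100)])
print(cluster.gang_admit("batch", [10] * 6))  # True: batch borrows 20 idle cores
print(cluster.gang_admit("ml", [10] * 5))     # False: only 40 cores are free
print(cluster.gang_admit("ml", [10] * 4))     # True: the smaller gang fits
```

Reclaiming borrowed capacity (preemption) and nesting pools into a hierarchy are deliberately left out to keep the sketch short; a real scheduler would need both to honor every pool's reservation.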
