November 25, 2019


Kubernetizing Big Data and ML Workloads at Uber




Talk Title	Kubernetizing Big Data and ML Workloads at Uber
Speakers	Min Cai (Sr. Staff Engineer, Uber), Mayank Bansal (Staff Engineer, Uber)
Conference	KubeCon + CloudNativeCon North America
Location	San Diego, CA, USA
Date	Nov 15-21, 2019

Uber relies on Big Data and ML to make business-critical decisions such as pricing and trip ETA. Today, workloads such as Hive and Spark run on YARN. To save millions of dollars through more efficient use of cluster resources, Uber is planning to use Kubernetes to co-locate Big Data/ML and micro-service workloads.

Kubernetes is the de facto standard for running micro-services. However, in comparison to YARN, it still lacks many features, such as hierarchical resource pools, elastic resource sharing, and gang scheduling. To bridge this gap, we have re-architected Peloton as a set of Kubernetes scheduler and controller plugins so that we can provide feature parity with YARN.

This talk will cover:

- Learnings from running large-scale Big Data/ML on Kubernetes with Peloton
- Colocation of mixed workloads
- Federation across zones
- Feature and API parity with YARN
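Of the YARN features the abstract lists as missing, gang scheduling is perhaps the easiest to illustrate: a group of tasks (for example, the workers of one distributed training job) is placed all-or-nothing, so a job never holds some of its resources while waiting for the rest. The sketch below is only a toy model of that idea, not Peloton's or Kubernetes' implementation. The function name `gang_schedule`, the CPU-only resource model, the node names, and the first-fit heuristic are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def gang_schedule(
    task_cpus: List[int], free_cpus: Dict[str, int]
) -> Optional[Dict[int, str]]:
    """Place every task in the gang, or place none of them.

    Returns a mapping of task index -> node name, or None if the gang
    does not fit. `free_cpus` is only updated when the whole gang fits.
    (Hypothetical helper; CPU is the only resource modelled.)
    """
    remaining = dict(free_cpus)  # work on a copy so a failure changes nothing
    placement: Dict[int, str] = {}

    # Greedy first-fit, largest tasks first (a heuristic, not an optimal packer).
    for idx in sorted(range(len(task_cpus)), key=lambda i: -task_cpus[i]):
        node = next(
            (n for n in sorted(remaining) if remaining[n] >= task_cpus[idx]),
            None,
        )
        if node is None:
            return None  # one task cannot be placed, so reject the whole gang
        remaining[node] -= task_cpus[idx]
        placement[idx] = node

    free_cpus.update(remaining)  # commit only after every task has a node
    return placement


if __name__ == "__main__":
    nodes = {"node-a": 4, "node-b": 2}  # hypothetical free CPUs per node
    print(gang_schedule([2, 2, 2], nodes))  # the whole gang fits and is placed
    print(gang_schedule([1], nodes))        # no capacity left, so nothing is placed
```

The key design choice is that capacity is reserved on a scratch copy and committed only at the end, so a gang that does not fit leaves the cluster state untouched for other workloads.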


