December 9, 2019

Running Large-Scale Stateful Workloads On Kubernetes at Lyft


Talk Title	Running Large-Scale Stateful Workloads On Kubernetes at Lyft
Speakers	Surinder Singh (Software Engineer, Lyft), Anmol Khurana (Software Engineer, Lyft)
Conference	KubeCon + CloudNativeCon North America
Location	San Diego, CA, USA
Date	Nov 15-21, 2019
URL	Talk Page
Slides	Talk Slides

Along with core services, K8s at Lyft also forms the base for running a large variety of stateful data processing jobs, including Spark, Flink and other jobs, via various ML and data processing pipelines. At Lyft, K8s has become the driver for the majority of our data processing needs, running tens of thousands of concurrent jobs. Operating the platform at this scale presents a unique set of challenges, which get more complex with highly variable load patterns.

In this talk, the speakers will share their journey through some of these challenges and learnings:

- Potential pitfalls of running stateful jobs on K8s.
- Knobs/tweaks to optimize K8s for stateful jobs.
- Running K8s in a cloud environment.
- Building a fault-tolerant, self-healing system with multiple K8s clusters underneath.

The talk will also focus on optimizations done to support the widely used workloads at Lyft.

cluster flink spark ml large-scale k8s optimization cloud pipeline kubernetes

Flyte: Cloud Native Machine Learning & Data Processing Platform

November 29, 2019

Flyte is the backbone for large-scale Machine Learning and Data Processing (ETL) pipelines at Lyft. It is used across business critical applications ranging from ETA, Pricing, Mapping, Autonomous, etc …

Kubernetizing Big Data and ML Workloads at Uber

November 25, 2019

Uber relies on Big Data and ML to make business critical decisions such as pricing, trip ETA, etc. Today, those workloads such as Hive and Spark are running on YARN. To save millions of dollars by eff …

Tutorial: From Notebook to Kubeflow Pipelines: An End-to-End Data Science Workflow

November 22, 2019

Please bring your laptop fully charged, as we will have limited charging stations available in the room. This session targets data scientists and ML engineers who want to leverage Kubernetes to scale up …

HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster

October 4, 2019

Kubernetes has not only become predominant in the public cloud these days, but is also becoming a new trend in on-premises big data cluster environments, as an alternative to Hadoop YARN, a resource schedule …

A Toolkit for Simulating Kubernetes Scheduling at Scale

December 1, 2019

As Kubernetes becomes the de facto standard for container orchestration, new scheduling algorithms and systems are made for different scenarios and workloads. Unfortunately, it is very time and cost c …

Intro to Longhorn: Open Source Cloud-Native Distributed Block Storage Built On and For K8s

November 29, 2019

Longhorn is an Open Source Cloud-Native distributed block storage built on and for Kubernetes. It provides persistent storage support for any Kubernetes cluster with one-click installation. It also s …