Running Large-Scale Stateful Workloads On Kubernetes at Lyft
Talk Title | Running Large-Scale Stateful Workloads On Kubernetes at Lyft |
Speakers | Surinder Singh (Software Engineer, Lyft), Anmol Khurana (Software Engineer, Lyft) |
Conference | KubeCon + CloudNativeCon North America |
Conf Tag | |
Location | San Diego, CA, USA |
Date | Nov 15-21, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Along with core services, K8s at Lyft also forms the base for running a large variety of stateful data processing jobs, including Spark, Flink, and other jobs run via various ML and data processing pipelines. At Lyft, K8s has become the driver for the majority of our data processing needs, running tens of thousands of concurrent jobs. Operating the platform at this scale presents a unique set of challenges, which get more complex with highly variable load patterns.

In this talk, the speakers will share their journey through some of these challenges and learnings:

- Potential pitfalls of running stateful jobs on K8s.
- Knobs/tweaks to optimize K8s for stateful jobs.
- Running K8s in a cloud environment.
- Building a fault-tolerant, self-healing system with multiple K8s clusters underneath.

The talk will also focus on optimizations done to support the widely used workloads at Lyft.