October 6, 2019

Understanding Scalability and Performance in the Kubernetes Master


Talk Title	Understanding Scalability and Performance in the Kubernetes Master
Speakers	Fansong Zeng (Staff Engineer, Alibaba), Xingyu Chen (Software Engineer, Alibaba)
Conference	KubeCon + CloudNativeCon
Conf Tag
Location	Shanghai, China
Date	Jun 23-26, 2019
URL	Talk Page
Slides	Talk Slides
Video

Currently, the scale limit of Kubernetes is 5k nodes, so if you want to use it to manage a web-scale cluster of, say, 10k nodes, you probably can't. Have you wondered what the performance bottleneck is that keeps Kubernetes from managing more than 5k nodes? When you want to push its scalability to a new level, who's to "blame" first: etcd, the apiserver, or the scheduler? Answering these questions is the key to operating a large Kubernetes cluster. At Alibaba, we encountered many issues, such as pod creation becoming extremely slow as the cluster grew larger and larger. In this talk, we would like to share how we ran various benchmark tests and profiling, and how we tuned the master to achieve a more than 100x performance improvement. Currently, operating a 10k-node Kubernetes cluster is just as smooth as operating a 2k-node one.

api alibaba kubernetes performance cluster

Dynamic Pod Resource Boundary Adjustment in Web Scale Clusters

October 1, 2019

Have you ever been confused about how to set the perfect resource limit for a Pod? How do you balance resource efficiency with an application's SLO? In this talk, we will share practices and lessons learned from a …

Build Serverless with K8s, Kata Containers and Bare Metal Cloud in Alibaba

October 4, 2019

Serverless is hot! Everybody knows that. But not so many people know that in a Serverless platform, applications from different tenants have to be co-located on the same node, which is the key to why S …

HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster

October 4, 2019

Kubernetes has not only become predominant in the public cloud these days, but is also becoming a new trend in on-premises big data cluster environments, as an alternative to Hadoop YARN, a resource schedule …

Istio Performance and Best Practices in Large Scale Kubernetes Cluster

October 4, 2019

As many industry cloud solutions and frameworks have been adopting Istio since its GA in 2018, it is important to understand its performance in large-scale Kubernetes clusters (2000+ nodes). In this session, …

Intro + Deep Dive: Azure SIG

October 2, 2019

In the SIG Azure Intro and Deep Dive, we're going to tell you all about why SIG Azure exists and the team behind managing it. From there, we'll talk about what's happened over the last few releases, K …

Large Scale Distributed Deep Learning on Kubernetes Clusters

October 2, 2019

The focus of this talk is the deployment of large-scale distributed deep learning with Kubernetes. The use of operators to manage and automate training processes for machine learning is discussed. …