September 28, 2019


Managing Large-Scale Kubernetes Clusters Effectively and Reliably




Talk Title	Managing Large-Scale Kubernetes Clusters Effectively and Reliably
Speakers	Yong Zhang (Senior Software Engineer, Ant Financial), Zhixian Lin (Senior Software Engineer, Ant Financial)
Conference	KubeCon + CloudNativeCon
Location	Shanghai, China
Date	Jun 23-26, 2019

As the business grows, we need to deploy Kubernetes into several data centers around the world, with more than ten thousand Nodes in a single data center. The critical challenge we face is how to manage several large-scale Kubernetes clusters across data centers efficiently and reliably. In this talk, we will share our experience and practices in automating large-scale cluster management. First, we will introduce fully automated Node lifecycle management and how to automatically detect and recover from Node failures using NPD (Node Problem Detector), autoscalers, and a customized Operator. Then we will share our experience with, and solutions for, Kubernetes cluster deployment and upgrades. Finally, we will present a risk prevention and control system based on Prometheus and an Operator, which is the cornerstone of reliability and provides automatic fault detection and isolation.


