September 28, 2019

No More Chaos: Audit and Inspect Kubernetes at Scale

Talk Title No More Chaos: Audit and Inspect Kubernetes at Scale
Speakers 陈杰 (Technical Expert, Alibaba Cloud), 马金晶 (Senior Development Engineer, Ant Financial)
Conference KubeCon + CloudNativeCon
Location Shanghai, China
Date Jun 23-26, 2019
URL Talk Page
Slides Talk Slides

Accurate fault detection and efficient issue analysis are important for the availability and stability of Kubernetes clusters. However, Kubernetes exposes a huge number of resources, events, and metrics. In our cluster, we noticed that Kubernetes generates thousands of metrics per second, which makes it challenging to find the root cause in this ocean of data, not to mention analysis, data visualization, and alarms. In this talk, we will share our experience and practices of auditing and inspecting Kubernetes at web scale. We’ll first talk about how we design metrics to reflect the stability of Kubernetes, and how we consume these metrics and set up streaming alarms. We will use real cases to demo how we aggregate and analyze this metrics data. Finally, we will share Alibaba's practices for building an automatic system for real-time data inspection and analysis for Kubernetes.
