Anomaly Detection for Cloud Native Storage
Talk Title | Anomaly Detection for Cloud Native Storage |
Speakers | Seiya Takei (Storage Engineer, Yahoo Japan Corporation), Xing Yang (Tech Lead, VMware) |
Conference | KubeCon + CloudNativeCon |
Location | Shanghai, China |
Date | Jun 23-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Integrating with heterogeneous storage in the Cloud Native environment has always been a challenge. For mission-critical workloads, it is important to detect problems and fix them in a timely fashion. In this session, Takei-san and Xing will describe a common volume metrics model designed to retrieve data from heterogeneous storage in the Cloud Native environment. They will also illustrate an ML module that analyzes the data to detect anomalous behavior, and discuss how it helps Yahoo Japan identify problems early to keep the Cloud Native storage system healthy. Volume metrics such as IOPS, bandwidth, latency, and capacity are collected from the storage backends serving workloads running on Kubernetes and emitted to the Prometheus server. The ML module retrieves the data from Prometheus and applies algorithms to detect anomalies. The results are evaluated, and alerts are issued when needed.
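The abstract does not say which schema or algorithms the speakers use, so the following is only a minimal sketch of the pipeline's shape under stated assumptions. The `VolumeSample` record, the `volume` label name, and the rolling z-score rule are all hypothetical stand-ins chosen for illustration. The only part taken from a real interface is the JSON layout of a Prometheus `query_range` response (a `matrix` result with `[timestamp, "value"]` pairs).

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List


@dataclass(frozen=True)
class VolumeSample:
    """One observation of a volume metric (hypothetical model, not the talk's schema)."""
    volume: str       # volume identifier, e.g. a PersistentVolume name
    metric: str       # "iops", "bandwidth", "latency", or "capacity"
    timestamp: float  # Unix seconds
    value: float


def parse_range_response(payload: Dict, metric: str) -> List[VolumeSample]:
    """Flatten a Prometheus query_range JSON body (resultType "matrix") into samples."""
    samples = []
    for series in payload["data"]["result"]:
        # The "volume" label name is an assumption; real exporters may use another label.
        volume = series["metric"].get("volume", "unknown")
        for ts, raw in series["values"]:
            # Prometheus encodes sample values as strings, so convert them to floats.
            samples.append(VolumeSample(volume, metric, float(ts), float(raw)))
    return samples


def detect_anomalies(samples: List[VolumeSample], window: int = 5,
                     threshold: float = 3.0) -> List[VolumeSample]:
    """Flag samples that lie more than `threshold` standard deviations from the
    mean of the preceding `window` samples (a rolling z-score, assumed here).

    Expects the time-ordered samples of a single volume and a single metric.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = [s.value for s in samples[i - window:i]]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history (sigma == 0) is skipped in this simple sketch.
        if sigma > 0 and abs(samples[i].value - mu) > threshold * sigma:
            anomalies.append(samples[i])
    return anomalies
```

In use, the body returned by the Prometheus HTTP API for a range query would be passed to `parse_range_response`, and each volume's series would then go through `detect_anomalies`. Any flagged samples could feed whatever evaluation and alerting step follows. A production detector would likely need per-metric tuning and a more robust model than a fixed z-score threshold.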