Building Telemetry and Anomaly Detection Models for Cloud Native Storage
Talk Title | Building Telemetry and Anomaly Detection Models for Cloud Native Storage |
Speakers | Seiya Takei (Storage Engineer, Yahoo Japan Corporation), Xing Yang (Tech Lead, VMware) |
Conference | Open Source Summit + Automotive Linux Summit Japan |
Location | Tokyo, Japan |
Date | Jul 17-19, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Integrating with heterogeneous storage in the Cloud Native environment has always been a challenge, and detecting and fixing problems promptly is critical for mission-critical workloads. In this session, Takei-san and Xing will describe a common volume metrics model designed to retrieve data from heterogeneous storage in the Cloud Native environment. They will also illustrate an ML module that analyzes the data to detect anomalies, and discuss how it helps Yahoo Japan identify problems early and keep its storage systems healthy. Volume metrics such as IOPS, bandwidth, latency, and capacity are generated by the storage backends serving workloads running on Kubernetes and are collected by the Prometheus server. The data is also piped through Kafka, parsed, and saved in MongoDB. The ML module retrieves the data to train the models, then chooses the best model to detect anomalous data points.
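The abstract does not say which record layout or which models the speakers use, so the following is only a minimal sketch of the "train several detectors, keep the best one" idea. Everything in it is an assumption made for illustration: the `VolumeSample` field names, the two stand-in detectors (a plain z-score and a median/MAD score), and the choice of F1 on a small labeled sample as the selection criterion.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class VolumeSample:
    """One observation for one volume (hypothetical field names)."""
    volume_id: str
    backend: str            # storage backend that reported the sample
    timestamp: float        # seconds since the epoch
    iops: float
    bandwidth_mbps: float
    latency_ms: float
    capacity_used_gib: float


def zscore_flags(values: Sequence[float], threshold: float = 3.0) -> List[bool]:
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def mad_flags(values: Sequence[float], threshold: float = 3.5) -> List[bool]:
    """Flag points using the modified z-score based on the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


Detector = Callable[[Sequence[float]], List[bool]]
DETECTORS: Dict[str, Detector] = {"zscore": zscore_flags, "mad": mad_flags}


def select_best(
    detectors: Dict[str, Detector],
    values: Sequence[float],
    labels: Sequence[bool],
) -> Tuple[str, Dict[str, float]]:
    """Score every detector on labeled data and return the best name plus all scores."""
    scores = {name: f1_score(fn(values), labels) for name, fn in detectors.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up latency series with two spikes, wrapped in the hypothetical record type.
    latencies = [2.0, 2.1, 1.9, 2.2, 2.0, 40.0, 2.1, 2.0, 1.8, 2.1, 38.0, 2.0]
    samples = [
        VolumeSample("vol-1", "backend-a", float(i), 500.0, 20.0, lat, 10.0)
        for i, lat in enumerate(latencies)
    ]
    series = [s.latency_ms for s in samples]
    labels = [lat > 10.0 for lat in series]  # hand-labeled anomalies for the demo
    best, scores = select_best(DETECTORS, series, labels)
    print(best, scores)
```

On this made-up series the two spikes inflate the mean and standard deviation enough that the plain z-score flags nothing, while the median-based score catches both, so the selection step picks `mad`. A production pipeline would read samples from the stored metrics rather than from a literal list, and would likely evaluate on held-out data.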