March 4, 2020

Building Telemetry and Anomaly Detection Models for Cloud Native Storage

Talk Title: Building Telemetry and Anomaly Detection Models for Cloud Native Storage
Speakers: Seiya Takei (Storage Engineer, Yahoo Japan Corporation), Xing Yang (Tech Lead, VMware)
Conference: Open Source Summit + Automotive Linux Summit Japan
Location: Tokyo, Japan
Date: Jul 17-19, 2019
URL: Talk Page
Slides: Talk Slides

Integrating with heterogeneous storage in the Cloud Native environment has always been a challenge. Detecting problems and fixing them in a timely fashion is important for mission-critical workloads. In this session, Takei-san and Xing will describe a common volume metrics model designed to retrieve data from heterogeneous storage in the Cloud Native environment. They will also illustrate an ML module that analyzes the data to detect anomalies, and discuss how it helps Yahoo Japan identify problems early to keep the storage systems healthy. Volume metrics such as IOPS, bandwidth, latency, and capacity are generated from storage backends serving workloads running on Kubernetes and collected by the Prometheus server. The data is also piped through Kafka, parsed, and saved in MongoDB. The ML module retrieves this data to train the models, then chooses the best model to detect anomalous data points.
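
The abstract does not say which models the ML module uses or how it picks the best one, so the following is only a minimal sketch of the idea under stated assumptions. The `VolumeMetrics` field names, the JSON message layout, the two candidate detectors (z-score and median absolute deviation), and the F1-based selection are all hypothetical stand-ins, not the speakers' implementation. It uses only the Python standard library.

```python
import json
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import Callable, Dict, List, Tuple

Detector = Callable[[float], bool]


@dataclass(frozen=True)
class VolumeMetrics:
    """One sample in a hypothetical common volume metrics model."""
    volume_id: str
    timestamp: float          # Unix seconds
    iops: float
    bandwidth_mbps: float
    latency_ms: float
    capacity_used_gb: float


def parse_sample(raw: str) -> VolumeMetrics:
    """Parse one JSON message (assumed layout) into a VolumeMetrics record."""
    doc = json.loads(raw)
    return VolumeMetrics(
        volume_id=str(doc["volume_id"]),
        timestamp=float(doc["timestamp"]),
        iops=float(doc["iops"]),
        bandwidth_mbps=float(doc["bandwidth_mbps"]),
        latency_ms=float(doc["latency_ms"]),
        capacity_used_gb=float(doc["capacity_used_gb"]),
    )


def zscore_detector(train: List[float], threshold: float = 3.0) -> Detector:
    """Flag values more than `threshold` standard deviations from the mean."""
    mu = mean(train)
    sigma = pstdev(train) or 1e-9  # avoid division by zero on constant data
    return lambda x: abs(x - mu) / sigma > threshold


def mad_detector(train: List[float], threshold: float = 3.5) -> Detector:
    """Flag values by modified z-score based on the median absolute deviation."""
    med = median(train)
    mad = median([abs(x - med) for x in train]) or 1e-9
    return lambda x: 0.6745 * abs(x - med) / mad > threshold


def f1_score(detector: Detector, values: List[float], labels: List[bool]) -> float:
    """F1 of the detector's flags against known anomaly labels."""
    preds = [detector(v) for v in values]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(l and not p for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_best(train: List[float], val_values: List[float],
                val_labels: List[bool]) -> Tuple[str, Detector]:
    """Fit every candidate on `train` and keep the one with the highest F1."""
    candidates: Dict[str, Detector] = {
        "zscore": zscore_detector(train),
        "mad": mad_detector(train),
    }
    best = max(candidates, key=lambda n: f1_score(candidates[n], val_values, val_labels))
    return best, candidates[best]


# Illustrative data only: latency samples (ms) with one spike in the training set.
raw = ('{"volume_id": "vol-1", "timestamp": 1563321600, "iops": 1200, '
       '"bandwidth_mbps": 85.5, "latency_ms": 2.1, "capacity_used_gb": 40}')
sample = parse_sample(raw)

train_latency = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8, 2.1, 2.0, 1.9, 50.0]
val_latency = [2.0, 2.1, 10.0, 1.9, 12.0, 2.2]
val_labels = [False, False, True, False, True, False]

best_name, best_detector = choose_best(train_latency, val_latency, val_labels)
```

On this made-up data the spike in the training set inflates the standard deviation, so the z-score detector misses the moderate anomalies while the median-based one catches them. That is one reason a pipeline like this might compare several models rather than commit to one.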
