November 27, 2019

Pod Anomaly Detection and Eviction using Prometheus Metrics

Talk Title Pod Anomaly Detection and Eviction using Prometheus Metrics
Speakers David Benque (Senior Software Engineer, Amadeus), Cedric Lamoriniere (Software Engineer, Amadeus)
Conference KubeCon + CloudNativeCon Europe
Location Copenhagen, Denmark
Date Apr 30-May 4, 2018
URL Talk Page
Slides Talk Slides

Dealing with system stability in a distributed and changing environment is a challenge: a single failing pod can affect the majority of your system's responses. From Kubernetes probes to Istio circuit breakers, the CNCF projects provide us with multiple means of containing this kind of problem.

After a quick review of these mechanisms, the cases in which they can be used, and their limitations, we will see how to react to problems that can only be revealed by internal application KPIs.

Maybe you would have liked to use a service mesh circuit breaker feature, but your traffic is not HTTP-based; or one pod continues to reply with HTTP code 200 alongside incorrect functional content. And yet, you have functional indicators that could help trigger an immediate and orchestrated operational response.

We will see how to cover such cases thanks to dedicated controllers and Prometheus.
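To make the idea of reacting to an application KPI concrete, here is a minimal sketch of the detection step only. It is not the speakers' implementation: it assumes a per-pod KPI (for example a functional error ratio) has already been read from Prometheus into a plain dictionary, and it uses a median-absolute-deviation outlier test chosen purely for illustration. The function name `find_anomalous_pods` and the parameters `threshold`, `min_spread`, and `min_pods` are hypothetical.

```python
from statistics import median


def find_anomalous_pods(kpi_by_pod, threshold=3.5, min_spread=0.01, min_pods=3):
    """Return the pods whose KPI is abnormally high compared to their peers.

    kpi_by_pod: dict mapping pod name -> KPI value (e.g. functional error
    ratio), assumed to have been fetched from Prometheus beforehand.
    All parameter names and defaults are illustrative assumptions.
    """
    if len(kpi_by_pod) < min_pods:
        # Too few peers to decide which pod is the odd one out.
        return []

    values = list(kpi_by_pod.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that a single bad pod
    # cannot inflate the way it would inflate a standard deviation.
    mad = median([abs(v - med) for v in values])
    # Floor the spread so that identical healthy pods (MAD == 0) do not
    # cause a division by zero or flag negligible differences.
    spread = max(mad, min_spread)

    return sorted(
        pod
        for pod, value in kpi_by_pod.items()
        # One-sided modified z-score: only unusually high values are flagged.
        if 0.6745 * (value - med) / spread > threshold
    )


if __name__ == "__main__":
    sample = {"pod-a": 0.01, "pod-b": 0.02, "pod-c": 0.01, "pod-d": 0.45}
    print(find_anomalous_pods(sample))
```

Comparing each pod against its peers rather than against a fixed limit is one way to catch a pod that still answers HTTP 200 but returns wrong functional content. A dedicated controller could then take the operational response on the flagged pods, such as removing them from load balancing or evicting them; that part is left out of this sketch.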
