Pod Anomaly Detection and Eviction using Prometheus Metrics
Talk Title | Pod Anomaly Detection and Eviction using Prometheus Metrics |
Speakers | David Benque (Senior Software Engineer, Amadeus), Cedric Lamoriniere (Software Engineer, Amadeus) |
Conference | KubeCon + CloudNativeCon Europe |
Location | Copenhagen, Denmark |
Date | Apr 30-May 4, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Dealing with system stability in a distributed and changing environment is a challenge: a single failing pod can affect the majority of your system's responses. From Kubernetes probes to Istio circuit breakers, the CNCF projects give us multiple means of containing this kind of problem.

After a quick review of these means, covering the cases in which each can be used and their limitations, we will see how to react to problems that can only be revealed by internal application KPIs.

Maybe you would have liked to use a service mesh circuit breaker, but your traffic is not HTTP based; or one pod continues to reply with HTTP code 200 alongside functionally incorrect content. And yet you have functional indicators that could help trigger an immediate and orchestrated operational response.

We will see how to cover such cases with dedicated controllers and Prometheus.
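
The abstract does not show the speakers' controller, so the following is only a minimal sketch of the general idea, not their implementation. It assumes a per-pod application KPI (here a hypothetical functional error ratio) has already been fetched from the Prometheus HTTP API as an instant vector, and it flags pods whose value sits well above the fleet median. The `pod` label, the `tolerance` and `min_pods` parameters, and the function names are all assumptions made for illustration.

```python
import math
import statistics


def parse_instant_vector(response, label="pod"):
    """Map pod name -> KPI value from a Prometheus /api/v1/query JSON response.

    The "pod" label name is an assumption; it depends on how the
    application metrics are labelled.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    values = {}
    for sample in data["result"]:
        pod = sample["metric"].get(label)
        if pod is None:
            continue
        _timestamp, raw = sample["value"]  # sample values arrive as strings
        value = float(raw)
        if not math.isnan(value):
            values[pod] = value
    return values


def find_anomalous_pods(kpi_by_pod, tolerance=0.10, min_pods=3):
    """Return pods whose KPI exceeds the fleet median by more than `tolerance`.

    With fewer than `min_pods` samples there is no meaningful baseline,
    so nothing is flagged (hypothetical safeguard).
    """
    if len(kpi_by_pod) < min_pods:
        return []
    median = statistics.median(kpi_by_pod.values())
    return sorted(
        pod for pod, value in kpi_by_pod.items() if value - median > tolerance
    )


# Hand-written sample in the shape the Prometheus HTTP API returns;
# the pod names and values are made up for the example.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "booking-a"}, "value": [1525132800, "0.01"]},
            {"metric": {"pod": "booking-b"}, "value": [1525132800, "0.02"]},
            {"metric": {"pod": "booking-c"}, "value": [1525132800, "0.45"]},
        ],
    },
}

kpis = parse_instant_vector(sample_response)
suspects = find_anomalous_pods(kpis)
```

A controller built on this idea would then act on `suspects`, for example by relabelling the pods out of the Service selector or by requesting their eviction through the Kubernetes API. Comparing against the median rather than a fixed threshold keeps one bad pod from shifting the baseline, which matches the "single failing pod" scenario described above.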