November 27, 2019


Pod Anomaly Detection and Eviction using Prometheus Metrics




Talk Title	Pod Anomaly Detection and Eviction using Prometheus Metrics
Speakers	David Benque (Senior Software Engineer, Amadeus), Cedric Lamoriniere (Software Engineer, Amadeus)
Conference	KubeCon + CloudNativeCon Europe
Location	Copenhagen, Denmark
Date	Apr 30-May 4, 2018

Dealing with system stability in a distributed and changing environment is a challenge: a single failing pod can affect the majority of your system's responses. From Kubernetes probes to Istio circuit breakers, CNCF projects provide multiple means of containing this kind of problem. After a quick review of these means, covering the cases in which they can be used and their limitations, we will see how to react to problems that can only be revealed by internal application KPIs. Maybe you would have liked to use a service mesh circuit breaker, but your traffic is not HTTP-based; or one pod continues to reply with HTTP code 200 alongside incorrect functional content. And yet you have functional indicators that could help trigger an immediate and orchestrated operational response. We will see how to cover such cases with dedicated controllers and Prometheus.
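
As a rough illustration of the idea (not the speakers' implementation), the detection step can be sketched as a comparison of each pod's functional KPI against the rest of the fleet. The sketch below assumes the per-pod error ratios have already been fetched, for example from a Prometheus query, and it only returns the names of suspect pods; the actual eviction through the Kubernetes API is left out. The function name, the pod names, and the `min_ratio` and `factor` thresholds are all hypothetical.

```python
from statistics import median


def find_anomalous_pods(error_ratio_by_pod, min_ratio=0.05, factor=3.0):
    """Return the sorted names of pods whose error ratio is far above the fleet median.

    error_ratio_by_pod maps a pod name to the share of functionally incorrect
    responses it produced over some window (assumed to come from Prometheus).
    min_ratio and factor are illustrative thresholds, not recommended values.
    """
    # With fewer than three pods the median is not a meaningful baseline.
    if len(error_ratio_by_pod) < 3:
        return []
    fleet_median = median(error_ratio_by_pod.values())
    # A pod is suspect only if it is both above an absolute floor and
    # well above what its peers are doing.
    threshold = max(min_ratio, factor * fleet_median)
    return sorted(
        pod for pod, ratio in error_ratio_by_pod.items() if ratio > threshold
    )


# Hypothetical sample: one pod answers HTTP 200 but with bad content 40% of the time.
samples = {
    "booking-0": 0.01,
    "booking-1": 0.02,
    "booking-2": 0.40,
    "booking-3": 0.01,
}
print(find_anomalous_pods(samples))
```

Comparing against the fleet median rather than a fixed limit means a problem shared by every pod (a failing backend, say) does not cause all pods to be flagged at once, which matters if the next step is eviction.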

metrics code prometheus anomaly detection istio kubernetes service mesh


Building a Kubernetes Scheduler using Custom Metrics


November 24, 2019

The default Kubernetes scheduler does a fantastic job for typical workloads, but when you have specific requirements (like higher level application metrics) you might need other scheduling methods. …

How We Used Jaeger and Prometheus to Deliver Lightning-Fast User Queries


November 26, 2019

This talk comes from practical experience of running a cloud-based SaaS under Kubernetes for the last two years. Prometheus is good for the big-picture view of how things are running, while Jaeger act …

Observability and the Depths of Debugging Cloud-Native Applications using Linkerd and Conduit


November 23, 2019

Observability and monitoring are different, but complementary, needs for production applications. While monitoring focuses on measuring the overall health of your systems, observability aims to provid …

Why Running kubelet on Your Vacuum Robot Is (Not) a Good Idea


November 20, 2019

The Xiaomi Mi Vacuum Robot is an affordable bit of kit, yet comes with a powerful SoC and utilises a wide range of sensors. A talk at 34C3 last year showed how to gain root access to the underlying Ub …

Challenges to Writing Cloud Native Applications


November 27, 2019

Cloud native means designing software explicitly for the cloud, not trying to deploy to the cloud in retrospect: shoving a single replica of a monolith into Kubernetes won't cut it. Developing for …

Good Enough for the Finance Industry: Achieving High Security at Scale with Microservices in Kubernetes


November 27, 2019

Security is a challenge for most companies, especially those in periods of rapid growth. It is often taken for granted as we trust the frameworks we use to implement the necessary security protocols f …