November 18, 2019


Federated Prometheus Monitoring at Scale

Talk Title: Federated Prometheus Monitoring at Scale
Speakers: LungChih Tung (Software Developer II, Oath Inc), Nandhakumar Venkatachalam (Princi Production Engineer, Oath Inc)
Conference: KubeCon + CloudNativeCon Europe
Location: Copenhagen, Denmark
Date: Apr 30-May 4, 2018
URL: Talk Page
Slides: Talk Slides

In Media Build and Products under Oath, we run 12 production Kubernetes clusters across our data centers, totaling ~1200 machines with multi-tenant deployments. We monitor these clusters with Prometheus: each cluster runs its own Prometheus instance, and a single federated instance with persistent storage sits above them. In total we hold ~17 million time series (max 5 million per instance) at a sample ingestion rate of 300K (max 80K per instance). On the federated instance we have built dashboards for the Controller, Scheduler, API server, DNS, Kubelet, and Etcd, along with utilization overall and per tenant namespace, deployment, and container, which gives us high visibility. We use Alertmanager for alerting, notifying on-call about cluster status, node availability, scrape status, file descriptor (fd) usage, and so on. We would like to share our experience of monitoring multiple Kubernetes clusters in a multi-tenant environment.
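
As a rough illustration of the per-cluster-plus-federated layout described above, the sketch below builds the URLs a federating Prometheus would scrape from each cluster-level instance. Prometheus exposes a `/federate` endpoint that takes one `match[]` parameter per series selector. The cluster names, hostnames, and selectors here are hypothetical placeholders, not the setup presented in the talk.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL that requests the series matched by each selector."""
    # Prometheus expects one match[] query parameter per selector.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


def federation_targets(clusters, selectors):
    """Map each cluster name to the URL the federated instance would scrape."""
    return {name: federate_url(url, selectors) for name, url in clusters.items()}


# Hypothetical per-cluster Prometheus endpoints (illustrative names only).
CLUSTERS = {
    "cluster-a": "http://prometheus.cluster-a.example:9090",
    "cluster-b": "http://prometheus.cluster-b.example:9090",
}

# Hypothetical selectors; a real setup would choose which series to pull upward.
SELECTORS = ['{job="kubelet"}', '{job="apiserver"}']

if __name__ == "__main__":
    for name, url in federation_targets(CLUSTERS, SELECTORS).items():
        print(name, url)
```

Restricting the selectors to the series the shared dashboards and alerts actually need is one way to keep the federated instance's series count well below the sum of the per-cluster instances.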
