Federated Prometheus Monitoring at Scale
Talk Title | Federated Prometheus Monitoring at Scale |
Speakers | LungChih Tung (Software Developer II, Oath Inc), Nandhakumar Venkatachalam (Principal Production Engineer, Oath Inc) |
Conference | KubeCon + CloudNativeCon Europe |
Location | Copenhagen, Denmark |
Date | Apr 30-May 4, 2018 |
URL | Talk Page |
Slides | Talk Slides |
In Media Build and Products under Oath, we run 12 production Kubernetes clusters across our data centers, with ~1200 machines and multi-tenant deployments. We monitor our clusters with Prometheus: each cluster runs its own Prometheus instance, and a single federated instance with persistent storage sits above them. In total we hold ~17 million time series (max 5 million per instance), with a sample ingestion rate of 300K (max 80K per instance). On the federated instance we have built dashboards for the Controller, Scheduler, API server, DNS, Kubelet, and Etcd, plus utilization both overall and per tenant namespace/deployment/container, which gives us high visibility. We leverage Alertmanager, which provides powerful alerting capabilities, to alert the on-call on cluster status, node availability, scrape status, file descriptor (fd) usage, etc. We would like to share our experience of monitoring multiple Kubernetes clusters in a multi-tenant environment.
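As a rough illustration of the layout described above, a federated Prometheus instance typically pulls selected series from each per-cluster instance through that instance's `/federate` HTTP endpoint, passing one `match[]` parameter per series selector. The sketch below only builds those URLs. The host names, the port, and the selectors are assumptions made for illustration; the abstract does not say which endpoints or series the speakers actually federate.

```python
from urllib.parse import urlencode

# Hypothetical per-cluster Prometheus endpoints (assumed names, not from the talk).
# The abstract mentions 12 production clusters, so 12 placeholders are generated.
CLUSTER_ENDPOINTS = [
    f"http://prometheus.cluster-{i:02d}.example.com:9090" for i in range(1, 13)
]

# Assumed series selectors; the real selection used by the speakers is not stated.
SELECTORS = ['{job="kubelet"}', "up"]


def federate_url(base_url, selectors):
    """Build a /federate URL that requests the series matching the selectors."""
    # Prometheus expects one match[] query parameter per selector,
    # so the selectors are encoded as repeated key/value pairs.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url}/federate?{query}"


if __name__ == "__main__":
    # Print the URL the federated instance would scrape for each cluster.
    for endpoint in CLUSTER_ENDPOINTS:
        print(federate_url(endpoint, SELECTORS))
```

In a real deployment this selection would normally live in the federated instance's scrape configuration rather than in a script. Keeping the selectors narrow is one common way to bound the number of series the federated instance has to ingest.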