Sharded and Federated Prometheus Servers to Monitor Distributed Databases
Talk Title | Sharded and Federated Prometheus Servers to Monitor Distributed Databases |
Speakers | Jun Li (Principal Architect, eBay), Viswa Vutharkar (Sr. MTS, eBay) |
Conference | KubeCon + CloudNativeCon North America |
Conf Tag | |
Location | Seattle, WA, USA |
Date | Dec 9-14, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
At eBay we have developed a geo-distributed transactional document store called NuData, which is deployed on Kubernetes. The current deployment has thousands of pods across three datacenters and is monitored by Prometheus. For scalability, our Prometheus cluster has sharded servers that monitor individual infrastructure components and federation servers that retrieve aggregated metrics from the sharded servers. For high availability, each sharded or federated server is configured as an active/standby pair behind its load balancer. A routing map (a time series) is automatically constructed by each shard server and assembled by the federation server to direct Prometheus queries to the right servers. Today we have over 90 Prometheus servers in two datacenters, collecting over 11 M metrics every 60 seconds on 400 metrics (and 900 rules), to support health monitoring and performance debugging of NuData.
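The routing-map idea can be illustrated with a minimal Python sketch. This is not the implementation presented in the talk: the entry format, the function names (`build_shard_entries`, `assemble_routing_map`, `route_query`), the shard names, and the metric names are all hypothetical, and the fallback to a federation server for unknown metrics is an assumption. The sketch only shows the general shape: each shard publishes which metrics it holds, a federation layer merges those entries, and a query is directed to the shard(s) that own the metric.

```python
from typing import Dict, Iterable, List


def build_shard_entries(shard: str, metric_names: Iterable[str]) -> List[dict]:
    """Emit one routing entry per distinct metric name held by a shard.

    Hypothetical stand-in for the routing time series each shard constructs.
    """
    return [{"metric": name, "shard": shard} for name in sorted(set(metric_names))]


def assemble_routing_map(entries: Iterable[dict]) -> Dict[str, List[str]]:
    """Merge entries from all shards into a metric -> shards map.

    Hypothetical stand-in for the assembly step on the federation server.
    """
    routing: Dict[str, List[str]] = {}
    for entry in entries:
        shards = routing.setdefault(entry["metric"], [])
        if entry["shard"] not in shards:
            shards.append(entry["shard"])
    return routing


def route_query(routing: Dict[str, List[str]], metric: str,
                fallback: str = "federation") -> List[str]:
    """Return the servers to query for a metric.

    Assumption: metrics missing from the map go to the federation server.
    """
    return routing.get(metric, [fallback])


if __name__ == "__main__":
    # Hypothetical shard and metric names, for illustration only.
    entries = (
        build_shard_entries("shard-storage", ["disk_io_seconds", "up"])
        + build_shard_entries("shard-query", ["query_latency_seconds", "up"])
    )
    routing_map = assemble_routing_map(entries)
    print(route_query(routing_map, "query_latency_seconds"))
    print(route_query(routing_map, "up"))
    print(route_query(routing_map, "not_a_known_metric"))
```

In a real deployment the map would itself be stored as a Prometheus time series, as the abstract states, rather than as an in-memory dictionary; the dictionary is used here only to keep the sketch self-contained.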