Observability is the Key Tenet of Running a Multi-Tenant K8s Environment
Talk Title | Observability is the Key Tenet of Running a Multi-Tenant K8s Environment |
Speakers | Amreth Chandrasehar (Principal Architect, Cloud, T-Mobile), Thom McCann (Sr. Manager, Sr Software Engineer, T-Mobile) |
Conference | KubeCon + CloudNativeCon North America |
Conf Tag | |
Location | Seattle, WA, USA |
Date | Dec 9-14, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
How do you achieve 200 million requests per day and no downtime for 2 years? T-Mobile has been driving containerized workloads for many internal application teams. Running an internal multi-tenant environment can be challenging, but it has significant benefits. In this session we’ll dive deep into observability components such as a large-scale telemetry system built on Prometheus that serves 4000+ requests per second and millions of metrics across 6 clusters. This open source system is built across 3 AZs in each region (US West and US East) and is federated across multiple Prometheus clusters, enabling distributed queries and limitless scale. We’ll also dive deep into how our operational teams view integrated monitoring for infrastructure, hosts, VMs, containers, and applications, and how they integrate alerts with Slack, PagerDuty, and other real-time systems.