How LinkedIn determines the capacity limits of its services using live traffic
Susie Xia and Anant Rao explain how LinkedIn uses live production traffic to find service and resource bottlenecks at scale with a tool called Redliner, and how you can use your current architecture to do the same.
Talk Title | How LinkedIn determines the capacity limits of its services using live traffic |
Speakers | Susie Xia (LinkedIn), Anant Rao (LinkedIn) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Build resilient systems at scale |
Location | New York, New York |
Date | October 2-4, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
Modern web services like LinkedIn are made up of hundreds of microservices running in geographically distributed data centers. Each microservice must be allocated capacity carefully so that data center resources are used efficiently. However, accurately determining service capacity limits and providing resource allocation guidance is challenging for rapidly growing web services like LinkedIn, because traffic shape changes constantly, infrastructure characteristics are heterogeneous, and bottlenecks evolve. Susie Xia and Anant Rao explain how LinkedIn achieves automated capacity measurement and headroom analysis at scale via a system called Redliner, which runs load tests by shifting live user traffic to target service instances in real production environments. This helps reduce data center costs, supports proactive capacity planning, and detects performance regressions during development cycles. Susie and Anant also share lessons learned in building and maintaining Redliner and tips on how you can use your current service-oriented architecture to do the same. Topics include: