From dandelion to tree: Scaling Slack
In 2016, Slack faced a problem: the load on its backend servers had increased by 1,000×. Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service overcame the problem with zero downtime, improved latency, and led to gains in reliability and availability.
Talk Title | From dandelion to tree: Scaling Slack |
Speakers | Bing Wei (Slack) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Building and maintaining complex distributed systems |
Location | San Jose, California |
Date | June 12-14, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Communication and collaboration platform Slack has been fortunate to experience exponential user growth since its launch in 2014. Slack was originally designed for small teams, and some of its powerful initial design decisions became liabilities once the company had to support hundreds of thousands of users communicating at once. By 2016, the load on Slack’s backend servers had increased by 1,000×. In one incident, a whole team was knocked offline and couldn’t reconnect after uploading thousands of emojis, a use case that wasn’t expected: the spike of events caused a wave of client reconnections that cascaded into database failures.
Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service overcame the problem with zero downtime, improved latency, and led to gains in reliability and availability. Bing also discusses Slack’s ongoing effort to build a generalized publish/subscribe framework and how the company handles data synchronization between clients and backend servers, a solution that should further improve latency and reduce backend cost.
She also compares her time at Slack with her experience on the Twitter infrastructure team, detailing how the companies’ approaches differ and what Slack could learn from other web-scale companies.