December 22, 2019

From dandelion to tree: Scaling Slack

In 2016, Slack faced a problem: the load on its backend servers had increased by 1,000×. Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service solved the problem with zero downtime while improving latency, reliability, and availability.

Talk Title: From dandelion to tree: Scaling Slack
Speakers: Bing Wei (Slack)
Conference: O’Reilly Velocity Conference
Conf Tag: Building and maintaining complex distributed systems
Location: San Jose, California
Date: June 12-14, 2018

The communication and collaboration platform Slack has been fortunate to experience exponential user growth since its launch in 2014. Slack was originally designed for small teams, and as the user base grew, some of its powerful initial design decisions became liabilities: the company had to support hundreds of thousands of users communicating at once. By 2016, the load on its backend servers had increased by 1,000×. In one incident, a whole team was knocked offline and couldn’t reconnect after uploading thousands of emojis, a use case nobody had expected. The spike of events caused a wave of client reconnections that cascaded into database failures.

Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service solved the problem with zero downtime while improving latency, reliability, and availability. Bing also discusses Slack’s ongoing effort to build a generalized publish/subscribe framework and how the company handles data synchronization between clients and backend servers, a solution that should further improve latency and reduce backend cost. She also compares her time at Slack with her experience on the Twitter infrastructure team, detailing how the companies’ approaches differ and what Slack could learn from other web-scale companies.
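
To make the combination of lazy loading and publish/subscribe concrete, here is a minimal, purely illustrative Python sketch. It is not Slack's implementation: the names `PubSubBroker` and `LazyClient`, the in-memory broker, and the channel and event strings are all assumptions made for this example. The idea it shows is that a client subscribes to a channel only when it opens that channel, so an event is delivered only to clients that asked for that topic rather than to every connected client.

```python
from collections import defaultdict


class PubSubBroker:
    """Toy in-memory broker (hypothetical): routes each event only to
    the clients currently subscribed to that event's topic."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> set of client ids
        self.inboxes = defaultdict(list)      # client id -> delivered events

    def subscribe(self, client_id, topic):
        self._subscribers[topic].add(client_id)

    def publish(self, topic, event):
        """Deliver the event to current subscribers; return the recipient count."""
        recipients = self._subscribers.get(topic, ())
        for client_id in recipients:
            self.inboxes[client_id].append((topic, event))
        return len(recipients)


class LazyClient:
    """Hypothetical client that subscribes to a channel only when it is opened,
    instead of subscribing to everything at connect time."""

    def __init__(self, client_id, broker):
        self.client_id = client_id
        self.broker = broker
        self.open_channels = set()

    def open_channel(self, channel):
        # Subscribe on first open only; repeated opens are no-ops.
        if channel not in self.open_channels:
            self.open_channels.add(channel)
            self.broker.subscribe(self.client_id, channel)


broker = PubSubBroker()
alice = LazyClient("alice", broker)
bob = LazyClient("bob", broker)

alice.open_channel("#general")
bob.open_channel("#random")

delivered = broker.publish("#general", "emoji_added")
print(delivered)               # 1: only alice is subscribed to #general
print(broker.inboxes["bob"])   # []: bob never opened #general
```

In this sketch, the cost of an event scales with the number of interested clients rather than the total number of connected clients, which is the general motivation for topic-based fan-out. A production system would also need persistence, reconnection handling, and caching, none of which is modeled here.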
