From dandelion to tree: Scaling Slack

In 2016, Slack faced a problem: the load on its backend servers had increased by 1,000×. Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service overcame the problem with zero downtime while improving latency, reliability, and availability.


Talk Title	From dandelion to tree: Scaling Slack
Speakers	Bing Wei (Slack)
Conference	O’Reilly Velocity Conference
Conf Tag	Building and maintaining complex distributed systems
Location	San Jose, California
Date	June 12-14, 2018

Communication and collaboration platform Slack has experienced exponential user growth since its launch in 2014. Slack was originally designed for small teams, and its early design decisions did not scale as the user base grew. Some of those initially powerful decisions became liabilities once the company had to support hundreds of thousands of users communicating at once. By 2016, the load on Slack’s backend servers had increased by 1,000×. In one incident, a whole team was knocked offline and couldn’t reconnect after uploading thousands of emojis, a use case that wasn’t expected. The spike of events caused a wave of client reconnections that cascaded into database failures.

Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service overcame the problem with zero downtime while improving latency, reliability, and availability. Bing also discusses Slack’s ongoing effort to build a generalized publish/subscribe framework and how the company handles data synchronization between clients and backend servers, a solution that should further improve latency and reduce backend cost. She also compares her time at Slack with her experience on the Twitter infrastructure team, detailing how the two companies’ approaches differ and what Slack could learn from other web-scale companies.
