December 20, 2019

Scaling Resilient Systems: A Journey into Slack's Database Service

Talk Title	Scaling Resilient Systems: A Journey into Slack's Database Service
Speakers	Guido Iaquinti (Site Reliability Engineer, Freelance), Rafael Chacon (Staff Software Engineer, Slack)
Conference	KubeCon + CloudNativeCon North America
Location	San Diego, CA, USA
Date	Nov 15-21, 2019
URL	Talk Page
Slides	Talk Slides

Monitoring and observability are important concepts, especially in complex and distributed systems. Redundancy and defensive programming are important as well, but sometimes they are not enough. Designing systems to minimize the blast radius when the unexpected happens is often the key.

In this talk, Rafael and Guido will give an overview of how Slack designed, built, scaled, and then iterated to improve its distributed database service, built on top of Vitess, now a CNCF project. The Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 million queries per second. This journey has taught us how to operate a database cluster with more than 2,000 nodes, which we expect to grow to more than 3,500 in the next 12 months.

database programming distributed system monitoring cluster

Managing Edge Computing with Serverless

October 22, 2019

Lev Radomislensky will talk about a Kubernetes-based edge solution for retail analytics based on spinning up Kubernetes clusters at the edge. The solution relies on a combination of an MQTT broker such a …

Service Mesh: There and Back Again

December 20, 2019

You might have heard about service mesh and its amazing benefits. Maybe you believe it's the next big thing, but will it truly meet expectations? As at the start of any relationship, things look fun and ea …

Five Things You Didn't Know You Could Do with SPIFFE and SPIRE

December 19, 2019

Zero Trust networking and secure authentication are hot topics in security team meetings all over the world. But how do you actually get started? The open-source SPIFFE and SPIRE projects are your fou …

Understanding Kubernetes

December 15, 2019

Kubernetes is quickly becoming the preferred way to deploy applications. You may understand Docker, but how can a whole set of containers and services consistently work together and run reliably? Consider Kubernetes a new operating system for your data center. Jonathan Johnson walks you through a series of building blocks to demonstrate how Kubernetes actually works.

Weighing a Cloud: Measuring Your Kubernetes Clusters

December 14, 2019

Kubernetes is complicated. Instrumenting it can be worse. Measuring the components of a distributed system shouldn't be as daunting as being asked to weigh a literal cloud. In this talk, we'll go over …

Performance Tuning and Day 2 Operations

December 10, 2019

Cortex is a distributed version of Prometheus with a lot of moving parts. We have a pretty good getting-started guide with enough information to get a working Cortex cluster that can ingest data and a …