December 3, 2019

Availability, latency, and cost: Withstanding regional outages

Multiregion deployments can improve availability and latency and can cost way less than you think. Aaron Blohowiak dives into his experience operating in multiple regions at scale at Netflix and shares the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its approach.


Talk Title	Availability, latency, and cost: Withstanding regional outages
Speakers	Aaron Blohowiak (Netflix)
Conference	Velocity
Conf Tag	Build resilient systems at scale
Location	New York, New York
Date	September 20-22, 2016

Running in multiple regions benefits your users through higher availability and lower latency, and it won’t cost as much as you might think. Netflix has turned regional resiliency from a driver of cost and complexity into a strategic advantage by understanding human and system dynamics, both at a high level and in the nitty-gritty details. Calamity, heartbreak, and inefficiency drove the company to refine its approach, and its understanding, as it matured. Executing a failover used to be an all-hands-on-deck situation that brought VPs to the table; now it is routine and usually concludes with a brief “all is well” email.

Once you’ve decided to go multiregion, three major questions arise: How many regions do you need? How should you steer users to regions? And how do you actually perform the failover? Aaron Blohowiak dives into his experience operating in multiple regions at scale at Netflix and shares the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its approach. He also offers an overview of the design considerations and system models Netflix used to make those decisions.
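To give a feel for the kind of algebra behind the first question, here is a minimal Python sketch. It is not Netflix's model from the talk; it assumes that every region must be able to absorb a failed region's share of peak traffic (N+1 sizing) and that regional failures are independent. The function names are hypothetical.

```python
from fractions import Fraction


def per_region_capacity(n_regions: int) -> Fraction:
    """Share of peak global traffic each region must handle so that losing
    any single region still leaves enough capacity in the survivors."""
    if n_regions < 2:
        raise ValueError("need at least two regions to survive a regional outage")
    # The remaining n - 1 regions must together carry 100% of peak traffic.
    return Fraction(1, n_regions - 1)


def total_provisioned(n_regions: int) -> Fraction:
    """Total capacity across all regions, as a multiple of peak global traffic."""
    return n_regions * per_region_capacity(n_regions)


def full_capacity_probability(n_regions: int, region_availability: float) -> float:
    """Probability that at most one region is down at a time, which is when
    N+1 sizing still serves everyone. Assumes independent regional failures."""
    a = region_availability
    all_up = a ** n_regions
    exactly_one_down = n_regions * a ** (n_regions - 1) * (1 - a)
    return all_up + exactly_one_down


for n in (2, 3, 4, 5):
    print(f"{n} regions: each sized for {per_region_capacity(n)} of peak, "
          f"total {float(total_provisioned(n)):.0%} of peak")
```

Under these assumptions, moving from two regions to three cuts total provisioned capacity from 200% to 150% of peak, which is one way a multiregion deployment can cost less than expected. Real outages are often correlated, so the independence-based availability figure is optimistic.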

Practical performance tips to make your cross-platform mobile apps faster

November 6, 2019

Apache Cordova is one of the most popular frameworks for cross-platform mobile development. To build Cordova apps that perform well, it's important to understand how to use the underlying technologies efficiently. Doris Chen outlines what impacts "native performance," demonstrates how to measure mobile app performance, and shares practical tips for building faster Cordova apps.

Ansible for SRE teams

December 3, 2019

Ansible is a "batteries included" automation, configuration management, and orchestration tool that's fast to learn and flexible enough for any architecture. Join James Meickle to get started with Ansible, with an eye toward sustainable development in cloud environments.

Disaster resilience the Waffle House way, from flattops to feature flags and more

December 2, 2019

Waffle House's hurricane disaster plan has everything you could want from an IT disaster plan, including contact trees, failover states, and runbooks on partial operation. Heidi Waterhouse shares lessons about state drawn from the world outside computers and explains how to quantify them using a finite state machine and implement them automatically while you are in a less-than-perfect condition.
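As a rough illustration of modeling operating states as a finite state machine, here is a minimal Python sketch loosely based on the FEMA "Waffle House Index" (green for a full menu, yellow for a limited menu, red for closed). The event names and transitions are hypothetical and are not taken from the talk.

```python
from enum import Enum


class Mode(Enum):
    FULL_MENU = "green"      # normal operation
    LIMITED_MENU = "yellow"  # partial operation with degraded inputs
    CLOSED = "red"           # out of service


# Hypothetical transition table: (current mode, event) -> next mode.
TRANSITIONS = {
    (Mode.FULL_MENU, "power_lost"): Mode.LIMITED_MENU,
    (Mode.LIMITED_MENU, "power_restored"): Mode.FULL_MENU,
    (Mode.LIMITED_MENU, "supplies_exhausted"): Mode.CLOSED,
    (Mode.CLOSED, "resupplied"): Mode.LIMITED_MENU,
}


def step(mode: Mode, event: str) -> Mode:
    """Return the next mode; events with no defined transition leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


mode = Mode.FULL_MENU
for event in ["power_lost", "supplies_exhausted", "resupplied", "power_restored"]:
    mode = step(mode, event)
    print(event, "->", mode.name)
```

Keeping the transitions in a plain table makes the allowed states explicit and easy to review ahead of time, which matters when the plan has to be executed while people are under stress.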

A young lady's illustrated primer to technical decision making

November 28, 2019

Charity Majors discusses making better choices with software. Whether you're selecting a new polyglot persistence layer, launching a startup from scratch, or modernizing a mature environment, there have never been more opportunities for chaos. Charity explains when you should use boring technology, when to take a flyer on the bleeding edge, and best practices for making solid technical decisions.

Is your performance analysis approach as cutting edge as your application architecture?

November 26, 2019

To analyze and improve the performance of modern applications, you must abandon outdated approaches and toolsets that are rooted in the physical topology of servers and JVMs. Jon Hodgson discusses a new paradigm that reveals unexpected relationships and hotspots obscured by the elasticity of containers and microservices, so that you can find and fix the issues with the greatest business impact.

Petascale genomics

November 17, 2019

The advent of next-generation DNA sequencing technologies is revolutionizing life sciences research by routinely generating extremely large datasets. Tom White explains how big data tools developed to handle large-scale Internet data (like Hadoop) help scientists effectively manage this new scale of data and also enable addressing a host of questions that were previously out of reach.