December 19, 2019


Anatomy of testing in production: A Netflix original case study


So you think you can test your complex distributed application effectively using only your test environment? At Netflix, automated testing of client and server applications runs at scale in production. It has quickly gone from a low-volume, manual mode to an automated, continuous, high-volume one. Vasanth Asokan offers a study of such testing at scale that will inform your overall testing strategy.

Talk Title: Anatomy of testing in production: A Netflix original case study
Speaker: Vasanth Asokan (Netflix)
Conference: O’Reilly Software Architecture Conference
Conf Tag: Engineering the Future of Software
Location: New York, New York
Date: February 4-6, 2019

So you want to test a complex application built on large-scale distributed systems. How confident are you that your test environment alone can test it effectively? Today, automated testing of Netflix client and server applications runs at scale in production. Within a few years, the company’s testing has gone from a low-volume, manual mode to one that is continuous, high-volume, and fully automated. Collectively, Netflix teams create hundreds of thousands of tester accounts every day, each used in thousands of test scenarios, to the point where service providers are more wary of getting paged for destabilizing internal testers than for causing an external outage.

Vasanth Asokan offers a study of the evolution and anatomy of production testing at scale at Netflix, explaining why there was a desire to test in production, what Netflix did to try to keep testing out of production, and where testing belongs anyway. Along the way, Vasanth shares a few case studies that demonstrate both the benefits and the less tangible, diffuse impacts of concentrated, uncoordinated testing against customer-facing infrastructure. He also looks at other forms of testing, such as load, failure, and simulation testing, and explains the role each plays in ensuring a fully functioning customer experience.

Join in to learn whether the benefits of executing untested code in production outweigh the risks, or whether it’s better to focus on building a production mirror. If you run large-scale distributed systems, this talk will inform your overall testing strategy, illustrate specific techniques that work at scale, and lay out the trade-offs to consider.
