Rebuilding the airplane in flight. . .safely
Rewriting the key software component of your platform from scratch is always intimidating. Shannon Weyrick and James Royalty discuss NS1's recent DNS server rewrite and outline the steps the company took to roll it out across its globally distributed network with no downtime.
Talk Title | Rebuilding the airplane in flight. . .safely |
Speakers | Shannon Weyrick (NS1), James Royalty (NS1) |
Conference | O’Reilly Velocity Conference |
Conf Tag | Building and maintaining complex distributed systems |
Location | New York, New York |
Date | October 1-3, 2018 |
URL | Talk Page |
Slides | Talk Slides |
In 2017, NS1 embarked on a ground-up rewrite of its advanced DNS server software. The effort required significant research, planning, and execution time: over a year in total. Shannon Weyrick and James Royalty detail the challenges NS1 encountered and the key points the company had to consider to engineer and deploy the new server across its global managed DNS network with no negative impact or downtime for its customers.

Shannon and James share background on the decision to move forward with a rewrite (including DNSSEC and scaling requirements); the research into technologies that balance performance, functionality, and engineering velocity; phased milestones with a hybrid release approach to facilitate product iteration and gain operational experience; a system for tee-testing traffic to verify correctness during deployment; and the use of an anycast network for fault isolation during rollout.

Along the way, they discuss the many minor successes, failures, setbacks, and delays that you may face day to day and offer tips and advice to support you in your own quests to rebuild your airplanes in flight. You’ll leave with an appreciation for the challenges involved in planning and executing a large-scale rewrite and deployment of critical-path software across a widely distributed network.
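The abstract does not describe how NS1's tee-testing system works, so the following is only a minimal sketch of the general idea: each query is answered by the existing server, a copy is sent to the rewritten server, and any difference between the two responses is recorded for review. Every name here (`Response`, `TeeTester`, `old_server`, `new_server`) is hypothetical, and the resolvers are plain Python callables standing in for real DNS servers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: a query is (name, record type); a response is a
# response code plus the answer records rendered as strings.
Query = Tuple[str, str]


@dataclass(frozen=True)
class Response:
    rcode: str
    answers: Tuple[str, ...]

    def normalized(self):
        # Assumption: answer order is not significant, so sort before comparing.
        return (self.rcode, tuple(sorted(self.answers)))


Resolver = Callable[[Query], Response]


@dataclass
class TeeTester:
    primary: Resolver    # existing server; its answer is what the client receives
    candidate: Resolver  # rewritten server; its answer is only compared
    mismatches: List[Tuple[Query, Response, object]] = field(default_factory=list)

    def handle(self, query: Query) -> Response:
        served = self.primary(query)
        try:
            shadow = self.candidate(query)
        except Exception as exc:
            # A crash in the candidate must never affect the served answer.
            self.mismatches.append((query, served, exc))
            return served
        if shadow.normalized() != served.normalized():
            self.mismatches.append((query, served, shadow))
        return served


# Stand-in resolvers for the demonstration (not real DNS servers).
def old_server(query: Query) -> Response:
    return Response("NOERROR", ("192.0.2.1", "192.0.2.2"))


def new_server(query: Query) -> Response:
    if query[0] == "bad.example.":
        return Response("SERVFAIL", ())
    # Same records in a different order: not counted as a mismatch.
    return Response("NOERROR", ("192.0.2.2", "192.0.2.1"))


tee = TeeTester(primary=old_server, candidate=new_server)
tee.handle(("www.example.", "A"))
tee.handle(("bad.example.", "A"))
print(len(tee.mismatches))  # 1
```

In this sketch the client always receives the primary server's answer, so a bug or crash in the candidate shows up only in the mismatch log. A production system would also need to decide which differences are acceptable (for example TTLs or record ordering), which this example reduces to a single sort.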