Launching products at massive scale: The DevOps way

Everything changes at scale. Launching products at a scale of 1+ billion users requires a massive cross-team, cross-functional, coordinated effort, and business, engineering, and cultural challenges must be overcome. Kishore Jalleda and Gopal Mor explain how they have applied DevOps best practices at scale to successfully launch several high-profile products at Yahoo.


Talk Title	Launching products at massive scale: The DevOps way
Speakers	Kishore Jalleda, Gopal Mor
Conference	Velocity
Conf Tag	Build resilient systems at scale
Location	Amsterdam, The Netherlands
Date	November 7-9, 2016
URL	Talk Page
Slides	Talk Slides
Video

Over the past year or so, Yahoo has been redesigning some of its most popular products, such as Yahoo.com, Yahoo News, Yahoo Finance, and Yahoo Sports. This is not a trivial task. These products, which have all been around for more than 10 years, are close to people’s hearts and part of the daily habits of hundreds of millions of people around the world.
There are the obvious business challenges to overcome: Yahoo runs many experiments to make sure KPIs, engagement metrics, revenue, and so on all hold up. There are also the engineering challenges of launching products at such massive scale, which include ensuring Yahoo has sufficient capacity for breaking news events like Prince’s death or the Chinese stock market crash last year, and enough capacity to handle a 10x increase in requests per second (RPS) to the APIs serving the Stocks native app after an iOS release. Yahoo must also be prepared for special events like the first-ever NFL livestream event on Yahoo.com, watched live by tens of millions of users, or the first-ever livestream of Berkshire Hathaway’s shareholder meeting on Yahoo Finance. On top of that come the inevitable cultural challenges faced by most large companies, where microcultures can exist within different business units and subteams.
So how does Yahoo succeed? It’s no secret. Yahoo solves most operability problems and challenges using software; it follows a set of rigorous practices and procedures aimed at operational readiness; and it has developed a performance and operability culture within the company. At a high level, Yahoo makes sure that every product and application it builds is reliable, performs within the SLA, is fault tolerant, degrades gracefully, is highly distributed and highly decoupled, is secure and properly instrumented, and scales horizontally.
The company also makes sure that the most common error conditions in production are auto-remediated and that applications are continuously fault injected to test their fault tolerance and resilience. Hundreds of thousands of automated tests run in Yahoo’s build and deploy pipelines on every commit as code is continuously deployed (no humans allowed). All products are continuously monitored, and when issues arise, alerts are routed to the teams who can actually fix the problem. The overarching goal is product availability and performance. Yahoo obsesses over providing a great end-user experience and has a maniacal focus on performance: metrics like above-fold time (AFT) and time to first byte (TTFB) are constantly discussed in tech council meetings and constantly monitored. Teams also use tactics such as speculative retries at the application level (upon noticing failures) to reduce long-tail latency; falling back to a cached version of a module on a page (module-level fallbacks) after an SLA miss; and failing safely to a cached version of a full page after an SLA miss (or a timeout with the downstream service). All of these help provide an uninterrupted browsing experience even when there are errors or faults within the system.
Want to learn more? Join Kishore Jalleda and Gopal Mor as they explain how they have applied DevOps best practices at scale to successfully launch several high-profile products at Yahoo in the recent past.
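To make the retry and module-level fallback tactics described above more concrete, here is a minimal sketch in Python. It is not Yahoo's implementation: the function name `render_module`, the in-memory `cache` dictionary, and the single-retry policy are all assumptions chosen for illustration. The idea is to give each page module a time budget (its SLA), retry once if a failure is noticed while budget remains, and serve the last known-good copy if the budget runs out.

```python
import concurrent.futures as cf
import time

# Shared worker pool for module fetches (size is an arbitrary assumption).
_pool = cf.ThreadPoolExecutor(max_workers=8)


def render_module(name, fetch, sla_seconds, cache, retries=1):
    """Return (content, source) for one page module.

    name        -- hypothetical module identifier, e.g. "news"
    fetch       -- zero-argument callable returning fresh content
    sla_seconds -- total time budget for this module
    cache       -- dict holding the last known-good content per module
    retries     -- extra attempts allowed after a noticed failure
    """
    deadline = time.monotonic() + sla_seconds
    for _ in range(retries + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # SLA budget exhausted: stop trying
        future = _pool.submit(fetch)
        try:
            content = future.result(timeout=remaining)
        except Exception:
            # Either the fetch raised or it missed the SLA.
            # Try again only if some budget is left.
            continue
        cache[name] = content  # remember the last good rendering
        return content, "fresh"
    if name in cache:
        return cache[name], "cached"  # module-level fallback
    return "", "empty"  # degrade gracefully: render nothing
```

A page renderer could call `render_module("news", fetch_news, 0.2, cache)` for each module and assemble whatever comes back, so a slow or failing downstream affects only its own module. The same pattern applied to the whole page would correspond to the full-page fallback mentioned above.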
