January 26, 2020

406 words 2 mins read

A/B testing at Uber: How we built a BYOM (bring your own metrics) platform

Every new launch at Uber is vetted via robust A/B testing. Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable and self-serve platform that lets users plug in any metric to analyze.

Talk Title A/B testing at Uber: How we built a BYOM (bring your own metrics) platform
Speakers Milene Darnis (Uber)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018
URL Talk Page
Slides Talk Slides

Most companies tightly couple their experimentation logs (who is seeing which experiment) with the business metrics needed to assess the impact of experiments (such as completed trips, retention rate, and cost per trip). By running pipelines that precompute experimentation results for all experiments on a regular cadence (typically once a day), they fulfill basic experimentation needs. This approach works well for companies that run only a few experiments and always look at the same set of metrics across all of them. But what happens when new metrics need to be onboarded? Or when so many experiments run at once that the pipelines become prone to breaking? Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable, self-serve platform that lets users plug in any metric to analyze.

Milene covers the architecture choices for the experimentation platform that enable users to self-onboard their experimentation metrics. It all starts with the logs: Uber's experimentation team relies on Kafka and Spark to consistently track who is seeing which experiment and when. Milene then explains why these logs were decoupled from the metrics needed to analyze experiments, and why the team moved away from precomputing the same set of metrics for all experiments in favor of a framework that lets people write their own SQL in a templated way.

You'll learn about the power of summary tables and how the team used Hive to build "smart" aggregate tables that can easily be joined to any self-onboarded metric, effectively giving users the ability to pick and choose the metrics to analyze for each experiment. Finally, you'll see how the team leveraged Presto and an asynchronous architecture to render 99% of experimentation reports in under two minutes. Milene concludes with some thoughts on what's next for Uber's experimentation reporting tool.
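To make the "bring your own metrics" idea concrete, here is a minimal sketch of what templated metric SQL joined against an exposure summary table could look like. The table names, columns, and template placeholders below are illustrative assumptions for this sketch, not Uber's actual schema or framework:

```python
from string import Template

# A user-onboarded metric, written as templated SQL. The placeholders
# ($start_date, $end_date) are filled in per experiment by the platform.
# Table and column names here are hypothetical.
METRIC_SQL = Template("""
SELECT user_id, COUNT(*) AS completed_trips
FROM trips
WHERE status = 'completed'
  AND request_at BETWEEN '$start_date' AND '$end_date'
GROUP BY user_id
""")

# The platform wraps the metric body in a join against a precomputed
# summary table of exposures (who saw which variant, and when), so any
# self-onboarded metric can be analyzed per variant.
REPORT_SQL = Template("""
WITH metric AS ($metric_body)
SELECT e.variant,
       AVG(m.completed_trips) AS avg_completed_trips
FROM experiment_exposures e
JOIN metric m ON m.user_id = e.user_id
WHERE e.experiment_id = '$experiment_id'
GROUP BY e.variant
""")

def build_report_query(experiment_id: str, start_date: str, end_date: str) -> str:
    """Render the full per-variant report query for one experiment."""
    metric_body = METRIC_SQL.substitute(start_date=start_date, end_date=end_date)
    return REPORT_SQL.substitute(metric_body=metric_body,
                                 experiment_id=experiment_id)

query = build_report_query("exp_123", "2018-08-01", "2018-08-14")
```

A query like this could then be submitted asynchronously to an engine such as Presto, with the precomputed exposure summary table keeping the join cheap.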
