January 26, 2020

406 words 2 mins read

A/B testing at Uber: How we built a BYOM (bring your own metrics) platform

Every new launch at Uber is vetted via robust A/B testing. Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable and self-serve platform that lets users plug in any metric to analyze.

Talk Title A/B testing at Uber: How we built a BYOM (bring your own metrics) platform
Speakers Milene Darnis (Uber)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018
URL Talk Page
Slides Talk Slides

Most companies tightly couple their experimentation logs (who is seeing which experiment) with the business metrics needed to assess the impact of experiments (such as completed trips, retention rate, and cost per trip). By running pipelines that precompute experimentation results for all experiments on a regular cadence (typically once a day), they fulfill basic experimentation needs. This approach works well for companies that run only a few experiments and always look at the same set of metrics across all of them. But what happens when new metrics need to be onboarded? Or when so many experiments run at once that the pipelines become prone to breaking? Given the pace at which Uber operates, the metrics needed to assess the impact of experiments constantly evolve. Milene Darnis explains how the team built a scalable, self-serve platform that lets users plug in any metric to analyze.

Milene covers the architecture choices for the experimentation platform that enable users to self-onboard their experimentation metrics. It all starts with the logs: Uber's experimentation team relies on Kafka and Spark to consistently track who is seeing which experiment and when. Milene then explains why these logs were decoupled from the metrics needed to analyze experiments, and why the team moved away from precomputing the same set of metrics for all experiments in favor of a framework that lets people write their own SQL in a templated way.

You'll learn about the power of summary tables and how the team used Hive to build "smart" aggregate tables that can easily be joined to any self-onboarded metric, effectively giving users the ability to pick and choose the metrics to analyze for each experiment. Finally, you'll see how the team leveraged Presto and an asynchronous architecture to render 99% of experimentation reports in under two minutes. Milene concludes with some thoughts on what's next for Uber's experimentation reporting tool.
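To make the "bring your own metrics" idea concrete, here is a minimal sketch of what templated metric SQL joined against an exposure summary table could look like. The table names, columns, and template placeholders below are illustrative assumptions for this sketch, not Uber's actual schema or framework:

```python
from string import Template

# A user-onboarded metric, written as templated SQL. The placeholders
# ($start_date, $end_date) are filled in per experiment by the platform.
# Table and column names here are hypothetical.
METRIC_SQL = Template("""
SELECT user_id, COUNT(*) AS completed_trips
FROM trips
WHERE status = 'completed'
  AND request_at BETWEEN '$start_date' AND '$end_date'
GROUP BY user_id
""")

# The platform wraps the metric body in a join against a precomputed
# summary table of exposures (who saw which variant, and when), so any
# self-onboarded metric can be analyzed per variant.
REPORT_SQL = Template("""
WITH metric AS ($metric_body)
SELECT e.variant,
       AVG(m.completed_trips) AS avg_completed_trips
FROM experiment_exposures e
JOIN metric m ON m.user_id = e.user_id
WHERE e.experiment_id = '$experiment_id'
GROUP BY e.variant
""")

def build_report_query(experiment_id: str, start_date: str, end_date: str) -> str:
    """Render the full per-variant report query for one experiment."""
    metric_body = METRIC_SQL.substitute(start_date=start_date, end_date=end_date)
    return REPORT_SQL.substitute(metric_body=metric_body,
                                 experiment_id=experiment_id)

query = build_report_query("exp_123", "2018-08-01", "2018-08-14")
```

A query like this could then be submitted asynchronously to an engine such as Presto, with the precomputed exposure summary table keeping the join cheap.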
