Trapped by the present: Estimating long-term impact from A/B experiments
When software companies use A/B tests to evaluate product changes and fail to accurately estimate the long-term impact of those experiments, they risk optimizing for the users they have at the expense of the users they want to have. Brian Karfunkel explains how to estimate an experiment's impact over time, mitigating this risk and giving full credit to experiments targeted at noncore users.
| Talk Title | Trapped by the present: Estimating long-term impact from A/B experiments |
| Speakers | Brian Karfunkel (Pinterest) |
| Conference | Strata Data Conference |
| Conf Tag | Big Data Expo |
| Location | San Jose, California |
| Date | March 6-8, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
In many software companies, A/B testing is a central technique for understanding how a product change will affect users. However, experiments are often run for a short time, under the assumption that the effects (or lack of measured effects) will persist. The users most likely to be in an experiment are those most likely to use the service, so the samples for many A/B tests are biased toward highly engaged users.

For experiments specifically targeting new or less-engaged users, it is harder both to get a sample large enough to measure effects and to understand how those effects will accumulate over time. For example, when testing new copy for an email campaign, all users eligible to receive emails are triggered into the experiment immediately, so the result estimates the impact on the whole population. Testing a new search experience for dormant users, on the other hand, can only include users who have been dormant for several months, then visited the site and performed a search. That set of users may be quite small each week, and it may take many weeks to reach the entire population. Running such an experiment for only a few weeks is unlikely to yield the statistical power to detect a reasonably sized effect. Even if an effect is detected, the measured user count will understate the true reach, because it is difficult to estimate how many users have not yet entered the experiment.

Brian Karfunkel details a method, developed at Pinterest, for estimating the total population that would be affected by a product change over the course of a quarter. Using data on how many users enter the experiment each day, Pinterest fits models to predict the long-term population that would be affected if the change shipped and uses that to extrapolate the long-term absolute impact.
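The talk does not publish Pinterest's exact model, but the core extrapolation idea can be sketched as fitting a saturating curve to the cumulative number of users who have entered the experiment and projecting it out to a quarter. Everything below is an illustrative assumption: the functional form N(t) = P·(1 − exp(−t/τ)), the synthetic data, and the simple grid-search fit all stand in for whatever models Pinterest actually uses.

```python
import math

# Synthetic data (assumed, for illustration): cumulative users who have
# entered the experiment by day t, over four weeks of observation.
days = list(range(1, 29))
true_total, true_tau = 50_000, 40.0
cumulative = [true_total * (1 - math.exp(-t / true_tau)) for t in days]

def fit_saturation(days, cumulative):
    """Fit N(t) = P * (1 - exp(-t / tau)).

    Grid-searches tau; for each tau the best P has a closed-form
    linear-least-squares solution, so no optimizer library is needed.
    """
    best = None
    for tau_tenths in range(10, 2001):            # tau from 1.0 to 200.0 days
        tau = tau_tenths / 10
        basis = [1 - math.exp(-t / tau) for t in days]
        p = (sum(b * n for b, n in zip(basis, cumulative))
             / sum(b * b for b in basis))
        sse = sum((p * b - n) ** 2 for b, n in zip(basis, cumulative))
        if best is None or sse < best[0]:
            best = (sse, p, tau)
    return best[1], best[2]

population, tau = fit_saturation(days, cumulative)

# Extrapolate how many users would be reached over a 90-day quarter:
# this is the population against which the treatment effect is scaled.
day90_reach = population * (1 - math.exp(-90 / tau))
```

Scaling the per-user treatment effect by `day90_reach` rather than by the four weeks of observed entrants is what puts slow-accumulating experiments (e.g., on dormant users) on an equal footing with experiments that saturate immediately.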
The tool Pinterest developed to implement this method allows the company to consider many types of experiments on an equal footing, ensuring that it values not just the many users who come every week but also those who come less regularly.