What the reproducibility problem means for your business

Academic research has been plagued by a reproducibility crisis in fields ranging from medicine to psychology. Stuart Buck explains what precautions to take in your data analysis and experiments to avoid the same problems.


Talk Title	What the reproducibility problem means for your business
Speakers	Stuart Buck (Arnold Ventures)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Francisco, California
Date	March 26-28, 2019

Over the past five years or so, many scientific and research disciplines have experienced a reproducibility crisis. Here are a few examples of the evidence that has accumulated. In 2012, Begley and Ellis reported that when Amgen tried to replicate 53 “landmark” findings in hematology and oncology, it could successfully replicate only six. Similarly, scientists from Bayer reported that out of 67 key attempts to replicate academic research, results were replicable enough to be worth further investment only about one-third of the time. In 2015, Science published the results of the largest replication project ever performed: the Reproducibility Project: Psychology, in which hundreds of researchers around the world attempted to replicate 100 experiments recently published in three top psychology journals. Only about 40% of the findings could be successfully replicated; the rest were either inconclusive or clearly failed to replicate. The Reproducibility Project: Cancer Biology set out to replicate the top 50 cancer biology experiments published from 2010 to 2012. Its results so far have been mixed, and most recently the project had to be scaled back to just 18 experiments, mostly because chasing down all of the details of the original experiments proved expensive, time-consuming, and difficult. And in August 2018, the Social Sciences Replication Project reported on its attempt to replicate all 21 social science experiments published in Science or Nature from 2010 to 2015. Only 13 of the 21 could be replicated, and even then, the effect size was typically about half of what had originally been published.

As practiced in many companies, data science and experimentation can suffer from the same flaws that have created reproducibility problems in everything from medicine to psychology. Indeed, advice from the Harvard Business Review (and elsewhere) can directly lead to inaccurate analyses. (For example, one such article recommends that people “slice the data,” which is completely contrary to good practice.) Stuart Buck identifies the most significant sources of problematic data analysis and details the top solutions that other disciplines have used to improve rigor and reproducibility. Business executives and data scientists who prepare to avoid the reproducibility problem will be able to gather better data, draw better-informed conclusions, and ultimately make better decisions that strengthen their strategic position in the market.
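The abstract does not spell out why slicing the data is harmful. One standard explanation is the multiple-comparisons problem: if an A/B test with no real effect is cut into many segments and each segment is tested at the usual 5% level, the chance that at least one segment looks “significant” grows quickly, to about 1 − 0.95^20 ≈ 0.64 for 20 independent segments. The sketch below simulates this under stated assumptions (normally distributed metric with known unit variance, 20 equal-sized segments, a two-sample z-test per segment). All function names and parameters are hypothetical, and the Bonferroni correction is shown only as one common remedy, not necessarily the one Buck recommends.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def slice_and_test(rng: random.Random, n_per_arm: int, n_slices: int) -> list[float]:
    """Simulate one A/B test with NO real effect, cut it into n_slices
    equal segments, and return a two-sample z-test p-value per segment."""
    per_slice = n_per_arm // n_slices
    p_values = []
    for _ in range(n_slices):
        # Both arms come from the same distribution, so the null is true.
        control = [rng.gauss(0.0, 1.0) for _ in range(per_slice)]
        treatment = [rng.gauss(0.0, 1.0) for _ in range(per_slice)]
        diff = sum(treatment) / per_slice - sum(control) / per_slice
        se = math.sqrt(2.0 / per_slice)  # assumes known unit variance in each arm
        p_values.append(two_sided_p(diff / se))
    return p_values


def false_positive_rate(n_experiments: int = 500, n_per_arm: int = 400,
                        n_slices: int = 20, alpha: float = 0.05,
                        bonferroni: bool = False, seed: int = 0) -> float:
    """Fraction of null experiments in which at least one slice is 'significant'."""
    rng = random.Random(seed)
    # Bonferroni: divide alpha by the number of slices tested.
    threshold = alpha / n_slices if bonferroni else alpha
    hits = 0
    for _ in range(n_experiments):
        if min(slice_and_test(rng, n_per_arm, n_slices)) < threshold:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    naive = false_positive_rate()
    corrected = false_positive_rate(bonferroni=True)
    print(f"At least one 'significant' slice, uncorrected: {naive:.2f}")
    print(f"At least one 'significant' slice, Bonferroni:  {corrected:.2f}")
```

Even though no segment has a real effect, the uncorrected run should flag a “winning” slice in roughly two out of three experiments, while the corrected threshold keeps that rate near 5%. Pre-specifying which segments will be analyzed, before looking at the data, avoids the problem more directly than correcting for it afterward.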
