November 1, 2019

The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking

When there is a strong signal in a large dataset, many machine-learning algorithms will find it. On the other hand, when the effect is weak and the data is large, there are many ways to discover an effect that is in fact nothing more than noise. Robert Grossman shares best practices so that you will not be accused of p-hacking.

Talk Title: The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking
Speakers: Robert Grossman (University of Chicago)
Conference: Strata + Hadoop World
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 14-16, 2017

When there is a strong signal in a large dataset, many machine-learning algorithms will find it. When the effect is weak and the dataset is large, however, there are many ways to discover an "effect" that is in fact nothing more than noise. Robert Grossman shares best practices, drawn from three case studies, that make it less likely you will be accused of p-hacking. The first case study concerns mutations in breast cancer and some of the complexities of understanding rare mutations and combinations of rare mutations. In the second, Robert dives into different methods for determining whether exposing pregnant women to particulate matter (solid and liquid particles suspended in air) affects the health of newborns. The third looks at a well-known published paper offering evidence for ESP (extrasensory perception). Robert extracts several techniques from these three case studies that have consistently proved useful and discusses how best to use them in practice.
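To make the "nothing more than noise" point concrete, here is a minimal sketch of how running many tests on pure noise yields apparent discoveries. It is not taken from the talk: the function names, the sample sizes, and the use of a Bonferroni correction are all assumptions chosen for illustration, and the talk's own techniques may differ. Each simulated test compares two groups drawn from the same distribution, so every "significant" result is a false positive by construction.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_discoveries(n_tests=1000, n_per_group=200, alpha=0.05, seed=0):
    """Run n_tests z-tests on pure noise and count the 'significant' ones.

    Hypothetical helper for illustration only. Both groups are drawn from
    the same N(0, 1) distribution, so there is no real effect to find.
    Returns (hits at the naive threshold, hits at the Bonferroni threshold).
    """
    rng = random.Random(seed)
    # Standard error of the difference of two means with known unit variance.
    standard_error = math.sqrt(2.0 / n_per_group)
    naive_hits = 0
    bonferroni_hits = 0
    for _ in range(n_tests):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        mean_diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        p = two_sided_p_value(mean_diff / standard_error)
        if p < alpha:
            naive_hits += 1
        # Bonferroni: divide the threshold by the number of tests performed.
        if p < alpha / n_tests:
            bonferroni_hits += 1
    return naive_hits, bonferroni_hits


if __name__ == "__main__":
    naive, corrected = count_false_discoveries()
    print(f"'significant' at the naive threshold: {naive} of 1000")
    print(f"'significant' after Bonferroni correction: {corrected} of 1000")
```

With a 0.05 threshold, roughly 5% of the tests should come out "significant" even though no effect exists, which is why reporting only the hits from a large search looks like p-hacking. The Bonferroni correction is a deliberately conservative choice here; it controls false positives at the cost of power, which matters when the real effect is weak.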
