January 9, 2020

234 words 2 mins read

Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks


Missing data plagues nearly every data science problem. Often, people simply drop or ignore missing observations, but this usually leads to poor results. Matt Brems explains how costly dropping or ignoring missing data can be and teaches you how to handle it the right way, using Jupyter notebooks to properly reweight or impute your data.

Talk Title Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks
Speakers Matt Brems (General Assembly)
Conference JupyterCon in New York 2018
Conf Tag The Official Jupyter Conference
Location New York, New York
Date August 22-24, 2018
URL Talk Page
Slides Talk Slides
Video

If you work with data, you’ve almost certainly encountered missing data. The most common approaches are to ignore or drop whatever is missing, but this can badly bias your results. Matt Brems identifies the three types of missing data, explains how costly dropping or ignoring missing data can be, and teaches you how to handle it the right way, using Jupyter notebooks to properly reweight or impute your data. Matt focuses on the following techniques: no imputation; deductive imputation; mean, median, and mode imputation; regression imputation; stochastic imputation; and multiple stochastic imputation. You’ll come away with a solid, intuitive understanding of how to handle missing data, practical tips for implementing these techniques, and recommendations for integrating them into your own or your company’s workflow.
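For a rough sense of what a few of these techniques look like in practice, here is a minimal sketch (not taken from the talk) that applies mean imputation, regression imputation, and stochastic regression imputation to a toy pandas DataFrame; the column names and data are purely illustrative, and scikit-learn is assumed to be available.

```python
# Illustrative sketch of three imputation techniques on a toy DataFrame.
# The columns "age" and "income" are hypothetical examples.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": [23, 31, np.nan, 45, 52, np.nan, 38],
    "income": [38_000, 52_000, 61_000, 70_000, 83_000, 47_000, np.nan],
})

# 1. Mean imputation: replace each missing value with its column mean.
mean_imputed = df.fillna(df.mean(numeric_only=True))

# 2. Regression imputation: predict missing "age" from "income"
#    using the rows where both columns are observed.
observed = df.dropna(subset=["age", "income"])
model = LinearRegression().fit(observed[["income"]], observed["age"])
missing_age = df["age"].isna() & df["income"].notna()
reg_imputed = df.copy()
reg_imputed.loc[missing_age, "age"] = model.predict(df.loc[missing_age, ["income"]])

# 3. Stochastic regression imputation: add noise drawn from the residual
#    distribution so the imputed values do not all sit on the fitted line.
residual_sd = (observed["age"] - model.predict(observed[["income"]])).std()
stoch_imputed = df.copy()
stoch_imputed.loc[missing_age, "age"] = (
    model.predict(df.loc[missing_age, ["income"]])
    + rng.normal(0, residual_sd, missing_age.sum())
)

print(mean_imputed, reg_imputed, stoch_imputed, sep="\n\n")
```

Multiple stochastic imputation repeats step 3 several times to produce multiple completed datasets, so that the extra uncertainty from imputation can be reflected in downstream estimates.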
