Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks
Missing data plagues nearly every data science problem. Often, people simply drop or ignore missing values, but this usually produces misleading results. Matt Brems explains just how damaging dropping or ignoring missing data can be and teaches you how to handle missing data the right way, leveraging Jupyter notebooks to properly reweight or impute your data.
Talk Title | Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks |
Speakers | Matt Brems (General Assembly) |
Conference | JupyterCon in New York 2018 |
Conf Tag | The Official Jupyter Conference |
Location | New York, New York |
Date | August 22-24, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
If you work with data, you’ve almost certainly encountered missing data. The most common approaches are to ignore or drop anything that’s missing, but this can lead to badly skewed results. Matt Brems identifies the three types of missing data, explains how damaging dropping or ignoring missing data can be, and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data. Matt focuses on the following techniques: no imputation, deductive imputation, mean/median/mode imputation, regression imputation, stochastic regression imputation, and multiple stochastic imputation. You’ll come away with a solid, intuitive understanding of how to handle missing data, practical tips for implementing these techniques, and recommendations for integrating them into your or your company’s workflow.
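To make the contrast among these techniques concrete, here is a minimal sketch (not from the talk, and using made-up toy data) of three of the imputation strategies mentioned above, using only NumPy and pandas: mean imputation, regression imputation, and stochastic regression imputation, which adds residual noise so the imputed values don't artificially shrink the variance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy data (illustrative only): income depends roughly on age,
# and two income values are missing.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 28, 40],
    "income": [30.0, 55.0, np.nan, 80.0, np.nan, 62.0],
})

# 1) Mean imputation: fill every missing value with the observed mean.
#    Simple, but it shrinks the variance of the imputed column.
mean_imputed = df["income"].fillna(df["income"].mean())

# 2) Regression imputation: fit income ~ age on the observed rows,
#    then fill missing values with the fitted predictions.
obs = df.dropna(subset=["income"])
slope, intercept = np.polyfit(obs["age"], obs["income"], deg=1)
pred = intercept + slope * df["age"]
reg_imputed = df["income"].fillna(pred)

# 3) Stochastic regression imputation: add noise drawn from the
#    residual distribution, so imputed values keep realistic spread.
residual_sd = (obs["income"] - (intercept + slope * obs["age"])).std()
noise = pd.Series(rng.normal(0.0, residual_sd, size=len(df)), index=df.index)
stoch_imputed = df["income"].fillna(pred + noise)
```

Multiple stochastic imputation extends step 3 by repeating it several times with fresh noise draws, analyzing each completed dataset, and pooling the results.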