JupyterHub for domain-focused integrated learning modules


Talk Title	JupyterHub for domain-focused integrated learning modules
Speakers	Mariah Rogers (UC Berkeley Division of Data Sciences), Julian Kudszus (UC Berkeley Division of Data Sciences)
Conference	JupyterCon in New York 2018
Conf Tag	The Official Jupyter Conference
Location	New York, New York
Date	August 22-24, 2018
URL	Talk Page
Slides	Talk Slides

The Data Science Modules program at UC Berkeley creates short explorations into data science, using notebooks to let students work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.
In an effort to empower more students with data analysis skills and tools, the modules program seeks out existing courses in the course catalogue and offers instructors a day or two of relief by developing a short curriculum, one to three class periods long, that engages directly with their course material. Is the class discussing social inequality? The program develops notebooks investigating the socioeconomic status index or mapping and correlating student-collected data against demographic census data. Reading medieval manuscripts in a literature course? How about some text analysis of Sir Gawain and the Green Knight? Learning about phonological properties of different world languages? Plot these properties on a world map and correlate them. Discussing political rhetorical strategies? What about trying sentiment analysis on a corpus of political campaign speeches?
It’s easy to imagine countless scenarios where a course might benefit from a one- or two-day data-driven perspective. Until recently, the main obstacle to this dream has been the startup cost of computing for students without a technical background. In addition to the learning curve associated with programming, even the process of installing Python and its dependencies for a particular analysis could easily take an entire class period.
JupyterHub has an opportunity to fundamentally change traditional pedagogy beyond CS and data science courses. We’ve already seen its utility for full courses, workshops, and tutorials, but Berkeley has begun to realize its potential for seamless integration into the traditional classroom.
With no startup cost, students need only click a link to be dropped right into the user-friendly Jupyter Notebook. After a short, targeted introduction to Python that covers only the concepts relevant to the task at hand, students learn more programming concepts by applying them directly to something they actually care about. More importantly, they are introduced to new perspectives on the phenomenon they are discussing in class. Researchers have long advocated for teaching concepts in a stimulating and relevant environment. JupyterHub allows us to get there in under five minutes.
Beyond the impressive numbers of students and courses served, the program is particularly proud of its success within the social sciences, arts, and humanities. Most students in these courses have no experience with programming and minimal (if any) experience with data. Moreover, the modules program directly addresses concerns of historically marginalized groups, particularly as they pertain to data science. For example, in a course studying stigma and prejudice, the modules program empowers students to statistically uncover implicit bias in our society. Modules have become a low-stakes opportunity for students to discover data-driven, inferential thinking by trying to answer a question that interests them.

nbinteract: Shareable interactive web pages from notebooks

Pangeo: Big data climate science in the cloud

Sea change: What happens when Jupyter becomes pervasive at a university?

Reproducible data dependencies for Jupyter: Distributing massive, versioned image datasets from the Allen Institute for Cell Science

SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing

Deep learning 101: Apache MXNet