Scheduled notebooks: A means for manageable and traceable code execution

Using papermill, an nteract project, Matthew Seal walks you through how Netflix uses notebooks to track user jobs and provide a simple interface for work submission. You'll get an inside peek at how Netflix is tackling the scheduling problem for a range of users who want easily managed workflows.


Talk Title	Scheduled notebooks: A means for manageable and traceable code execution
Speakers	Matthew Seal (Netflix)
Conference	JupyterCon in New York 2018
Conf Tag	The Official Jupyter Conference
Location	New York, New York
Date	August 22-24, 2018

Matthew Seal explores notebooks as a unifying mechanism for developing, tracking, and debugging small units of work that need to be managed and scheduled, and demonstrates how papermill, an nteract tool, can be used to execute notebooks as immutable pieces of code. Matthew explains how this tooling makes notebooks a solid choice for templates in scheduled processes and shares how Netflix is using this pattern to colocate tasks written by users ranging from nonprogrammers to professional system maintainers.

This technology choice and its application stem from a desire to help solve a fundamental problem found in many large code ecosystems. As development environments grow to include more tools, more languages, and more flexibility, it often becomes increasingly difficult to maintain a few simple interfaces that can take advantage of these systems. The task of executing a piece of code within such an ecosystem changes from a single point of entry to many dissimilar and constrained entry points. Learning each of these can be tedious and is a major barrier to entry for new users. Presenting notebooks as traceable units, each tied to a point-in-time execution, is meant to help alleviate this pain.

Matthew details how Netflix keeps the working environments for local development and scheduled tasks similar, without leaving a Jupyter client. When an error occurs in scheduled work, you can debug the problem in the same way you’d debug a local problem. You’ll see some examples of this pattern when pulling failed notebooks from a scheduler and fixing the problems without needing to interact with the intervening technologies.
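As a rough illustration of the pattern described in the abstract, the sketch below treats a notebook as a read-only template and writes each scheduled run to its own timestamped copy. It is not Netflix's implementation. The helper names (`output_path_for`, `run_scheduled_notebook`), the file layout, and the timestamp format are assumptions made for this example; only `papermill.execute_notebook` and its `parameters` argument come from papermill itself.

```python
from datetime import datetime, timezone
from pathlib import Path


def output_path_for(template: str, output_dir: str, run_time: datetime) -> Path:
    """Build a timestamped output path so every run keeps its own record.

    Hypothetical naming scheme: <template stem>_<UTC timestamp>.ipynb.
    `run_time` is assumed to already be in UTC (hence the trailing "Z").
    """
    stamp = run_time.strftime("%Y%m%dT%H%M%SZ")
    stem = Path(template).stem
    return Path(output_dir) / f"{stem}_{stamp}.ipynb"


def run_scheduled_notebook(template: str, output_dir: str, parameters: dict) -> Path:
    """Execute the template with papermill and return the path of this run's copy."""
    # Imported lazily so the path helper above works without papermill installed.
    import papermill as pm

    out = output_path_for(template, output_dir, datetime.now(timezone.utc))
    out.parent.mkdir(parents=True, exist_ok=True)
    # The template is only read; executed cells and outputs are written to `out`.
    pm.execute_notebook(template, str(out), parameters=parameters)
    return out
```

A scheduler could call `run_scheduled_notebook("templates/daily_report.ipynb", "runs", {"region": "US"})` on each trigger (the path and parameter are placeholders). If a cell fails, papermill raises an exception but still saves the output notebook, so the failed run can be opened in a Jupyter client and debugged like any local notebook.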

The reporter's notebook

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter

Reproducible data dependencies for Jupyter: Distributing massive, versioned image datasets from the Allen Institute for Cell Science

Reproducible science with the Renku platform

Sea change: What happens when Jupyter becomes pervasive at a university?

SWAN: CERN's Jupyter-based interactive data analysis service