December 22, 2019

530 words · 3 mins read

Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale


Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop. This process enhances the feedback loop between people and machines, and the end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation.

Talk Title: Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale
Speakers: Paco Nathan (derwen.ai)
Conference: JupyterCon in New York 2017
Location: New York, New York
Date: August 23-25, 2017

A contemporary theme in artificial intelligence work is designing human-in-the-loop systems: while largely automated, these systems allow people to examine, adjust, and improve what the machines accomplish. Semi-supervised learning is hard: people can curate training sets for ML systems, but doing so becomes expensive at scale. Adding more unlabeled data does not replace the need for human guidance and oversight of automated systems. Moreover, it is quite difficult to anticipate the edge cases that will be encountered at scale, especially when live data comes from a large, diverse audience.

Two questions follow. On the one hand, how do people manage AI systems by interacting with them? On the other, how do we manage the people who are managing AI systems? If machine learning pipelines running at scale write to log files, then troubleshooting issues in those pipelines can become a machine learning/big data problem in itself. Peter Norvig described this issue from Google's perspective at the 2016 Artificial Intelligence Conference: to paraphrase, building reliable and robust software is hard even in deterministic domains, and when we move to uncertain domains such as machine learning, robustness becomes harder still as operational costs, tech debt, and so on accumulate.

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop (and shares the code used). Jupyter gets used in two ways. First, the people responsible for managing ML pipelines use notebooks to set the necessary hyperparameters; in that sense, the notebooks serve in place of configuration scripts. Second, the ML pipelines update those notebooks with telemetry, summary analytics, and so on, in lieu of merely sending that data out to log files. Analysis stays contextualized, making it simple for a person to review. (Sketches of both patterns follow below.)

This process enhances the feedback loop between people and machines: humans in the loop use Jupyter notebooks to inspect ML pipelines remotely, adjusting them at any point and inserting additional analysis, data visualization, and notes into the notebooks, while the machine component is mostly automated yet remains available interactively for troubleshooting and adjustment. The end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation. (An analogy: products such as New Relic address the needs of DevOps practices at scale for web apps; here, Jupyter is the frontend for ML pipelines at scale.) This work anticipates collaborative features for Jupyter notebooks, where multiple parties can edit or update the same live notebook. In that case, the multiple parties would include both the ML pipelines and the humans in the loop, collaborating together.
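The talk's own code is not reproduced here, but the first pattern, a notebook standing in for a configuration script, can be sketched with the papermill library, which injects parameters into a notebook and executes it. Papermill is not named in the talk; the notebook file names and hyperparameter names below are hypothetical, chosen only to illustrate the idea.

```python
import papermill as pm

# A "parameters" cell in train_pipeline.ipynb declares defaults such as
# learning_rate and batch_size; papermill overrides them at execution time,
# so the notebook doubles as the pipeline's configuration script.
pm.execute_notebook(
    "train_pipeline.ipynb",           # hypothetical input notebook
    "runs/train_pipeline_run.ipynb",  # executed copy, kept as a record
    parameters={
        "learning_rate": 0.01,        # hypothetical hyperparameters
        "batch_size": 128,
        "num_epochs": 10,
    },
)
```

Keeping the executed copy under runs/ means each pipeline run leaves behind a notebook recording exactly which hyperparameters it used.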
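The second pattern, a pipeline writing telemetry back into a notebook rather than out to log files, can be sketched with nbformat, the reference library for reading and writing notebook files. Again, the file path and the metric names and values are hypothetical.

```python
import nbformat
from nbformat.v4 import new_markdown_cell

# Hypothetical telemetry collected during a pipeline run.
metrics = {"examples_seen": 1_200_000, "validation_auc": 0.93, "errors": 42}

# Append a summary cell to the pipeline's notebook instead of a log line,
# so a person reviewing the notebook sees the analytics in context.
path = "runs/train_pipeline_run.ipynb"
nb = nbformat.read(path, as_version=4)
summary = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
nb.cells.append(new_markdown_cell(f"### Pipeline telemetry\n{summary}"))
nbformat.write(nb, path)
```

Because the telemetry lands in the same notebook that configured the run, a human in the loop can open it remotely, inspect the summary analytics, and add their own analysis, visualizations, or notes alongside.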
