December 22, 2019

530 words · 3 mins read

Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale


Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop. This process enhances the feedback loop between people and machines, and the end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation.

Talk Title: Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale
Speakers: Paco Nathan (derwen.ai)
Conference: JupyterCon in New York 2017
Location: New York, New York
Date: August 23-25, 2017

A contemporary theme in artificial intelligence work is designing human-in-the-loop systems: while largely automated, these systems allow people to examine, adjust, and improve what the machines accomplish. Semi-supervised learning is hard: people can curate training sets for ML systems, but doing so becomes expensive at scale. Adding more unlabeled data does not replace the need for human guidance and oversight of automated systems. Moreover, it is quite difficult to anticipate the edge cases that will be encountered at scale, especially when live data comes from a large, diverse audience.

Two questions follow. On the one hand, how do people manage AI systems by interacting with them? On the other, how do we manage the people who are managing AI systems? If machine learning pipelines running at scale write to log files, then troubleshooting issues in those pipelines can become a machine learning/big data problem in itself. Peter Norvig described this issue from Google's perspective at the 2016 Artificial Intelligence Conference: to paraphrase, building reliable and robust software is hard even in deterministic domains, and when we move to uncertain domains such as machine learning, robustness becomes harder still as operational costs, tech debt, and so on accumulate.

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop (and shares the code used). Jupyter gets used in two ways. First, the people responsible for managing ML pipelines use notebooks to set the necessary hyperparameters; in that sense, the notebooks serve in place of configuration scripts. Second, the ML pipelines update those notebooks with telemetry, summary analytics, and so on, in lieu of merely sending that data out to log files. Analysis stays contextualized, making it simple for a person to review. (Sketches of both patterns follow below.)

This process enhances the feedback loop between people and machines: humans in the loop use Jupyter notebooks to inspect ML pipelines remotely, adjusting them at any point and inserting additional analysis, data visualization, and notes into the notebooks, while the machine component is mostly automated yet remains available interactively for troubleshooting and adjustment. The end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation. (An analogy: products such as New Relic address the needs of DevOps practices at scale for web apps; here, Jupyter is the frontend for ML pipelines at scale.) This work anticipates collaborative features for Jupyter notebooks, where multiple parties can edit or update the same live notebook. In that case, the multiple parties would include both the ML pipelines and the humans in the loop, collaborating together.
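The talk's own code is not reproduced here, but the first pattern, a notebook standing in for a configuration script, can be sketched with the papermill library, which injects parameters into a notebook and executes it. Papermill is not named in the talk; the notebook file names and hyperparameter names below are hypothetical, chosen only to illustrate the idea.

```python
import papermill as pm

# A "parameters" cell in train_pipeline.ipynb declares defaults such as
# learning_rate and batch_size; papermill overrides them at execution time,
# so the notebook doubles as the pipeline's configuration script.
pm.execute_notebook(
    "train_pipeline.ipynb",           # hypothetical input notebook
    "runs/train_pipeline_run.ipynb",  # executed copy, kept as a record
    parameters={
        "learning_rate": 0.01,        # hypothetical hyperparameters
        "batch_size": 128,
        "num_epochs": 10,
    },
)
```

Keeping the executed copy under runs/ means each pipeline run leaves behind a notebook recording exactly which hyperparameters it used.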
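The second pattern, a pipeline writing telemetry back into a notebook rather than out to log files, can be sketched with nbformat, the reference library for reading and writing notebook files. Again, the file path and the metric names and values are hypothetical.

```python
import nbformat
from nbformat.v4 import new_markdown_cell

# Hypothetical telemetry collected during a pipeline run.
metrics = {"examples_seen": 1_200_000, "validation_auc": 0.93, "errors": 42}

# Append a summary cell to the pipeline's notebook instead of a log line,
# so a person reviewing the notebook sees the analytics in context.
path = "runs/train_pipeline_run.ipynb"
nb = nbformat.read(path, as_version=4)
summary = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
nb.cells.append(new_markdown_cell(f"### Pipeline telemetry\n{summary}"))
nbformat.write(nb, path)
```

Because the telemetry lands in the same notebook that configured the run, a human in the loop can open it remotely, inspect the summary analytics, and add their own analysis, visualizations, or notes alongside.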
