SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing
Bo Peng offers an overview of Script of Scripts (SoS), a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows in situ to analyze large amounts of data on remote systems.
| Talk Title | SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing |
|---|---|
| Speakers | Bo Peng (The University of Texas, MD Anderson Cancer Center) |
| Conference | JupyterCon in New York 2018 |
| Conf Tag | The Official Jupyter Conference |
| Location | New York, New York |
| Date | August 22-24, 2018 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Exploratory data analysis in computationally intensive disciplines often requires a variety of tools implemented in different programming languages, as well as the analysis of large datasets on high-performance computing systems (e.g., computer clusters). Despite the large number of kernels that Jupyter supports and the availability of magics for executing scripts in other languages, it remains challenging to use Jupyter to develop multilanguage data analysis workflows and to streamline the analysis of large amounts of data on remote systems.

Bo Peng offers an overview of Script of Scripts (SoS), a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. As a workflow engine, SoS provides an intuitive syntax for creating workflows in process-based, outcome-oriented (makefile-style), and mixed styles, as well as a unified interface for executing and managing tasks on a variety of computing platforms, with automatic synchronization of files among isolated filesystems.

As a polyglot notebook, SoS allows the use of multiple kernels in a single Jupyter notebook. In addition to magics such as %expand and %capture, which compose scripts and capture outputs from any Jupyter kernel, SoS allows variables to be exchanged among kernels of supported languages. Other useful features of the SoS kernel include a side panel that allows scratch execution of statements, preview of files and expressions, and line-by-line execution of statements in cells. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows to analyze large amounts of data on remote systems.
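The abstract distinguishes process-based workflows, which run steps in the order they are written, from outcome-oriented (makefile-style) workflows, which start from a requested output and run only the steps needed to produce it. The plain-Python sketch below illustrates that distinction under stated assumptions; it is not SoS syntax or the SoS API. The `Step` class, the `run_process` and `run_target` functions, and the file names are all hypothetical, and "files" are tracked in an in-memory set rather than on disk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical step description: a name plus the files it reads and writes.
    name: str
    inputs: list[str]
    outputs: list[str]


def run_process(steps: list[Step], available: set[str], log: list[str]) -> None:
    """Process-based style: execute every step in the order it is written."""
    for step in steps:
        log.append(step.name)            # "run" the step by recording its name
        available.update(step.outputs)   # its outputs now exist


def run_target(target: str, steps: list[Step], available: set[str], log: list[str]) -> None:
    """Outcome-oriented (makefile-style): produce `target`, building only what is missing."""
    if target in available:
        return                           # already exists, nothing to do
    producer = next((s for s in steps if target in s.outputs), None)
    if producer is None:
        raise ValueError(f"no step produces {target!r}")
    for dep in producer.inputs:          # first satisfy the producer's inputs
        run_target(dep, steps, available, log)
    log.append(producer.name)
    available.update(producer.outputs)
```

For example, with hypothetical steps `clean` (`raw.csv` to `clean.csv`), `fit` (`clean.csv` to `model.pkl`), and `report` (`model.pkl` to `report.html`), requesting `report.html` when only `raw.csv` exists runs `clean`, `fit`, and `report`; if `clean.csv` and `model.pkl` already exist, only `report` runs. This sketch has no cycle detection and does not model remote task execution or file synchronization.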
Researchers benefit from the SoS workflow system and Jupyter kernel: they have the flexibility to use their preferred tools for each task without having to worry about data flow, and to perform light interactive analysis while heavy remote tasks execute simultaneously in the same notebook, in a neat and organized fashion. SoS is distributed freely under a BSD license. A live Jupyter server and several Docker containers are provided for testing and running SoS easily. The SoS frontend is being ported to JupyterLab, with the goal of releasing it alongside JupyterLab 1.0.