January 6, 2020

SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing

Bo Peng offers an overview of Script of Scripts (SoS), a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows in situ to analyze large amounts of data on remote systems.

Talk Title: SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing
Speakers: Bo Peng (The University of Texas, MD Anderson Cancer Center)
Conference: JupyterCon in New York 2018
Conf Tag: The Official Jupyter Conference
Location: New York, New York
Date: August 22-24, 2018

Exploratory data analysis in computationally intensive disciplines often requires a variety of tools implemented in different programming languages, as well as the analysis of large datasets on high-performance computing systems (e.g., computer clusters). Despite the large number of kernels that Jupyter supports and the availability of magics for executing scripts in other languages, it remains challenging to use Jupyter to develop multilanguage data analysis workflows and to streamline the analysis of large amounts of data on remote systems. Bo Peng offers an overview of Script of Scripts, a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. As a workflow engine, SoS provides an intuitive syntax for creating workflows in process-based, outcome-oriented (makefile style), and mixed styles, as well as a unified interface for executing and managing tasks on a variety of computing platforms with automatic synchronization of files among isolated filesystems. As a polyglot notebook, SoS allows the use of multiple kernels in a single Jupyter notebook. In addition to magics such as %expand and %capture to compose scripts and capture outputs from all Jupyter kernels, SoS allows exchange of variables among kernels of supported languages. Other useful features of the SoS kernel include a side panel that allows scratch execution of statements, preview of files and expressions, and line-by-line execution of statements in cells. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows to analyze large amounts of data on remote systems.
Researchers benefit from the SoS workflow system and Jupyter kernel: they have the flexibility to use their preferred tools for each task without having to worry about data flow, and they can perform light interactive analysis while simultaneously executing heavy remote tasks in the same notebook in a neat and organized fashion. SoS is distributed freely under a BSD license. A live Jupyter server and several Docker containers are provided for testing and running SoS easily. The SoS frontend is being ported to JupyterLab, with the goal of releasing it alongside JupyterLab 1.0.
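
The abstract distinguishes process-based workflows from outcome-oriented (makefile style) ones. The difference can be illustrated with a small plain-Python sketch. This is not SoS syntax or the SoS API; the names run_forward, Rule, and make, and the toy data, are hypothetical and exist only for illustration. In the process-based style, steps run in the order they are listed. In the outcome-oriented style, each target declares what it needs, and only the prerequisites of the requested target are built.

```python
from typing import Callable, Dict, List


def run_forward(steps: List[Callable[[dict], None]]) -> dict:
    """Process-based style: execute every step in the order given."""
    state: dict = {}
    for step in steps:
        step(state)
    return state


class Rule:
    """Outcome-oriented style: a target, its prerequisites, and how to build it."""

    def __init__(self, needs: List[str], build: Callable[..., object]):
        self.needs = needs
        self.build = build


def make(target: str, rules: Dict[str, Rule], built: dict) -> object:
    """Build `target`, first building only the prerequisites it depends on."""
    if target in built:  # already produced, reuse it
        return built[target]
    rule = rules[target]
    inputs = [make(dep, rules, built) for dep in rule.needs]
    built[target] = rule.build(*inputs)
    return built[target]


# Process-based: three steps, always run top to bottom.
def load(state: dict) -> None:
    state["raw"] = [3, 1, 2]


def clean(state: dict) -> None:
    state["clean"] = sorted(state["raw"])


def summarize(state: dict) -> None:
    state["summary"] = sum(state["clean"])


forward_result = run_forward([load, clean, summarize])

# Outcome-oriented: the same analysis expressed as targets with dependencies.
rules = {
    "raw": Rule([], lambda: [3, 1, 2]),
    "clean": Rule(["raw"], lambda raw: sorted(raw)),
    "summary": Rule(["clean"], lambda cleaned: sum(cleaned)),
    "unused": Rule(["raw"], lambda raw: max(raw)),  # never requested below
}
built: dict = {}
make_result = make("summary", rules, built)
```

Requesting "summary" builds "raw" and "clean" on the way but never touches "unused", which is the practical appeal of the makefile style: the requested outcome determines what runs. A mixed style, as the abstract describes, would combine ordered steps with targets that are resolved on demand.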
