January 6, 2020


Reproducible science with the Renku platform

Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment.

Talk Title Reproducible science with the Renku platform
Speakers Sandra Savchenko-de Jong (Swiss Data Science Center)
Conference JupyterCon in New York 2018
Conf Tag The Official Jupyter Conference
Location New York, New York
Date August 22-24, 2018
URL Talk Page
Slides Talk Slides

Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform developed by the Swiss Data Science Center (a collaboration between ETH Zurich and EPFL). Renku is designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment. The name is borrowed from renku, a traditional form of Japanese collaborative poetry; like its namesake, the platform encourages interdisciplinary cooperation (or competition) between scientists.

Renku appears to users as a shell around their Jupyter notebooks. Under the hood, the platform is governed by a loosely coupled federated model that allows organizations to share compute and storage resources while retaining complete control over those resources. Renku is developed in alignment with the FAIR principles: data should be findable, accessible, interoperable, and reusable.

Reusability is enabled by Renku’s knowledge graph. All actions performed on data and code, whether executing code or reading from and writing results to storage, are authorized and automatically registered in the knowledge graph by the Renku middleware. The knowledge graph is immutable and records the versions of data and code (or notebooks), as well as the relationships between the two, such as which execution of a notebook generated a given version of a dataset and which dataset was used as input. The resulting knowledge graph can be used for governance, intellectual property attribution, auditing, and data science on data science. The latter would enable new types of services, such as improved search algorithms for data science research and recommender systems that suggest algorithms or datasets to data scientists based on their research activities.
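To make the lineage idea concrete, here is a toy Python sketch (not Renku’s actual implementation; all names here are hypothetical) of a knowledge graph that links code versions, data versions, and the executions connecting them, so that the provenance of any dataset version can be queried:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A versioned entity: 'code', 'data', or 'execution'."""
    kind: str
    name: str
    version: str

@dataclass
class KnowledgeGraph:
    # Immutable in spirit: edges are only ever appended, never rewritten.
    edges: list = field(default_factory=list)

    def record_execution(self, code, inputs, outputs):
        """Register one run: the code and input data feed an execution
        node, which in turn generates the output data versions."""
        run = Node("execution", f"run-of-{code.name}", code.version)
        for src in (code, *inputs):
            self.edges.append((src, "used-by", run))
        for out in outputs:
            self.edges.append((run, "generated", out))
        return run

    def provenance(self, data):
        """Which execution(s) generated this dataset version?"""
        return [src for (src, rel, dst) in self.edges
                if rel == "generated" and dst == data]

# Example: one notebook run turns raw.csv v1 into results.csv v1.
kg = KnowledgeGraph()
notebook = Node("code", "analysis.ipynb", "v2")
raw = Node("data", "raw.csv", "v1")
result = Node("data", "results.csv", "v1")
kg.record_execution(notebook, inputs=[raw], outputs=[result])
print(kg.provenance(result))  # the execution that produced results.csv v1
```

In a real deployment this bookkeeping is done transparently by the middleware rather than by user code, but the query pattern is the same: follow the "generated" edges backwards from a dataset version to the execution, and from there to the exact code and input versions involved.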
