Reproducible science with the Renku platform
Talk Title | Reproducible science with the Renku platform |
Speakers | Sandra Savchenko-de Jong (Swiss Data Science Center) |
Conference | JupyterCon in New York 2018 |
Conf Tag | The Official Jupyter Conference |
Location | New York, New York |
Date | August 22-24, 2018 |
URL | Talk Page |
Slides | Talk Slides |
Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform developed by the Swiss Data Science Center (a collaboration between ETH Zurich and EPFL). Renku is designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment. The name is borrowed from renku, a traditional form of Japanese collaborative poetry. Like its namesake, the platform encourages interdisciplinary cooperation (or competition) between scientists.

To users, Renku appears as a shell around their Jupyter notebooks. Under the hood, the platform is governed by a loosely coupled federated model that allows organizations to share compute and storage resources while keeping complete control over those resources. Renku is developed in alignment with the FAIR principles: data should be findable, accessible, interoperable, and reusable.

Reusability is enabled by Renku’s knowledge graph. All actions performed on the data and code, whether executing code or accessing storage to read data or write new results, are authorized and registered automatically by the Renku middleware in the knowledge graph. The knowledge graph is immutable and contains information about the versions of data and code (or notebooks) and the relationships between the two, such as which execution of a notebook generated a given version of a dataset and which datasets were used as input. The resulting knowledge graph can be used for governance, intellectual property attribution, auditing, and data science on data science. The latter would enable new types of services, such as improved search algorithms for data science research and recommender systems that suggest algorithms or datasets to data scientists based on their research activities.
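
To make the knowledge graph idea concrete, here is a minimal Python sketch of an append-only provenance log that records which notebook execution produced which dataset version from which inputs. It is an illustration only and not Renku's actual API or data model: the class names (`Execution`, `ProvenanceLog`), the methods, and the `name@version` labels are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Execution:
    """One recorded notebook run (hypothetical structure, not Renku's schema)."""
    notebook: str              # versioned notebook, e.g. "model.ipynb@v2"
    inputs: Tuple[str, ...]    # dataset versions that were read
    outputs: Tuple[str, ...]   # dataset versions that were written


class ProvenanceLog:
    """Append-only log: records can be added but never changed or removed."""

    def __init__(self) -> None:
        self._records: Tuple[Execution, ...] = ()

    def record(self, notebook: str, inputs: Iterable[str],
               outputs: Iterable[str]) -> Execution:
        execution = Execution(notebook, tuple(inputs), tuple(outputs))
        # Build a new tuple rather than mutating, so earlier records stay intact.
        self._records = self._records + (execution,)
        return execution

    def generated_by(self, dataset_version: str) -> Optional[Execution]:
        """Return the execution that wrote this dataset version, if any."""
        for execution in self._records:
            if dataset_version in execution.outputs:
                return execution
        return None

    def lineage(self, dataset_version: str) -> Set[str]:
        """Return every upstream dataset version, followed transitively."""
        seen: Set[str] = set()
        stack = [dataset_version]
        while stack:
            execution = self.generated_by(stack.pop())
            if execution is None:
                continue  # a raw dataset with no recorded producer
            for parent in execution.inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Example: two notebook runs chained through an intermediate dataset.
log = ProvenanceLog()
log.record("clean.ipynb@v1", inputs=["raw@v1"], outputs=["clean@v1"])
log.record("model.ipynb@v2", inputs=["clean@v1"], outputs=["results@v1"])

print(log.generated_by("results@v1").notebook)
print(sorted(log.lineage("results@v1")))
```

Queries such as `generated_by` and `lineage` are the kind of question that auditing and attribution rely on: given a result, which code produced it and which data it ultimately depends on.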