January 5, 2020

SWAN: CERN's Jupyter-based interactive data analysis service

SWAN, CERN's service for web-based analysis, leverages the power of Jupyter to give the high energy physics community access to state-of-the-art infrastructure and services through a web service. Diogo Castro offers an overview of SWAN and explains how researchers and students are using it in their work.


Talk Title	SWAN: CERN's Jupyter-based interactive data analysis service
Speakers	Diogo Castro (CERN)
Conference	JupyterCon in New York 2018
Conf Tag	The Official Jupyter Conference
Location	New York, New York
Date	August 22-24, 2018

Both CERN and high energy physics (HEP) in general face unprecedented challenges in data storage, processing, and analysis. The experiments of the Large Hadron Collider (LHC) are expected to reach one exabyte of physics data this year. After this data is processed and filtered, interactivity becomes particularly important in the last phases of analysis, where the final results are produced, namely in the form of plots.

Jupyter’s notebooks, which merge code, text, and other media into a rich narrative, allow CERN to offer a web-based service that addresses the needs of the community. This service, called SWAN (service for web-based analysis), provides the HEP community with an interactive interface to data analysis tools such as the ROOT framework. Moreover, SWAN integrates with CERN’s infrastructure: more precisely, with users’ synchronized storage (CERNBox), computing resources, and the experiments’ data and software.

Diogo Castro offers an overview of SWAN and explains how the service is being used by researchers and students, both inside and outside CERN. Diogo also discusses the evolution of the service, especially the new SWAN interface, developed on top of Jupyter, which enables both easy sharing among users and connecting to Spark clusters.

Distributed TensorFlow on Hops

December 30, 2019

Fabio Buso offers demonstrations of frameworks for building distributed TensorFlow applications on the Hops platform and walks you through the whole model lifecycle, from debugging and visualizing models on TensorBoard to parallel experimentation and distributed training (with the help of Spark) to model deployment and inferencing using TensorFlow Serving and Kubernetes.

Apache Spark programming

November 29, 2019

Brooke Wenig walks you through the core APIs for using Spark, the fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark's streaming capabilities and machine learning APIs.

Cuttlefish: Lightweight primitives for online tuning

November 28, 2019

Tomer Kaftan offers an overview of Cuttlefish, a lightweight framework prototyped in Apache Spark that helps developers adaptively improve the performance of their data processing applications by inserting a few library calls into their code. These calls construct tuning primitives that use reinforcement learning to adaptively modify execution as they observe application performance over time.
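
To make the idea of a tuning primitive concrete, here is a minimal sketch in plain Python. It is not Cuttlefish's API: the class EpsilonGreedyTuner, the simulate helper, the option names, and the costs are all hypothetical, and epsilon-greedy selection is swapped in as one simple reinforcement learning strategy, since the abstract does not say which algorithm Cuttlefish uses. The tuner picks an option each round, observes a cost, and gradually favors the cheapest option.

```python
import random


class EpsilonGreedyTuner:
    """Hypothetical tuning primitive: learns which option costs the least."""

    def __init__(self, options, epsilon=0.1, seed=0):
        self.options = list(options)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.counts = [0] * len(self.options)
        self.mean_cost = [0.0] * len(self.options)

    def choose(self):
        """Return the index of the option to use for the next round."""
        # Try every option once before trusting the averages.
        for index, count in enumerate(self.counts):
            if count == 0:
                return index
        # Explore a random option with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.options))
        # Otherwise exploit the option with the lowest average cost so far.
        return min(range(len(self.options)), key=lambda i: self.mean_cost[i])

    def observe(self, index, cost):
        """Fold one observed cost into the running average for that option."""
        self.counts[index] += 1
        self.mean_cost[index] += (cost - self.mean_cost[index]) / self.counts[index]

    def best(self):
        """Return the already-tried option with the lowest average cost."""
        tried = [i for i, count in enumerate(self.counts) if count > 0]
        return self.options[min(tried, key=lambda i: self.mean_cost[i])]


def simulate(tuner, true_costs, rounds):
    """Drive the tuner with made-up, noise-free costs (stand-ins for runtimes)."""
    for _ in range(rounds):
        index = tuner.choose()
        tuner.observe(index, true_costs[tuner.options[index]])
    return tuner


# Illustrative option names and costs only; none of these are measured values.
tuner = simulate(
    EpsilonGreedyTuner(["hash_join", "sort_merge_join", "broadcast_join"]),
    {"hash_join": 5.0, "sort_merge_join": 3.0, "broadcast_join": 1.0},
    rounds=200,
)
print(tuner.best(), tuner.counts)
```

In a real application the observed cost would come from measuring the work itself, for example the wall-clock time of each call, rather than from a fixed table.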

Supporting reproducibility in Jupyter through dataflow notebooks

January 5, 2020

Dataflow notebooks build on the Jupyter Notebook environment by adding constructs to make dependencies between cells explicit and clear. David Koop offers an overview of the Dataflow kernel, shows how it can be used to robustly link cells as a notebook is developed, and demonstrates how that notebook can be reused and extended without impacting its reproducibility.
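
The idea of explicit dependencies between cells can be illustrated with a small sketch in plain Python. This is not the Dataflow kernel's interface: CellGraph, its cell decorator, and the cell names are hypothetical, and the sketch only assumes that each cell declares which other cells it reads from. Given those declarations, the cells needed for a result can be run in dependency order no matter how they are arranged on the page.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class CellGraph:
    """Hypothetical notebook model: each cell names the cells it depends on."""

    def __init__(self):
        self.funcs = {}    # cell name -> function that computes the cell
        self.deps = {}     # cell name -> names of upstream cells
        self.outputs = {}  # cell name -> most recent output

    def cell(self, name, deps=()):
        """Decorator that registers a function as a named cell."""
        def register(func):
            self.funcs[name] = func
            self.deps[name] = tuple(deps)
            return func
        return register

    def run(self, target):
        """Run the target cell after everything it depends on."""
        # Collect the target and all of its upstream cells.
        needed = {}
        stack = [target]
        while stack:
            name = stack.pop()
            if name not in needed:
                needed[name] = self.deps[name]
                stack.extend(self.deps[name])
        # static_order() yields each cell only after its dependencies.
        for name in TopologicalSorter(needed).static_order():
            args = [self.outputs[dep] for dep in self.deps[name]]
            self.outputs[name] = self.funcs[name](*args)
        return self.outputs[target]


graph = CellGraph()


@graph.cell("total", deps=["clean"])  # defined first, still runs last
def total(values):
    return sum(values)


@graph.cell("load")
def load():
    return [3, 1, 2]


@graph.cell("clean", deps=["load"])
def clean(values):
    return sorted(values)


print(graph.run("total"))
```

Because the result depends only on the declared links, defining "total" before its inputs does not change the outcome, which is the property that makes such a notebook safe to reuse and extend.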

The future of data-driven discovery in the cloud

January 5, 2020

Drawing on his experience with the Pangeo project, Ryan Abernathey makes the case for the large-scale migration of scientific data and research to the cloud. The cloud offers a way to make the largest datasets instantly accessible to the most sophisticated computational techniques. A global scientific data commons could usher in a golden age of data-driven discovery.

The reporter's notebook

January 5, 2020

Beyond Twitter, Facebook, and similar networks, data, code, and algorithms are without question forming systems of power in our society. Mark Hansen explains why it is crucial that journalists, the explainers of last resort, be able to interrogate these systems, holding power to account.