How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center
Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).
Talk Title | How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center |
Speakers | Shreyas Cholia (Lawrence Berkeley National Laboratory), Rollin Thomas (Lawrence Berkeley National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory) |
Conference | JupyterCon in New York 2017 |
Location | New York, New York |
Date | August 23-25, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC). Shreyas, Rollin, and Shane explain the motivation behind using Jupyter for supercomputing, describe their implementation strategy and how they arrived at it, and discuss lessons learned along the way. They also describe alternative configurations for Jupyter on Cori and outline the benefits and drawbacks of each.
The baseline setup incorporates a JupyterHub frontend web service running inside a Docker container (for portability and scaling) that manages user authentication and proxies subsequent Jupyter requests to the Cori system. Shreyas, Rollin, and Shane have developed a custom authenticator for JupyterHub called the GSI (Grid Security Infrastructure) Authenticator that allows users to acquire a grid certificate upon login. The service then uses a special spawner they developed (SSH Spawner), which spins up a Jupyter notebook on Cori via SSH using the GSI credentials. Once launched, the Jupyter notebook connects back to the hub over a websocket. The hub then proxies all subsequent user requests to the Cori node via this websocket connection.
Users interact with their notebooks running on Cori, launching preinstalled or custom kernels to analyze and visualize their data over a familiar web interface. A suite of SLURM “magic” commands developed at NERSC allows users to submit batch jobs from notebooks. The new authenticator, modified spawner, and magic commands have been contributed back to the open source Jupyter community.
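To give a sense of what submitting a batch job from a notebook involves, here is a minimal Python sketch. It is not the NERSC magic-command implementation; the helper names `parse_job_id` and `submit_batch_script` are hypothetical. The sketch relies only on two standard `sbatch` behaviors: it reads the job script from standard input when no file is given, and it confirms a submission with the message "Submitted batch job <id>".

```python
import re
import subprocess

# sbatch confirms a successful submission as "Submitted batch job <id>".
_JOB_ID_PATTERN = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_output: str) -> int:
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = _JOB_ID_PATTERN.search(sbatch_output)
    if match is None:
        raise ValueError(f"unrecognized sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit_batch_script(script_text: str) -> int:
    """Hypothetical helper: pipe a job script to sbatch, return the job ID.

    Requires a host where the SLURM `sbatch` command is installed.
    """
    result = subprocess.run(
        ["sbatch"],          # no file argument: the script is read from stdin
        input=script_text,
        capture_output=True,
        text=True,
        check=True,          # raise CalledProcessError if sbatch fails
    )
    return parse_job_id(result.stdout)
```

Inside a notebook, a function like `submit_batch_script` could be exposed as a cell magic through IPython's `register_cell_magic` decorator, so that the body of a cell is treated as the job script.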
As the number of Jupyter users on Cori grows, Shreyas, Rollin, and Shane expect severe resource limitations in a single-node deployment. The architecture they developed allows Jupyter notebooks to be spawned either on the dedicated Jupyter node or directly on Cori compute nodes. The dedicated-node setup provides users with immediate access to Jupyter at NERSC for smaller-scale analytics tasks, while the compute-node alternative provides more resources to users who are willing to wait in the batch queue. Launching notebooks on compute nodes is accomplished through the batch queue system using a customized SLURM-based BatchSpawner interface. This capability opens up Cori compute resources through Jupyter, including Cori data features like the burst buffer, and enables interactive analytics and visualization using thousands of cores on datasets that cannot fit into a single node’s memory footprint.
Beyond making Cori accessible to more scientists, Jupyter allows NERSC to deliver interactive software packages and specialized kernels for tasks such as scalable analytics with Spark, real-time volume rendering and visualization with yt, and complex data analysis workflows with dask and ipyparallel. Shreyas, Rollin, and Shane demonstrate these frameworks in action and address specific challenges faced in deploying them on the Cray XC40 system.
Making Jupyter work seamlessly on Cori has required collaboration between data architects, systems engineers, security and network specialists, the core Jupyter team, and the extended Jupyter developer community. By documenting their experiences and plans and contributing back their code, Shreyas, Rollin, and Shane hope to promote interactive supercomputing and facilitate its adoption by a broader audience. Indeed, they envision a day when a good fraction of NERSC users rely exclusively on Jupyter or similar frameworks for data analysis and never use a traditional login shell at all.
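Spawning a notebook through the batch queue amounts to wrapping the notebook server command in a job script and submitting it. The Python sketch below illustrates that idea only; it is not the customized BatchSpawner used at NERSC. The template, the `render_batch_script` function, and the partition name `debug` are assumptions made for illustration, while the `#SBATCH` directives shown are standard `sbatch` options.

```python
from string import Template

# Hypothetical job-script template; the directives are standard sbatch options.
BATCH_TEMPLATE = Template("""\
#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --partition=$partition
#SBATCH --nodes=$nodes
#SBATCH --time=$time_limit

$command
""")


def render_batch_script(command, partition, nodes=1,
                        time_limit="00:30:00", job_name="jupyter"):
    """Render a SLURM job script that runs `command` on compute nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return BATCH_TEMPLATE.substitute(
        job_name=job_name,
        partition=partition,
        nodes=nodes,
        time_limit=time_limit,
        command=command,
    )


if __name__ == "__main__":
    # Example: request two nodes for the single-user notebook server.
    print(render_batch_script("jupyterhub-singleuser --port=8888",
                              partition="debug", nodes=2))
```

The trade-off described above shows up in the `nodes` and `time_limit` parameters: larger requests give the notebook more resources but typically wait longer in the queue.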