5. Python and Jupyter Notebooks (svante-ood.mit.edu)

5.1. Running Python on Svante

Running Python on a high-performance computing cluster such as Svante is not as straightforward as running most other software. For other software on Svante, the system administrators install, manage, and upgrade the software on the cluster. Users have write access only in their personal spaces, and thus cannot install software for general use (nor is a user’s /home intended, or sufficiently large, for software installations). The core Python language itself is rather limited in its scientific capabilities, but a rapidly expanding library of packages is available to augment Python’s features; this set of packages is intended to be personalized and managed by the user, with added packages stored in the user’s personal directory space.

On Svante there are two ways to use Python: through a traditional Python installation (module load python; several versions are available, as shown by module avail, with python/3.9.1 the current default), or through a pre-installed Anaconda version of Python (module load anaconda3/2020.11). Anaconda is an open-source package and environment management system; it is a bit more involved to use, but likely the better choice for more advanced Python users. Instructions for both are included in this section.

Python code can be run interactively in a terminal window (or simply passed as a script for execution through a terminal window), or run interactively in a Jupyter Notebook browser window (see Section 5.2).

5.1.1. Terminal-window based Python with pre-installed packages

Both installations of Python on Svante (i.e., module python or anaconda3) include a large collection of pre-installed packages, ready for use with an import command. For example:

[jscience2@svante-login ~]$ ssh fs02
Last login: Fri Aug 20 15:25:23 2021 from 172.16.0.101
[jscience2@fs02 ~]$ module load python
[jscience2@fs02 ~]$ python
Python 3.9.1 (default, Feb  3 2021, 15:39:28)
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>>

In the example above, we first ssh to a fileserver (here, server fs02), load up a Python module, start up the Python session (alternatively, start the session by typing ipython to get an interactive Python shell), and import package numpy. Alternatively, we could have requested an interactive compute node for our session, using a slurm srun request (see Section 4.5). DO NOT run Python on svante-login. If you have some Python code you want to execute non-interactively, you could write a slurm script, e.g.,

#!/bin/bash
#
#SBATCH -J example_py_script
#SBATCH -n 1
#SBATCH -p fdr
#SBATCH -t 2:00:00

source /etc/profile.d/modules.sh
module load python/3.9.1

python «name of python script»
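
As a rough illustration of what such a script might contain, here is a minimal sketch. The file name example_job.py and the function describe_job are hypothetical, and the workload is a placeholder; the only assumption about the scheduler is that SLURM sets SLURM_JOB_ID and SLURM_JOB_NAME inside a running job.

```python
"""Hypothetical example_job.py: a small script suitable for batch execution."""
import os
import socket
import sys


def describe_job(environ=os.environ):
    """Collect basic facts about where this script is running."""
    return {
        "host": socket.gethostname(),
        "python": sys.version.split()[0],
        # SLURM sets these variables inside a job; they are absent otherwise.
        "job_id": environ.get("SLURM_JOB_ID", "not in a SLURM job"),
        "job_name": environ.get("SLURM_JOB_NAME", "n/a"),
    }


def main():
    for key, value in describe_job().items():
        print(f"{key}: {value}")
    # Placeholder workload; replace with the real analysis.
    total = sum(i * i for i in range(1000))
    print(f"sum of squares: {total}")


if __name__ == "__main__":
    main()
```

Printing the host name and job ID at the top of a batch script makes it easier to match the job's output file to the node and queue entry it came from.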

If you find that a Python package you require is not included in our comprehensive list of pre-installed packages, send an email to jp-admin@techsquare.com and we’ll add it.

5.1.2. Using Python with user-installed packages

5.1.2.1. Adding packages to terminal-based (traditional) Python

To add packages to the terminal-based Python installed on Svante, you have two options. If this is a common package and/or you think others might benefit from having it widely available, the easiest approach is simply to ask us (at jp-admin@techsquare.com) to add it to Svante’s software library (as noted above). The second option is to install the package into a Python directory in your /home space, which requires adding the --user option to the pip install command:

pip install «package name(s)» --user

Or, to install inside a virtual environment:

module add python
python3 -m venv «environment name»
source /home/«username»/«environment name»/bin/activate
pip install «package name(s)»
deactivate
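
To confirm where pip will place packages, it can help to ask the interpreter itself. The sketch below uses only the standard library; the function names are illustrative. It relies on the fact that inside an environment created by venv, sys.prefix differs from sys.base_prefix, and that site.getusersitepackages() reports the per-user directory targeted by pip install --user.

```python
import site
import sys


def in_virtual_environment():
    """Return True when the interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def install_locations():
    """Summarize the interpreter in use and where user installs would go."""
    return {
        "interpreter": sys.executable,
        "in_venv": in_virtual_environment(),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in install_locations().items():
        print(f"{key}: {value}")
```

Run it once after module add python and once more after activating the environment; the interpreter path and the in_venv flag should change between the two runs.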

5.1.2.2. Using Conda w/package manager to create virtual environments

If using our installed anaconda module to run Python, you can user-install packages into your /home space only through a virtual environment:

module add anaconda3/2020.11
conda create -n «environment name»
source activate «environment name»
conda install jupyterlab
conda install «package name(s)»
conda deactivate

NOTE: When you create the virtual environment above, conda’s output states that the command to activate the environment is conda activate «environment name». This will not work on Svante: it requires a conda init initialization, which you should never run here because it interferes with our module environment system. Use source activate «environment name» instead, as shown above. It is necessary in the above steps to install the jupyterlab package into your virtual environment if you want to use ipython or Svante Open OnDemand (https://svante-ood.mit.edu). Also, these steps are only supported using bash, not legacy C shell environments.
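
A quick way to check that an environment is active and contains jupyterlab is sketched below. It assumes that activating a conda environment sets the CONDA_DEFAULT_ENV and CONDA_PREFIX variables (standard conda behavior, but not verified here for Svante's modules); the function name conda_env_report is hypothetical.

```python
import importlib.util
import os
import sys


def conda_env_report(environ=os.environ, required=("jupyterlab",)):
    """Report the active conda environment and any required packages missing."""
    return {
        # Both variables are unset (None) when no environment is active.
        "env_name": environ.get("CONDA_DEFAULT_ENV"),
        "env_prefix": environ.get("CONDA_PREFIX"),
        "interpreter": sys.executable,
        # find_spec returns None for a top-level package that cannot be imported.
        "missing": [
            name for name in required
            if importlib.util.find_spec(name) is None
        ],
    }


if __name__ == "__main__":
    for key, value in conda_env_report().items():
        print(f"{key}: {value}")
```

If missing lists jupyterlab, run conda install jupyterlab inside the activated environment before trying to use it from svante-ood.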

5.2. Running Jupyter Notebooks using Svante Open OnDemand

[Figure: OOD tabs]

Svante Open OnDemand (https://svante-ood.mit.edu) is a web-based interface to the Svante cluster. In addition to allowing simple tasks such as file browsing and opening a terminal window, its main feature allows the user to interactively run Python in a Jupyter Notebook (we will consider adding other Jupyter-supported applications to its capabilities; email us with requests). Note: svante-ood only works with Anaconda-installed versions of Python.

Only Chrome and Firefox browsers are supported at this time (i.e., Safari does not work with svante-ood).

The functions of the menu pull-down tabs are as follows:

Files: from here you can list directories, with view, edit, and download functionality. By default your /home directory is selected, and from there you can navigate to any of your /home subdirectories. We have also created soft links to each user’s file server spaces (if you are missing access to any of your file server spaces, please let us know).

Jobs: similar to the slurm squeue -u command, this will list all your current slurm jobs in the queue (both running and queued jobs). In addition, this list of active jobs includes svante-ood jobs running on file servers.

Clusters: this tab opens a svante-login terminal window shell in your browser.

Compute Node Interactive Apps:

[Figure: Compute tabs]

Filling in this page submits a SLURM request to allocate compute node core(s) for running a Jupyter Notebook. You must specify your intended walltime and the partition. Note that this requests the whole compute node, which is 24 cores/64GB RAM for FDR and 32 cores/128GB RAM for EDR. In general we prefer that you request FDR unless you specifically need the extra RAM or additional processor cores.

Next, specify which module (version) of Anaconda to use. Do pay attention to your choice here; installed packages are different in different modules (e.g., the anaconda3/2020.11_geoschem has multiple packages installed to aid in the analysis of geoschem output). Moreover, if you have installed custom packages into your personal conda space, you need to know into which module these packages were installed.

Below the SLURM-related information, there are three check boxes. The first, “custom Anaconda Environment”, is for using a virtual environment with Python; after checking this box, the environment name and path are requested. The second box, “custom Anaconda install”, is for advanced users who prefer to use their own Anaconda installation (i.e., a full installation, not just downloaded packages); when checked, this asks for the path to the user-installed Anaconda directory. Note that virtual environments (first box) are allowed even if this second box is checked. The final check box allows the user to automatically load the previous notebook session, which is saved in the user’s /home directory.

When the top of the page is filled in to your satisfaction, hit the blue Launch button at the bottom. This will bring you to the My Interactive Sessions tab. Hit Connect to Jupyter: this directs you to a launcher page where you can start up a Python 3 Jupyter Notebook (most typically, this is why you are using svante-ood!), a Python 3 console window, or any of the other options at the bottom. The directory path in the left column points at your /home space (similar to the Files tab).

If you have a library of Python routines you want to be accessible to your Python session, there is an easy way to do this: in your .bashrc file, include a line:

export PYTHONPATH=/home/«username»/«python code library directory»

Note this line should be included in a bash startup file, even if your user login default is C shell.
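
The effect of PYTHONPATH can be seen with a small self-contained experiment. In this sketch a temporary directory stands in for your code library and mytools.py is a hypothetical module; the helper run_with_pythonpath starts a fresh interpreter with PYTHONPATH set, which is roughly what happens when a session inherits the variable from .bashrc.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_pythonpath(library_dir, code):
    """Run `code` in a fresh interpreter with PYTHONPATH set to library_dir."""
    env = dict(os.environ, PYTHONPATH=str(library_dir))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical personal module standing in for your own library.
        Path(tmp, "mytools.py").write_text("def double(x):\n    return 2 * x\n")
        # The child interpreter can import mytools only because of PYTHONPATH.
        print(run_with_pythonpath(tmp, "import mytools; print(mytools.double(21))"))  # prints 42
```

Inside a notebook, printing sys.path is a quick way to confirm that your library directory was picked up.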

R Server Environment:

Documentation forthcoming

Fileserver Interactive Apps:

[Figure: Fileserver tab]

Why run a svante-ood session on a file server? One major advantage: no wall time limit. Also, file server usage is not monitored by the scheduler, so file server nodes are always available for use, even if other tasks are running concurrently. Moreover, if you are doing analysis involving heavy file input/output, your script will likely run much faster if the data are accessed locally, in other words by running Python on the file server where the data are located (see Section 6.1.2). Typically, file servers are shared among research groups, so if you do monopolize a file server’s resources such as cores or RAM, your colleague in the office next door will be the one most likely to complain. For that reason, when running a file server svante-ood session, there is a greater onus on you to compute responsibly and/or check with your group members before fully taxing one of the servers. See Table 2.1 for a list of file servers and their resource specifications. Note also that there is no restriction on using a specific file server even if you lack storage space there, but if you do plan to use significant resources, it is best to check with jp-admin@techsquare.com beforehand.
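
One way to check how busy a shared file server already is, before starting something heavy, is to compare the system load average with the number of cores. This is only a sketch: the function name load_summary and the 0.8 threshold are arbitrary assumptions, not a Svante policy, and os.getloadavg() is available on Unix-like systems only.

```python
import os


def load_summary(busy_threshold=0.8):
    """Compare the 1-minute load average with the number of cores."""
    cores = os.cpu_count() or 1
    load_1min, load_5min, load_15min = os.getloadavg()
    per_core = load_1min / cores
    return {
        "cores": cores,
        "load_1min": load_1min,
        "load_15min": load_15min,
        "load_per_core": per_core,
        # Illustrative cutoff only; pick whatever your group agrees on.
        "looks_busy": per_core >= busy_threshold,
    }


if __name__ == "__main__":
    for key, value in load_summary().items():
        print(f"{key}: {value}")
```

A load per core near or above 1 suggests the server's cores are already fully occupied, which is a good moment to check with your group members before launching more work.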

After selecting the fileserver tab, the top box contains a list of file servers that permit svante-ood sessions; select one. The remaining options (checkboxes) on this tab are identical to those on the Compute Node Interactive Apps tab, specifically allowing custom Anaconda environments, custom Anaconda installs, Anaconda module selection, and the option to load your previous notebook session. The blue Launch button will start up the session and pull up the My Interactive Sessions tab. As above, bring up the Notebook splash screen by hitting the Connect to Jupyter button.

My Interactive Sessions:

As noted above, this tab is automatically pulled up when you hit Launch, or it can be selected from the general svante-ood menu. All running svante-ood sessions for a user are listed, with the option to delete any session. Clicking the Session ID link pulls up a list of session-related files; for debugging purposes, view these files to search for additional information on problems that might have occurred.

Log Out:

To completely log out of svante-ood, note that clicking the Log Out tab is insufficient; you must also completely quit the browser.