.. _python_jupyter:
Python and Jupyter Notebooks (svante-ood.mit.edu)
-------------------------------------------------
Running Python on Svante
************************
Running `Python `_ on a high-performance computing cluster such as Svante is less straightforward
than running most other software. For other software on Svante, the system administrators install, upgrade, and otherwise manage
the software on the cluster. Users have write access only in their personal spaces, and thus cannot install
software for general use (users' ``/home`` space is neither intended nor large enough for software installations).
The core Python language itself is rather limited in its scientific capabilities, but a rapidly expanding library
of packages is available to augment Python's features; this set of packages is intended to be personalized and managed
by the user, with added packages stored in the user's personal directory space.
On Svante there are two ways to use Python: either a traditional Python installation (``module load python``;
several versions are available, as shown via ``module avail``, with current default ``python/3.9.1``),
or a pre-installed Anaconda version of Python (``module load anaconda3``).
Anaconda is an open-source package and environment management system;
it is a bit more involved to use, but likely the better choice for more advanced Python users.
Instructions for both are included in this section.
Python code can be run interactively in a terminal window (or passed as a script for execution through a terminal window), or run interactively
in a `Jupyter Notebook `_ browser window (see :numref:`svante_ood`). Note that the latter approach works only with
the anaconda module (currently ``anaconda3/2020.11``).
Terminal-window based Python with pre-installed packages
========================================================
Both installations of Python on Svante (i.e., module ``python`` or ``anaconda3``) include a large collection of pre-installed packages,
ready for use with an ``import`` command. For example:
::
[jscience2@svante-login ~]$ ssh fs02
Last login: Fri Aug 20 15:25:23 2021 from 172.16.0.101
[jscience2@fs02 ~]$ module load python
[jscience2@fs02 ~]$ python
Python 3.9.1 (default, Feb 3 2021, 15:39:28)
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>>
In the example above, we first ``ssh`` to a file server (here, server ``fs02``),
load a Python module, start the Python session (alternatively, start the session by typing
``ipython`` to get an interactive
Python shell), and import the package `numpy `_.
Alternatively, we could have requested an interactive compute node for our session, using
a slurm ``srun`` request (see :numref:`slurm_interactive`). **DO NOT** run Python on ``svante-login``.
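As a safeguard against accidentally starting work on the login node, a script can check its host before doing anything expensive. The following is a minimal sketch, assuming the login node's short hostname is ``svante-login`` (as it appears in the shell prompt above); the helper name ``is_login_node`` is hypothetical.

```python
import socket

# Assumption: the login node's short hostname is "svante-login",
# as shown in the shell prompt of the example session.
LOGIN_HOSTS = {"svante-login"}


def is_login_node(hostname=None):
    """Return True if the short hostname matches a known login node."""
    name = hostname if hostname is not None else socket.gethostname()
    short = name.split(".")[0]  # drop any domain suffix
    return short in LOGIN_HOSTS


if __name__ == "__main__":
    if is_login_node():
        raise SystemExit("Refusing to run on the login node; "
                         "use a file server or a compute node.")
    print("Running on", socket.gethostname())
```

Placing such a check at the top of a long-running script costs nothing on file servers and compute nodes, and stops the script immediately on the login node.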
If you have some Python code you want to execute non-interactively, you could write a slurm script, e.g.,
::
#!/bin/bash
#
#SBATCH -J example_py_script
#SBATCH -n 1
#SBATCH -p fdr
#SBATCH -t 2:00:00
source /etc/profile.d/modules.sh
module load python/3.9.1
python «name of python script»
If you find that a Python package you require is not included in our comprehensive list of pre-installed packages,
send an email to jp-admin@techsquare.com and we'll add it.
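Before sending a request, it may help to check which of the modules your code imports are actually missing from the loaded Python. A minimal sketch using the standard-library ``importlib``; the list of names is a hypothetical wish list, and the helper ``missing_packages`` is not part of any Svante tooling.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Hypothetical wish list; replace with the modules your code imports.
    wanted = ["numpy", "scipy", "xarray"]
    missing = missing_packages(wanted)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run this after ``module load python`` (or ``module load anaconda3``) so that the check reflects the installation you intend to use.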
Using Python with user-installed packages
=========================================
Adding packages to terminal-based (traditional) Python
++++++++++++++++++++++++++++++++++++++++++++++++++++++
To add packages to the terminal-based Python installed on Svante, you have two options.
If this is a common package and/or you think others might benefit from having
it widely available, the easiest approach is simply to ask us (at jp-admin@techsquare.com)
to add it to Svante's software library (as noted above). The second option is to install the package
into a Python directory in your ``/home`` space, which requires the ``--user`` option
added to the ``pip install`` command:
::
pip install «package name(s)» --user
Or, to install inside a virtual environment (here ``--user`` is omitted, since ``pip`` rejects that option inside a virtual environment):
::
module add python
python3 -m venv «environment name»
source /home/«username»/«environment name»/bin/activate
pip install «package name(s)»
deactivate
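To see where a given Python session will look for user-installed packages, the interpreter itself can report this. The sketch below uses only the standard library: ``site.getusersitepackages()`` gives the directory that ``pip install --user`` targets, and comparing ``sys.prefix`` with ``sys.base_prefix`` indicates whether a ``venv`` environment is active. The helper name ``in_virtualenv`` is hypothetical.

```python
import site
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside venv:", in_virtualenv())
# Directory used by `pip install --user` for this interpreter
print("user site  :", site.getusersitepackages())
```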
Using Conda w/package manager to create virtual environments
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If using our installed anaconda module to run Python, you can user-install
packages into your ``/home`` space only through a virtual environment:
::
module add anaconda3
conda create -n «environment name»
source activate «environment name»
conda install «package name(s)»
conda deactivate
**NOTE:** When you create the virtual environment above, conda's output states that the command to activate the environment
is ``conda activate «environment name»``. **This will not work on Svante**, because ``conda activate`` requires a prior ``conda init``, which you should never run here:
it interferes with our :ref:`module environment system `. Also, these steps are supported only in bash, not in legacy C shell environments.
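To confirm from inside Python that the intended conda environment is active, one option is to inspect the environment variables that conda's activation scripts normally set. This is a sketch under that assumption (``CONDA_DEFAULT_ENV`` for the name, ``CONDA_PREFIX`` for the path); the helper ``active_conda_env`` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return (name, prefix) of the active conda environment, or (None, None)."""
    # Assumption: activation sets CONDA_DEFAULT_ENV and CONDA_PREFIX.
    return environ.get("CONDA_DEFAULT_ENV"), environ.get("CONDA_PREFIX")


name, prefix = active_conda_env()
print("conda environment:", name or "(none active)")
print("environment path :", prefix or "(n/a)")
print("interpreter      :", sys.executable)
```

If the interpreter path does not lie under the reported environment path, the session is most likely using a different Python than the one installed in the environment.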
.. _svante_ood:
Running Jupyter Notebooks using Svante Open OnDemand
****************************************************
.. figure:: ood_tabs.png
:width: 100%
:align: center
:alt: OOD tabs
:name: ood_tabs
Svante Open OnDemand (https://svante-ood.mit.edu) is a web-based interface to the Svante cluster.
In addition to allowing simple tasks such as file browsing and opening a terminal window, its main
feature allows the user to interactively run Python in a Jupyter Notebook (we will consider
adding other Jupyter-supported applications; email jp-admin@techsquare.com with requests). **Note:
svante-ood only works with Anaconda-installed versions of Python.**
Only Chrome and Firefox browsers are supported at this time (i.e., Safari does not work with svante-ood).
The functions of the menu pull-down tabs are as follows:
**Files**: from here you can list directories, with view, edit, and download functionality. By default, your ``/home``
directory is selected, and from there you can navigate to any of your ``/home`` subdirectories. We have also created soft links
to each user's file server spaces (if you are missing access to any of your file server spaces, please let us know).
**Jobs**: similar to the slurm ``squeue -u`` command, this will list all your current slurm jobs in the queue (both running and queued jobs).
In addition, this list of active jobs includes svante-ood jobs running on file servers.
**Clusters**: this tab opens a ``svante-login`` terminal window shell in your browser.
**Compute Node Interactive Apps**:
.. figure:: compute_tab.png
:width: 100%
:align: center
:alt: Compute tabs
:name: compute_tab
By filling in this page, SLURM is used to allocate compute node core(s) for running a Jupyter Notebook.
You must specify how many nodes, how many processor cores, how much memory is required, wall time, and which partition.
Unless you are running a large parallel Python application, most likely you just want one node and one (or a few) cores.
Your memory request should reflect how much data you need to hold in memory at once (i.e., the size of your data arrays).
The default is 8GB, a modest amount for typical applications; if you are processing
many sizable 3D arrays of data (or 4D arrays), you may need to request a larger RAM allocation.
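A rough memory estimate for a dense array is the number of elements times the bytes per element. The sketch below illustrates this for a hypothetical 4D field of 64-bit floats; the dimensions are invented for illustration, and the helper ``array_gib`` is not part of any library.

```python
import math


def array_gib(shape, bytes_per_element=8):
    """Approximate size in GiB of a dense array (default: 64-bit floats)."""
    n_bytes = math.prod(shape) * bytes_per_element
    return n_bytes / 2**30


# Hypothetical daily field: (time, level, latitude, longitude)
shape = (365, 50, 180, 360)
print(f"{array_gib(shape):.2f} GiB")  # prints "8.81 GiB"
```

A single array of this size already exceeds the 8GB default, and intermediate copies made during a calculation can multiply the requirement several times over.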
Below the SLURM-related information, there are three check boxes. The first, "custom Anaconda Environment",
selects a virtual environment to use with Python; after checking this box, the environment name and path are requested.
The second, "custom Anaconda install", is for advanced users who prefer to use their own Anaconda installation (i.e., a *full* installation,
not just downloaded packages); when checked, it asks for the path to the user-installed Anaconda directory. Note that
virtual environments (first box) are allowed even if this second box is checked. The final check box allows the user
to automatically load the previous notebook session, which is saved in the user's ``/home`` directory.
When the top of the page is filled in to your satisfaction, hit the blue **Launch** button at the bottom;
this will bring you to the **My Interactive Sessions** tab. Hit **Connect to Jupyter**: this directs you to a launcher
page where you can start up a Python 3 Jupyter Notebook (most typically, this is why you are using svante-ood!),
a Python 3 console window, or any of the other options at the bottom.
The directory path in the left column is pointing at your ``/home`` space (similar to the **Files** tab).
If you have a library of Python routines you want to be accessible to your Python session, there is an easy way
to do this: in your ``.bashrc`` file, include a line:
::
export PYTHONPATH=/home/«username»/«python code library directory»
Note this line should be included in a bash startup file, even if your user login default is C shell.
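To verify from within a notebook or console that the ``PYTHONPATH`` setting took effect, the directories it lists can be compared against ``sys.path``. A minimal sketch using only the standard library; the helper name ``pythonpath_dirs`` is hypothetical.

```python
import os
import sys


def pythonpath_dirs(environ=os.environ):
    """Split PYTHONPATH into its directories (empty list if unset)."""
    raw = environ.get("PYTHONPATH", "")
    return [p for p in raw.split(os.pathsep) if p]


# Compare absolute paths, since entries may be normalized at startup.
search_path = {os.path.abspath(p) for p in sys.path if p}
for d in pythonpath_dirs():
    on_path = "on sys.path" if os.path.abspath(d) in search_path else "NOT on sys.path"
    exists = "exists" if os.path.isdir(d) else "missing"
    print(f"{d}: {on_path}, {exists}")
```

If a directory is reported as not on ``sys.path``, the session was probably started before the ``.bashrc`` change was made; starting a new session should pick it up.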
**Fileserver Interactive Apps**:
.. figure:: fs_tab.png
:width: 100%
:align: center
:alt: Fileserver tab
:name: fs_tab
Why run a svante-ood session on a file server? There are several advantages: no wall time limit, and no core or RAM usage restrictions
other than what is available on the server. And, file server usage is not monitored by the scheduler,
so file server nodes are always available for use, even if other tasks are running concurrently.
Moreover, if you are doing analysis involving heavy file input/output, your script will
likely run much faster if the data are accessed locally, in other words running Python on the file
server where the data are located (see :numref:`fileserver_usage`).
Typically, file servers are shared among research groups, so if you monopolize
a file server's resources such as cores or RAM, your colleague in the office next door will be the one most likely to complain.
When running a file server svante-ood session, there is therefore a greater onus on you to compute responsibly and/or
check with your group members before fully taxing one of the servers. See :numref:`svante_nodes` for a list of file servers
and their resource specifications. Note also that there is no restriction on using
specific file servers even if you lack storage space there,
but if you do plan to use significant resources, it is best to check with jp-admin@techsquare.com beforehand.
On this tab, the top box contains a list of file servers that permit svante-ood sessions; select one.
The remaining options (checkboxes) on this tab are identical to those on the **Compute Node Interactive Apps** tab,
specifically allowing custom Anaconda environments, custom Anaconda installs, and the option to load your previous notebook session.
The blue **Launch** button will start up the session and pull up the **My Interactive Sessions** tab. As above,
bring up the Notebook splash screen by hitting the **Connect to Jupyter** button.
**My Interactive Sessions**:
As noted above, this tab is automatically pulled up when you hit **Launch**; it can also be selected from the general svante-ood menu.
All running svante-ood sessions for a user are listed, with the option to delete any session. Clicking the **Session ID** link pulls up a list of session-related files;
for debugging purposes, view these files to search for additional information on problems that might have occurred.
**Log Out**:
To completely log out of svante-ood, note that clicking the **Log Out** tab is insufficient; you must also completely quit the browser.