Jupyter Notebooks

In this guide we demonstrate how to set up Jupyter on CSD3 and how to request cluster resources from within it.

Setting up Jupyter on CSD3

# One-time setup.
# Load a recent python
# Uncomment one of the three following blocks depending on your target cluster.
# Use this when running on cclake
module purge
module load rhel8/cclake/base
module load python/3.11.9/gcc/nptrdpll
# Use this when running on icelake
#module load rhel8/default-icl
#module load python/3.9.12/gcc/pdcqf4o5
# Use this when running on ampere
#module load rhel8/default-amp
#module load python/3.8.11/gcc-9.4.0-yb6rzr6
# Create and activate a virtual environment
python -m venv ~/jupyter-env 
source ~/jupyter-env/bin/activate
# Install Jupyter in your virtual environment
pip install jupyter
# Add other packages with `pip install`
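
As an optional sanity check, you can confirm that the new environment provides Jupyter. This is a minimal sketch and assumes the ~/jupyter-env virtual environment created above:

# With ~/jupyter-env activated, check where the jupyter command resolves to
which jupyter        # should point into ~/jupyter-env/bin
# Print version information for the installed Jupyter components
jupyter --version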

Running Jupyter

#!/bin/sh
# Session setup: run every time.
# Activate the virtual environment
source ~/jupyter-env/bin/activate
# Start the notebook server
jupyter notebook --no-browser --ip=127.0.0.1 --port=8081 

This will print a series of messages, finishing with a web address. Copy this web address to the clipboard and make a note of which login node you are using.
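
If you are unsure which login node you are on, you can check directly on CSD3; the command below simply prints the machine's hostname:

   $ hostname

The web address printed by Jupyter typically looks something like http://127.0.0.1:8081/?token=<long token>, where the token is specific to your session.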

Then from your local machine, forward a port to be able to connect to the notebook server:

   $ ssh -L 8081:127.0.0.1:8081 -l <username> -fN login-q-1.hpc.cam.ac.uk

where <username> is your login name on CSD3. Make sure that you pick the same login node that you started the notebook server on (here we assume login-q-1). You can then open the web address in your browser and connect to your notebook server.
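
The -fN options put the tunnel into the background. When you have finished, you can stop it again from your local machine; a minimal sketch, assuming the same port and forwarding specification as above:

   $ pkill -f "ssh -L 8081:127.0.0.1:8081"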

Running Jupyter on a compute node

  • On the CSD3 login node, submit a GPU job by running the following command, or by setting the equivalent SLURM options in a submission script (a sketch of such a script is given at the end of this list):

    $ salloc -t 02:00:00 --nodes=1 --gres=gpu:1 --ntasks-per-node=1 --cpus-per-task=3 --partition=ampere -A YOURACCOUNT jupyter notebook --no-browser --ip=* --port=8081
    

where -A YOURACCOUNT is set to the same account you use for sbatch submission (same -A option).

  • Take note of the GPU node name when it appears at the prompt, for example:

    salloc: Nodes gpu-q-XX are ready for job

    This appears just before the jupyter notebook output starts; you might need to scroll up a bit. Copy the web address printed at the end of the jupyter notebook output to the clipboard.

  • From your local machine, forward a port to be able to connect to the notebook server. Replace gpu-q-X with the allocated GPU node and login-q-Y with the login node from which you submitted the job:

    $ ssh -L 8081:gpu-q-X:8081 -l <username> -fN login-q-Y.hpc.cam.ac.uk
    

where <username> is your login name on CSD3.

  • Paste the web address into your browser, changing the “gpu-q-X” string to 127.0.0.1.

  • If you encounter a “bind: Address already in use” error, another process is already using that port. In that case, stop the process associated with the port and try again:

    $  lsof -ti:8081 | xargs kill -9
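
If you prefer a batch job over the interactive salloc command shown in the first step, the sketch below is an equivalent submission script. It is only a sketch: the account, partition, resources and the file name jupyter-notebook.sbatch are placeholders to adapt to your own setup.

    #!/bin/bash
    #SBATCH -A YOURACCOUNT
    #SBATCH --partition=ampere
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=3
    #SBATCH --time=02:00:00
    # Activate the virtual environment created during the one-time setup
    source ~/jupyter-env/bin/activate
    # Start the notebook server on the allocated compute node
    jupyter notebook --no-browser --ip=* --port=8081

Submit it with `sbatch jupyter-notebook.sbatch`, find the allocated GPU node with `squeue -u <username>`, and then follow the same port-forwarding steps as above.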
    

Running jobs from within Jupyter

  • It is possible to bypass the manual allocation and port-forwarding steps above and submit jobs to the cluster from within Jupyter notebooks. To do this, follow the documentation above to install Jupyter and run it on the login node (not a compute node). Begin by installing remote_ikernel:

    $ pip install remote_ikernel
    
  • The following is an example of configuring a remote kernel that uses the SLURM interface:

    $ remote_ikernel manage --add \
        --kernel_cmd="ipython kernel -f {connection_file}" \
        --name="Kernel name" --cpus=32 --interface=slurm \
        --remote-precmd "source $VENV_PATH/bin/activate" \
        --remote-launch-args '-A <your-account> -p icelake -t 12:00:00' --verbose
    
  • Here VENV_PATH needs to be changed to point to your virtual environment location (for example ~/jupyter-env from the setup above), while {connection_file} can be left as written; it is filled in automatically. The --remote-precmd option avoids having to modify your .bashrc. In the last line, -A <your-account> refers to your batch account as usual (these are sbatch options).

  • Launch the Jupyter notebook and select the kernel with the name provided above (the launch commands are recapped at the end of this list). When the kernel starts, it requests the resources for the time specified in --remote-launch-args. To see the kernel, you might need to refresh the Jupyter page; it will then appear under New -> KERNEL_NAME.

  • You can delete the kernel by running:

    $ remote_ikernel manage --delete KERNEL_NAME
    
  • You can list the names of the currently installed kernels with:

    $ jupyter kernelspec list
    
  • It is possible to add other Jupyter kernels, including gnuplot and Julia, among others.

  • We still need to look into multi-node support.
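
For completeness, launching the notebook server with the remote kernel uses the same commands as in the earlier sections on the login node; the recap below assumes the same virtual environment and port as before:

    $ source ~/jupyter-env/bin/activate
    $ jupyter notebook --no-browser --ip=127.0.0.1 --port=8081

Port forwarding from your local machine then works exactly as described in the Running Jupyter section.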