Scripts to run dask and jupyter lab on Shifter using the pangeo-notebook image
The container is based on the pangeo-notebook image that is curated at https://github.com/pangeo-data/pangeo-stacks.
Pawsey has recently written some documentation about using containers: https://support.pawsey.org.au/documentation/display/US/Containers
Shifter (as opposed to Singularity) works directly on Docker images, and at Pawsey the mapping of the filesystems is taken care of for you, which makes Shifter very convenient to use and works without modifying the image (unlike Singularity - see https://github.com/pbranson/pangeo-hpc-singularity). You also don't need to build the image, just pull it from Docker.
This makes the syntax to start the container simpler and doesn't require sudo at any point in the process, which is good when working on HPC. Shifter also deals with the volatile files directly, rather than requiring a writable folder to be bound in as with Singularity. In addition, you can mount a writable "per-node cache" into the container: a single large file on the Lustre filesystem that dask workers can use as a location to spill data when their memory limits are exceeded. Because Lustre sees this as a single file, it doesn't adversely affect the filesystem.
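How the per-node cache is requested depends on how Shifter is deployed. The sketch below uses the perNodeCache volume option from the upstream Shifter documentation, with placeholder paths and size; the exact mechanism at Pawsey (an sbatch directive via the Shifter Slurm plugin, or a volume flag on the shifter command itself) may differ, so check the Pawsey documentation linked above.

```bash
# Illustration only: placeholder path and size, syntax from the upstream Shifter docs.
# Creates a 100 GB XFS cache backed by a single file on /scratch on each node,
# and mounts it at /tmp inside the container so dask workers can spill there.
#SBATCH --volume="/scratch/projectname/username/tmpfiles:/tmp:perNodeCache=size=100G"
```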
Before running, you need to pull the image from Docker and register the Shifter container:
sg $PAWSEY_PROJECT -c 'shifter pull pangeo/pangeo-notebook:latest'
Two convenience scripts are provided for starting jupyter lab and dask.
jobid=$(sbatch start_jupyter.sh | grep -o '[0-9]*') && tail -F jupyter-$jobid.out
start_jupyter.sh does three things (a minimal sketch follows the list below):
- Starts an instance of the container running a dask-scheduler
- Starts an instance of the container running jupyter lab
- Parses the log files to print out a helpful string for tunneling to the port jupyter exposed on the compute node
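The actual script in this repository will differ in detail, but a hedged sketch of that pattern, assuming a `shifter run <image> <command>` invocation like the pull command above and printing the tunnel string directly rather than parsing the log files, might look like:

```bash
#!/bin/bash -l
#SBATCH --job-name=jupyter
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#SBATCH --output=jupyter-%j.out

# Sketch only -- the real start_jupyter.sh may differ.
# 1. Start a dask scheduler inside the container, in the background.
shifter run pangeo/pangeo-notebook:latest dask-scheduler --port 8786 &

# 2. Start jupyter lab inside the container on this compute node.
shifter run pangeo/pangeo-notebook:latest \
    jupyter lab --no-browser --ip="$(hostname)" --port=8888 &

# 3. Print a helpful tunnelling command (the real script parses the jupyter log).
echo "ssh -N -l \$USERNAME -L 8888:$(hostname):8888 zeus.pawsey.org.au"

wait
```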
jobid=$(sbatch start_worker.sh | grep -o '[0-9]*') && tail -F dask-worker-$jobid.out
start_worker.sh uses the container to start dask workers, using the Slurm environment variables to determine each worker's thread count and memory limit. This is important because otherwise dask sizes its workers from the node specs rather than from the job request (a sketch follows below). Run sbatch start_worker.sh a few times to get more workers, or alter the Slurm parameters.
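As a hedged sketch rather than the script itself (the resource numbers are placeholders, the shifter invocation is assumed as above, and SCHEDULER_NODE stands in for however the scheduler's hostname is communicated), deriving the worker specification from the Slurm allocation might look like:

```bash
#!/bin/bash -l
#SBATCH --job-name=dask-worker
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=7
#SBATCH --mem-per-cpu=4G
#SBATCH --time=02:00:00
#SBATCH --output=dask-worker-%j.out

# Sketch only -- size each worker from the job request, not the node specs.
# SLURM_MEM_PER_CPU is in MB when --mem-per-cpu is requested.
MEM_PER_TASK_MB=$(( SLURM_CPUS_PER_TASK * SLURM_MEM_PER_CPU ))

# SCHEDULER_NODE is a placeholder for however the scheduler host is shared
# (e.g. written to a file by start_jupyter.sh).
srun shifter run pangeo/pangeo-notebook:latest \
    dask-worker "$SCHEDULER_NODE:8786" \
    --nthreads "$SLURM_CPUS_PER_TASK" \
    --memory-limit "${MEM_PER_TASK_MB}MB" \
    --local-directory /tmp
```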
Assuming you tunneled the port with a command like
ssh -N -l $USERNAME -L 8888:z106:8888 zeus.pawsey.org.au
Open the browser to http://localhost:8888/
Connect to the dask scheduler with:
from dask.distributed import Client
client = Client(address='localhost:8786')
client
... and view the dask dashboard at http://localhost:8888/proxy/8787/status