
Installation

Felix Abecassis edited this page Jul 20, 2020 · 7 revisions

Requirements

  1. enroot version 3.1.0 or later.
  2. Slurm version 20.02 or later.

You can check the enroot requirements here: https://github.com/NVIDIA/enroot/blob/v3.1.0/doc/requirements.md
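The two minimum versions above can be checked with a short script. This is a minimal sketch, not part of Pyxis: the helper names `version_ge` and `check_min` are hypothetical, it assumes GNU `sort -V` is available, and it assumes `enroot version` prints a bare version string and `srun --version` prints `slurm X.Y.Z`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare a detected version against a minimum and report the result.
check_min() {
    name=$1; found=$2; minimum=$3
    if [ -z "$found" ]; then
        echo "$name: not found"
    elif version_ge "$found" "$minimum"; then
        echo "$name: $found (ok, >= $minimum)"
    else
        echo "$name: $found (too old, need >= $minimum)"
    fi
}

# Assumption: `enroot version` prints only the version string,
# and `srun --version` prints "slurm X.Y.Z".
enroot_ver=$(enroot version 2>/dev/null)
slurm_ver=$(srun --version 2>/dev/null | awk '{print $2}')

check_min enroot "$enroot_ver" 3.1.0
check_min slurm "$slurm_ver" 20.02
```

Run it on the login node and on at least one compute node, since the plugin has to work in both places.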

Installation

Prebuilt packages are not provided. Building the plugin from source requires the Slurm development headers (e.g. the libslurm-dev package).

Since the Pyxis plugin interacts with both srun and slurmd, it must be installed on the login node and the compute nodes of the cluster.

Without packages (any distribution)

This is the recommended method if you installed enroot from source.

$ sudo make install
$ sudo ln -s /usr/local/share/pyxis/pyxis.conf /etc/slurm-llnl/plugstack.conf.d/pyxis.conf
$ sudo systemctl restart slurmd

With a deb package (Debian-based distributions)

This is the recommended way if you installed enroot through a package, as the generated package will depend on the enroot package.

$ make orig
$ make deb
$ sudo dpkg -i ../nvslurm-plugin-pyxis_*_amd64.deb
$ sudo ln -s /usr/share/pyxis/pyxis.conf /etc/slurm-llnl/plugstack.conf.d/pyxis.conf
$ sudo systemctl restart slurmd

RHEL-based distributions

Generating RPM packages is not currently supported; install without packages instead.

Verifying (srun)

From the login node, first verify that the Pyxis plugin injects its arguments correctly:

$ srun --help | grep container-image
      --container-image=[USER@][REGISTRY#]IMAGE[:TAG]|PATH

If the Pyxis arguments do not show up, use strace to check whether your plugstack configuration is processed correctly. For example, this is what you should see when Pyxis is working correctly:

$ strace -e openat srun --help >/dev/null
[...]
openat(AT_FDCWD, "/etc/slurm/plugstack.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/slurm/plugstack.conf.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
openat(AT_FDCWD, "/etc/slurm/plugstack.conf.d/pyxis.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/usr/local/lib/slurm/spank_pyxis.so", O_RDONLY|O_CLOEXEC) = 5
+++ exited with 0 +++
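The full strace output is usually long, so it can help to keep only the lines that mention the plugstack configuration or a SPANK plugin. The following is a minimal sketch; the function name `plugstack_opens` is hypothetical, and the filter pattern is simply an assumption based on the paths shown above.

```shell
#!/bin/sh
# Hypothetical helper: keep only the strace lines that touch the plugstack
# configuration or a SPANK plugin shared object.
plugstack_opens() {
    grep -E 'plugstack|spank_' || true
}

if command -v strace >/dev/null 2>&1 && command -v srun >/dev/null 2>&1; then
    # strace writes its trace to stderr, so send stderr into the pipe
    # and discard srun's own help text.
    strace -e openat srun --help 2>&1 >/dev/null | plugstack_opens
else
    echo "strace or srun not available on this machine"
fi
```

Failed opens also appear in the filtered output (with a negative return value and an error name such as ENOENT), which points directly at the path srun tried and could not find.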

Afterwards, you can submit a simple containerized job:

$ srun --container-image=centos grep PRETTY /etc/os-release
PRETTY_NAME="CentOS Linux 8 (Core)"

Verifying (slurmd)

With the default log level, there will be no message in the slurmd log if the plugin loads successfully. Increasing the log level should yield:

debug:  spank: opening plugin stack /etc/slurm/plugstack.conf.d/pyxis.conf
debug3: Couldn't find sym 'slurm_spank_job_prolog' in the plugin [...]
debug2: spank: /usr/local/lib/slurm/spank_pyxis.so: no callbacks in this context

This is normal: the Pyxis plugin does not implement all the Slurm plugin callbacks.
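To find these messages, one option is to filter the slurmd log for SPANK-related lines. This is a sketch under assumptions: `spank_lines` is a hypothetical helper, the log location is read from the `SlurmdLogFile` entry of `scontrol show config`, and the fallback suggestion runs slurmd in the foreground with extra verbosity (`-D -vvv`).

```shell
#!/bin/sh
# Hypothetical helper: print only the SPANK-related lines of a log read from stdin.
spank_lines() {
    grep -i 'spank' || true
}

# Ask Slurm where slurmd logs. The result is empty if scontrol is unavailable,
# and unusable if slurmd logs to syslog instead of a file.
logfile=$(scontrol show config 2>/dev/null | awk '$1 == "SlurmdLogFile" { print $3 }')

if [ -n "$logfile" ] && [ -r "$logfile" ]; then
    spank_lines < "$logfile"
else
    echo "slurmd log file not found; try running: slurmd -D -vvv"
fi
```

If the log file contains a per-node placeholder in its name, the path reported by scontrol may need adjusting before it can be read.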

If the plugin file is missing or invalid, slurmd will fail to start:

slurmd: error: spank: /usr/local/lib/slurm/spank_pyxis.so: Plugin file not found
slurmd: error: spank: /etc/slurm/plugstack.conf.d/pyxis.conf:1: Failed to load plugin /usr/local/lib/slurm/spank_pyxis.so. Aborting.
slurmd: error: slurmd initialization failed
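Because a missing plugin file prevents slurmd from starting, it can be worth checking the paths referenced by the configuration before restarting the daemon. The sketch below is not part of Pyxis: `list_spank_plugins` and `check_spank_plugins` are hypothetical helpers, and they assume each plugin line has the form `required PATH [ARGS]` or `optional PATH [ARGS]` with an absolute path.

```shell
#!/bin/sh
# Hypothetical helper: print the plugin path (second field) of every
# "required" or "optional" line, skipping comments and include directives.
list_spank_plugins() {
    awk '$1 == "required" || $1 == "optional" { print $2 }' "$1"
}

# Hypothetical helper: report whether each referenced plugin exists on disk.
check_spank_plugins() {
    list_spank_plugins "$1" | while read -r plugin; do
        if [ -f "$plugin" ]; then
            echo "ok: $plugin"
        else
            echo "missing: $plugin"
        fi
    done
}

# Path taken from the log messages above; adjust for your distribution
# (for example /etc/slurm-llnl on Debian-based systems).
conf=/etc/slurm/plugstack.conf.d/pyxis.conf
if [ -r "$conf" ]; then
    check_spank_plugins "$conf"
else
    echo "cannot read $conf"
fi
```

A plugin given by a relative name is reported as missing by this check even if Slurm could resolve it through its plugin directory, so treat a "missing" result for such entries as inconclusive.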