
enroot configuration

Pyxis uses the enroot container utility and relies on the enroot system configuration for most of its behavior. The system administrator should therefore customize enroot for their cluster. This is done by setting environment variables and by enabling or disabling hooks.

The extra hooks are not enabled by default since they are stored in a separate directory. Copy them to the main hook directory to enable them:

$ sudo cp /usr/share/enroot/hooks.d/50-slurm-pmi.sh /usr/share/enroot/hooks.d/50-slurm-pytorch.sh /etc/enroot/hooks.d
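
You can verify that the hooks are now enabled by listing the main hook directory:

$ ls /etc/enroot/hooks.d/ | grep slurm
50-slurm-pmi.sh
50-slurm-pytorch.sh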

enroot configuration (example)

Here is an example of an enroot configuration file for a cluster:

$ cat /etc/enroot/enroot.conf
ENROOT_RUNTIME_PATH /run/enroot/user-$(id -u)
ENROOT_CACHE_PATH /raid/enroot-cache/group-$(id -g)
ENROOT_DATA_PATH /tmp/enroot-data/user-$(id -u)
ENROOT_SQUASH_OPTIONS -noI -noD -noF -noX -no-duplicates
ENROOT_MOUNT_HOME n
ENROOT_RESTRICT_DEV y
ENROOT_ROOTFS_WRITABLE y
  • ENROOT_RUNTIME_PATH is the working directory for enroot; it is recommended to use a tmpfs (RAM).
  • ENROOT_CACHE_PATH is where Docker layers are stored; it is recommended to use persistent local storage.
  • ENROOT_DATA_PATH is the directory where the filesystems of running containers are stored. If your compute nodes have sufficient memory, it is recommended to use a tmpfs for faster container startup. Note that /tmp is not a tmpfs by default on Ubuntu, but it is on our cluster.
  • ENROOT_SQUASH_OPTIONS controls the compression parameters for squashfs files.
    In our case we disable compression since squashfs files are used only as an intermediate image format when importing a container image.
  • ENROOT_MOUNT_HOME n disables mounting the home directories of users by default in containers. A home directory can still be mounted with --container-mount-home.
  • ENROOT_RESTRICT_DEV y isolates device files inside the container by default. This is useful if you want to allow users to use NVIDIA_VISIBLE_DEVICES to only have a subset of all GPUs accessible inside their containers; see the sketch after this list.
  • ENROOT_ROOTFS_WRITABLE y makes the containers writable by default, so that users can install additional packages if needed.
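
For instance, with ENROOT_RESTRICT_DEV y, a user can restrict their container to a single GPU by exporting NVIDIA_VISIBLE_DEVICES before srun. This is a usage sketch: the image tag is illustrative, and it assumes that enroot's NVIDIA hook is enabled:

$ NVIDIA_VISIBLE_DEVICES=0 srun --container-image=nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi -L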

Slurmd configuration

If PMIx works when running bare-metal jobs but not when using pyxis, try tweaking your PMIx configuration through systemd:

# cat /etc/systemd/system/slurmd.service
[...]
[Service]
Type=forking
EnvironmentFile=-/etc/default/slurmd
[...]

# cat /etc/default/slurmd
PMIX_MCA_ptl=^usock
PMIX_MCA_psec=none
PMIX_SYSTEM_TMPDIR=/var/empty
PMIX_MCA_gds=hash
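
After changing the systemd unit or the environment file, reload systemd and restart slurmd to apply the new settings:

# systemctl daemon-reload
# systemctl restart slurmd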

Please note that setting PMIX_MCA_psec=none might be detrimental to security in a multi-tenant Slurm cluster.

Slurm configuration

Pyxis and enroot can be used with exclusive (OverSubscribe=EXCLUSIVE) or shared node access in Slurm. It is simpler to start with a setup using exclusive node access.
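
For reference, exclusive node access can be configured on a partition in slurm.conf; the partition and node names below are illustrative:

PartitionName=batch Nodes=node[01-08] Default=YES OverSubscribe=EXCLUSIVE State=UP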

The default value of PlugStackConfig should work fine if you follow the installation steps, but you might want to set it explicitly if you run other plugins; see bug 9081:

PlugStackConfig=/etc/slurm/plugstack.conf 

To simplify running multi-node jobs with PMIx, you can set the following to remove the need for adding --mpi=pmix to srun commands:

MpiDefault=pmix
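
You can verify that the pmix plugin is available on your installation with the following command; the list it prints should include pmix:

$ srun --mpi=list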

Slurm prolog

Depending on your enroot configuration, you might need a custom Slurm prolog to ensure that the configured enroot directories are available. For instance, here is a template for a Slurm prolog script that creates and sets permissions for the enroot directories:
https://github.com/NVIDIA/deepops/blob/20.08/roles/slurm/templates/etc/slurm/prolog.d/50-all-enroot-dirs
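
As a minimal sketch of what such a prolog does, assuming the paths from the example enroot.conf above (SLURM_JOB_UID and SLURM_JOB_USER are set by slurmd in the prolog environment):

#!/bin/bash
# Prolog sketch: create the per-user and per-group enroot directories for the job user.
set -e

uid="$SLURM_JOB_UID"
gid="$(id -g "$SLURM_JOB_USER")"

# ENROOT_RUNTIME_PATH and ENROOT_DATA_PATH are per-user directories.
mkdir -p "/run/enroot/user-${uid}" "/tmp/enroot-data/user-${uid}"
chown "${uid}:${gid}" "/run/enroot/user-${uid}" "/tmp/enroot-data/user-${uid}"
chmod 0700 "/run/enroot/user-${uid}" "/tmp/enroot-data/user-${uid}"

# ENROOT_CACHE_PATH is a per-group directory.
mkdir -p "/raid/enroot-cache/group-${gid}"
chown ":${gid}" "/raid/enroot-cache/group-${gid}"
chmod 0770 "/raid/enroot-cache/group-${gid}"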

Slurm epilog

When using container_scope=global in the configuration, named containers are not removed automatically at the end of a job. A Slurm epilog script can reclaim the storage used by all containers by removing ENROOT_RUNTIME_PATH and ENROOT_DATA_PATH after a job completes.
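
A minimal epilog sketch, again assuming the paths from the example enroot.conf above:

#!/bin/bash
# Epilog sketch: reclaim the storage used by the job user's containers.
# SLURM_JOB_UID is set by slurmd in the epilog environment.
set -e

uid="$SLURM_JOB_UID"
rm -rf "/run/enroot/user-${uid}" "/tmp/enroot-data/user-${uid}"

Note that this removes the directories unconditionally: on clusters where a user can run multiple concurrent jobs on the same node, this could disrupt another running job, so exclusive node access is assumed here.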

Slurm plugstack configuration

Pyxis currently has 5 arguments that can be modified in the Slurm plugstack configuration:

  • runtime_path is similar to ENROOT_RUNTIME_PATH; it is where pyxis stores temporary squashfs images when importing a Docker image. It is recommended to use a tmpfs if your systems have sufficient memory.
  • remap_root controls whether a user will see themselves as UID 0 (root) or as their usual UID inside the container. See --no-container-remap-root. This argument was removed in pyxis 0.12.
  • execute_entrypoint controls whether the entrypoint defined in the container image is executed when pyxis starts the container. See --no-container-entrypoint.
  • container_scope controls whether named containers persist across Slurm jobs. When set to job, pyxis automatically cleans up named containers in the job epilog. When set to global, named containers can be reused by future jobs, unless they are removed by a custom Slurm epilog script. Note: job might not be compatible with all enroot configurations, since in the Slurm epilog pyxis won't be able to expand environment variables from the user job.
  • sbatch_support controls whether the pyxis command-line arguments are available for sbatch and salloc. This feature can be disabled since it is tricky to get srun to work correctly inside an sbatch script running inside a container image.

If no arguments are specified in the plugstack configuration, the default values (in all pyxis versions except 0.16 and 0.17) are equivalent to:

$ cat /etc/slurm/plugstack.conf.d/pyxis.conf 
required /usr/local/lib/slurm/spank_pyxis.so runtime_path=/run/pyxis execute_entrypoint=0 container_scope=global sbatch_support=1

In pyxis 0.16 and 0.17, the default values are equivalent to:

$ cat /etc/slurm/plugstack.conf.d/pyxis.conf 
required /usr/local/lib/slurm/spank_pyxis.so runtime_path=/run/pyxis execute_entrypoint=0 container_scope=job sbatch_support=1
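
As an illustration, to keep the defaults but disable the pyxis arguments for sbatch and salloc, you could use the following (adjust the plugin path to your installation):

$ cat /etc/slurm/plugstack.conf.d/pyxis.conf
required /usr/local/lib/slurm/spank_pyxis.so sbatch_support=0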