Setup
Pyxis uses the enroot container utility and relies on the enroot system configuration for most of its behavior. The system administrator should therefore customize enroot for their cluster. This is achieved by setting environment variables and enabling or disabling hooks:
- enroot configuration
- enroot standard hooks
- enroot extra hooks for PMIx and PyTorch multi-node support.
The extra hooks are not enabled by default as they are stored in a separate directory. Copy them to the main hook directory to enable them:
$ sudo cp /usr/share/enroot/hooks.d/50-slurm-pmi.sh /usr/share/enroot/hooks.d/50-slurm-pytorch.sh /etc/enroot/hooks.d
Here is an example of an enroot configuration file for a cluster:
$ cat /etc/enroot/enroot.conf
ENROOT_RUNTIME_PATH /run/enroot/user-$(id -u)
ENROOT_CACHE_PATH /raid/enroot-cache/group-$(id -g)
ENROOT_DATA_PATH /tmp/enroot-data/user-$(id -u)
ENROOT_SQUASH_OPTIONS -noI -noD -noF -noX -no-duplicates
ENROOT_MOUNT_HOME n
ENROOT_RESTRICT_DEV y
ENROOT_ROOTFS_WRITABLE y
- ENROOT_RUNTIME_PATH is the working directory for enroot; it is recommended to use a tmpfs (RAM).
- ENROOT_CACHE_PATH is where docker layers are stored; it is recommended to use persistent local storage.
- ENROOT_DATA_PATH is the directory where the filesystems of running containers are stored. If your compute nodes have sufficient memory, it is recommended to use a tmpfs for faster container start. Note that /tmp is not a tmpfs by default on Ubuntu, but that's the case on our cluster.
- ENROOT_SQUASH_OPTIONS controls the compression parameters for squashfs files. In our case we disable compression since squashfs files are used only as an intermediate image format when importing a container image.
- ENROOT_MOUNT_HOME n disables mounting the home directories of users by default in containers. The home directory can still be mounted with --container-mount-home.
- ENROOT_RESTRICT_DEV y isolates device files inside the container by default. This is useful if you want to allow users to use NVIDIA_VISIBLE_DEVICES to have only a subset of all GPUs accessible inside their containers.
- ENROOT_ROOTFS_WRITABLE y makes the containers writable by default, so that users can install additional packages if needed.
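For example, with the settings above a user can still mount their home directory explicitly and expose only a subset of GPUs to the container. A minimal sketch, where the container image and GPU indices are hypothetical:
$ NVIDIA_VISIBLE_DEVICES=0,1 srun --container-image=ubuntu:22.04 --container-mount-home nvidia-smi -L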
If PMIx works when running bare-metal jobs but not when using pyxis, try tweaking your PMIx configuration through systemd:
# cat /etc/systemd/system/slurmd.service
[...]
[Service]
Type=forking
EnvironmentFile=-/etc/default/slurmd
[...]
# cat /etc/default/slurmd
PMIX_MCA_ptl=^usock
PMIX_MCA_psec=none
PMIX_SYSTEM_TMPDIR=/var/empty
PMIX_MCA_gds=hash
Please note that setting PMIX_MCA_psec=none
might be detrimental to security in a multi-tenant Slurm cluster.
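After editing the unit file or the environment file, reload systemd and restart slurmd on the compute nodes so the new PMIx environment takes effect:
# systemctl daemon-reload
# systemctl restart slurmd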
Pyxis and enroot can be used with exclusive (OverSubscribe=EXCLUSIVE
) or shared node access in Slurm. It is simpler to start with a setup using exclusive node access.
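For reference, exclusive node access can be enforced per partition in slurm.conf; a minimal sketch with hypothetical partition and node names:
PartitionName=gpu Nodes=node[01-04] Default=YES OverSubscribe=EXCLUSIVE State=UP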
The default value of PlugStackConfig
should work fine if you follow the installation steps, but you might want to set it explicitly if you run other plugins, see bug 9081:
PlugStackConfig=/etc/slurm/plugstack.conf
To simplify running multi-node jobs with PMIx, you can set the following to remove the need for adding --mpi=pmix
to srun
commands:
MpiDefault=pmix
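With this set, a multi-node launch such as the sketch below (container image and application are hypothetical) behaves as if --mpi=pmix had been passed explicitly:
$ srun -N2 --ntasks-per-node=8 --container-image=my_registry#my_mpi_image ./my_mpi_app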
Depending on your enroot configuration, you might need a custom Slurm prolog to ensure that the configured enroot directories are available.
For instance, here is a template for a Slurm prolog script that creates and sets permissions for the enroot directories:
https://github.com/NVIDIA/deepops/blob/20.08/roles/slurm/templates/etc/slurm/prolog.d/50-all-enroot-dirs
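As a rough illustration, a prolog matching the example enroot.conf above could look like the following sketch; the paths must be adapted to your own configuration, and SLURM_JOB_UID / SLURM_JOB_GID are assumed to be provided by Slurm to the prolog:
#!/bin/bash
# Sketch of a Slurm prolog for the example enroot.conf above; adjust the paths
# to match your own ENROOT_RUNTIME_PATH, ENROOT_CACHE_PATH and ENROOT_DATA_PATH.
set -euo pipefail

runtime_dir="/run/enroot/user-${SLURM_JOB_UID}"
cache_dir="/raid/enroot-cache/group-${SLURM_JOB_GID}"
data_dir="/tmp/enroot-data/user-${SLURM_JOB_UID}"

# Create the directories and restrict them to the job owner (and their group
# for the shared layer cache).
mkdir -p "${runtime_dir}" "${cache_dir}" "${data_dir}"
chown "${SLURM_JOB_UID}" "${runtime_dir}" "${data_dir}"
chown ":${SLURM_JOB_GID}" "${cache_dir}"
chmod 0700 "${runtime_dir}" "${data_dir}"
chmod 0770 "${cache_dir}"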
When using container_scope=global
in the configuration, named containers are not removed automatically at the end of a job. The cleanup can be performed by a Slurm epilog script that cleans up ENROOT_RUNTIME_PATH
and ENROOT_DATA_PATH
after a job completes, to make sure the storage used by all containers is reclaimed.
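A corresponding epilog might look like the sketch below, again assuming the example enroot.conf above; with shared node access you would first want to check that the user has no other jobs left on the node:
#!/bin/bash
# Sketch of a Slurm epilog reclaiming per-user container storage; safe with
# exclusive node access, needs extra checks when nodes are shared between jobs.
set -euo pipefail

rm -rf "/run/enroot/user-${SLURM_JOB_UID}" "/tmp/enroot-data/user-${SLURM_JOB_UID}"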
Pyxis currently has 5 arguments that can be modified in the Slurm plugstack configuration:
- runtime_path is similar to ENROOT_RUNTIME_PATH; it is where pyxis stores temporary squashfs images when importing a docker image. It is recommended to use a tmpfs if your systems have sufficient memory.
- remap_root controls whether a user will see themselves as UID 0 (root) or their usual UID inside the container. See --no-container-remap-root. Removed in pyxis 0.12.
- execute_entrypoint controls whether the entrypoint defined in the container image is executed when pyxis starts the container. See --no-container-entrypoint.
- container_scope controls whether named containers persist across Slurm jobs. When set to job, pyxis will automatically clean up named containers in the job epilog. When set to global, named containers can be reused by future jobs, unless they are manually removed by a custom Slurm epilog script. Note: job might not be compatible with all enroot configurations, since in the Slurm epilog pyxis won't be able to expand environment variables from the user job.
- sbatch_support controls whether the pyxis command-line arguments are available for sbatch and salloc. This feature can be disabled since it is tricky to get srun to work correctly inside an sbatch script running inside a container image.
If no arguments are specified in the plugstack configuration, the default values (in all pyxis versions except 0.16 and 0.17) are equivalent to:
$ cat /etc/slurm/plugstack.conf.d/pyxis.conf
required /usr/local/lib/slurm/spank_pyxis.so runtime_path=/run/pyxis execute_entrypoint=0 container_scope=global sbatch_support=1
In pyxis 0.16 and 0.17, the default values are equivalent to:
$ cat /etc/slurm/plugstack.conf.d/pyxis.conf
required /usr/local/lib/slurm/spank_pyxis.so runtime_path=/run/pyxis execute_entrypoint=0 container_scope=job sbatch_support=1