- Support for multi-node, multi-GPU jobs. GPUs are assigned to each process automatically (via `$CUDA_VISIBLE_DEVICES`); each process of the job is assumed to use a single GPU.
- Run a command without writing submission scripts:

  ```
  runjob --ngpus=8 --project=myproject --queue=myqueue python distributed_program.py
  ```
- Automatically creates one directory per job and redirects the output of each process to a separate log file.
- Automatically prints your job's output to your terminal; no need to run `tail -f` manually.
- Cancel your job with `CTRL+C`, as if you were running your program interactively.
- Exports useful environment variables related to your project and the available resources (see the sketch after this list):
  - `JOB_DIR`
  - `PROJECT_DIR`
  - `N_CPUS`
  - `CONDA_ROOT`
  - `CONDA_ENV`
  - `N_PROCS`
  - `PROC_ID`
  - `OUT_FILE`
  - `JOB_LOG_FILE`
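
A minimal sketch of how a job script might read these variables, assuming the count-like ones (`N_PROCS`, `PROC_ID`, `N_CPUS`) hold integer-valued strings:

```python
import os

# Variables exported by runjob for each process (names from the list above).
job_dir = os.environ["JOB_DIR"]        # per-job directory
proc_id = int(os.environ["PROC_ID"])   # index of this process within the job
n_procs = int(os.environ["N_PROCS"])   # total number of processes in the job
n_cpus = int(os.environ["N_CPUS"])     # CPUs available to the job
out_file = os.environ["OUT_FILE"]      # log file receiving this process's output

print(f"process {proc_id}/{n_procs} in {job_dir}, logging to {out_file}")
```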
See `examples/config.yaml` for an example configuration of projects and queues:

```
runjob-config examples/config.yaml
runjob --ngpus=8 --project=myproject --queue=myqueue python distributed_program.py
```
This will start a multi-GPU (and possibly multi-node, depending on your queue configuration) job with one process per GPU, and print the output (stdout and stderr) of one of the processes (the one with `SLURM_LOCALID=0`).
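
For illustration, a hypothetical `distributed_program.py` could inspect these variables as follows (a sketch only, not a structure runjob requires):

```python
import os

# runjob assigns GPUs by setting CUDA_VISIBLE_DEVICES per process, so each
# process sees exactly one GPU (as device 0) and needs no manual selection.
visible = os.environ.get("CUDA_VISIBLE_DEVICES")
local_id = int(os.environ.get("SLURM_LOCALID", "0"))

# Every process writes to its own log file; only the process with
# SLURM_LOCALID=0 has its stdout/stderr mirrored to your terminal.
print(f"SLURM_LOCALID={local_id}: CUDA_VISIBLE_DEVICES={visible}")
```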
Use a keyboard interrupt (`CTRL+C`) to cancel your job.
Make sure you are on a SLURM cluster (i.e. `sinfo` should output something), then run the tests:

```
runjob-config examples/config.yaml
pytest -vs
```
- A simple interface to resume long-running jobs automatically.
- Copy your Python project to a temporary directory, so that modifications you make to your code do not affect your job while it is running.
- A `runjob`-like utility for interactive jobs.