
Limit number of parallel compile jobs in PyTorch tests by setting $MAX_JOBS + add support for using Arm Compute Library (ACL) as dependency #4096

Open
Flamefire wants to merge 2 commits into easybuilders:develop from Flamefire:pytorch-test-parallel


@Flamefire
Contributor

@Flamefire Flamefire commented Mar 12, 2026

PyTorch compiles extensions either JIT or AOT using Ninja. When MAX_JOBS isn't set, Ninja will determine the parallelism, likely using all available cores.
So limit the number of jobs, similar to the build/install step.
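
A minimal sketch of the idea, not the code in this PR: build the environment for a test run with `MAX_JOBS` capped at a chosen job count. The helper name `env_with_max_jobs` and its `parallel` argument are hypothetical; the only thing taken from the description is that the job limit is passed via the `MAX_JOBS` environment variable.

```python
import os


def env_with_max_jobs(parallel, base_env=None):
    """Return a copy of the environment with MAX_JOBS set to `parallel`.

    `parallel` is a hypothetical stand-in for whatever job count the
    build/install step uses; values below 1 are clamped to 1.
    """
    env = dict(os.environ if base_env is None else base_env)
    env['MAX_JOBS'] = str(max(1, int(parallel)))
    return env


# Example: cap extension compilation at 16 jobs for a test subprocess.
test_env = env_with_max_jobs(16, base_env={'PATH': '/usr/bin'})
```

The returned dictionary could then be passed as `env=` to `subprocess.run` when launching the tests, so the limit applies only to that process.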

Also support the Arm Compute Library which may greatly enhance performance, see e.g. easybuilders/easybuild-easyconfigs#21309

This doesn't change anything for now as we don't have easyconfigs yet

@Flamefire Flamefire changed the title from "Limit number of parallel compile jobs in PyTorch tests" to "Limit number of parallel compile jobs in PyTorch tests & Add support for ACL" Mar 13, 2026
if pytorch_version >= '1.10':
    acl_root = get_software_root('ArmComputeLibrary')
    if acl_root:
        options.append('USE_MKLDNN=ON')
Member

Don't you also want `options.append('USE_MKLDNN_CBLAS=ON')` here (going by the comment in easybuilders/easybuild-easyconfigs#21309 (comment) by @migueldiascosta)?

Contributor Author

I don't see this being set in the PyTorch source. It defaults to OFF, and CI doesn't set it:
https://github.com/pytorch/pytorch/blob/1d16a0978458457dc5c6b50bc19a37359a4bd822/.ci/pytorch/build.sh#L78-L82

Hence I left it disabled here too. It can still be set using custom_opts, e.g. via hooks.
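
As a rough illustration of the hooks route, here is a sketch of a parse hook that appends the option for PyTorch only. It assumes that `custom_opts` holds a list of `KEY=VALUE` strings and that the easyconfig object exposes `name` and item assignment. `FakeEasyConfig` is a hypothetical stand-in used only to try the function outside EasyBuild.

```python
def parse_hook(ec, *args, **kwargs):
    """Append USE_MKLDNN_CBLAS=ON to custom_opts, for PyTorch only."""
    if ec.name == 'PyTorch':
        # Assumption: custom_opts is a list of 'KEY=VALUE' strings (or unset).
        ec['custom_opts'] = list(ec['custom_opts'] or []) + ['USE_MKLDNN_CBLAS=ON']


class FakeEasyConfig(dict):
    """Hypothetical stand-in for an easyconfig object, for local testing."""

    def __init__(self, name, **params):
        super().__init__(**params)
        self.name = name


ec = FakeEasyConfig('PyTorch', custom_opts=['USE_MKLDNN=ON'])
parse_hook(ec)

other = FakeEasyConfig('numpy', custom_opts=[])
parse_hook(other)
```

Keeping the option in a site-specific hook leaves the easyblock default aligned with what PyTorch itself uses, while still allowing sites to opt in.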

@boegel boegel added this to the next release (5.2.2?) milestone Mar 25, 2026
@boegel boegel changed the title from "Limit number of parallel compile jobs in PyTorch tests & Add support for ACL" to "Limit number of parallel compile jobs in PyTorch tests by setting $MAX_JOBS + add support for using Arm Compute Library (ACL) as dependency" Mar 25, 2026
@boegel
Member

boegel commented Mar 25, 2026

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr4096 PyTorch-2.9.1-foss-2024a.eb"

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4096 EB_ARGS="--installpath /tmp/$USER/pr4096 PyTorch-2.9.1-foss-2024a.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4096 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10084

Test results coming soon (I hope)...

Details

- notification for comment with ID 4124834398 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Mar 25, 2026

Last test report for PyTorch-2.9.1-foss-2024a.eb took ~38h on jsc-zen3 with 16 cores, let's see if there's a significant difference now (probably not, since 16 cores is all there is on those VMs, so I don't expect a positive impact from these changes in that type of setup), see easybuilders/easybuild-easyconfigs#25240 (comment)

@boegel
Member

boegel commented Mar 25, 2026

On our A100 + AMD Zen3 system, an installation using 12 cores (out of the available 48) and 1 A100 GPU (out of 4 in the system) took ~46h. Let's see if it's any faster using this updated easyblock... Test install kicked off just now.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.9.1-foss-2024a.eb

Build succeeded for 1 out of 1 (total: 36 hours 6 mins 14 secs) (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.25
See https://gist.github.com/boegelbot/222af8657de71ff2367cc6f67635c248 for a full test report.

@Flamefire
Contributor Author

Build succeeded for 1 out of 1 (total: 36 hours 6 mins 14 secs) (1 easyconfigs in total)

Does this mean it was ~10hrs faster?

@boegel
Member

boegel commented Mar 27, 2026

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb

Build succeeded for 1 out of 1 (total: 56 hours 24 mins 9 secs) (1 easyconfigs in total)
node3900.accelgor.os - Linux RHEL 9.6, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 590.48.01, Python 3.9.21
See https://gist.github.com/boegel/7648f6ec2b40c5386c9e51eab4b4b254 for a full test report.

@Flamefire Flamefire force-pushed the pytorch-test-parallel branch from 4427dab to 7f6e2cf March 31, 2026 07:51
@Flamefire
Contributor Author

You can ignore the force-push, I just undid a commit I had just added to this PR.
Looks like we have imkl & imkl-FFTW, but the latter depends on the former so checking for imkl is enough.

@boegel
Member

boegel commented Apr 2, 2026

On our A100 + AMD Zen3 system, an installation using 12 cores (out of the available 48) and 1 A100 GPU (out of 4 in the system) took ~46h. Let's see if it's any faster using this updated easyblock... Test install kicked off just now.

It actually took 56 hours now, instead of 46, see test report in #4096 (comment)

@Flamefire How does that make any sense...

@Flamefire
Contributor Author

I wish I knew. It could just as well be the filesystem.
The logs contain the timings for individual suites. That might provide a hint as to which ones got much slower. But it's probably not worth spending much time on.
