Limit number of parallel compile jobs in PyTorch tests by setting $MAX_JOBS + add support for using Arm Compute Library (ACL) as dependency #4096
PyTorch compiles extensions either JIT or AOT using Ninja. When `MAX_JOBS` isn't set, Ninja will determine the parallelism, likely using all available cores. So limit them similar to the build/install step.
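As a rough illustration of the idea (not the easyblock's actual code), capping the parallelism comes down to exporting `MAX_JOBS` in the environment the tests run in. The helper name `env_with_max_jobs` and its `parallel` argument are hypothetical and only serve this sketch:

```python
import os


def env_with_max_jobs(parallel, base_env=None):
    """Return a copy of the environment with MAX_JOBS set to `parallel`.

    Hypothetical helper for illustration; `parallel` stands for the number
    of cores the build/install step is allowed to use.
    """
    env = dict(os.environ if base_env is None else base_env)
    # With MAX_JOBS set, the extension build no longer lets Ninja pick
    # its own default parallelism.
    env['MAX_JOBS'] = str(parallel)
    return env


# Example: prepare an environment allowing at most 12 parallel compile jobs
test_env = env_with_max_jobs(12)
print(test_env['MAX_JOBS'])
```

The returned dictionary could then be passed as `env=` to whatever launches the test suite, so the original environment stays untouched.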
if pytorch_version >= '1.10':
    acl_root = get_software_root('ArmComputeLibrary')
    if acl_root:
        options.append('USE_MKLDNN=ON')
Don't you also want `options.append('USE_MKLDNN_CBLAS=ON')` here (going by the comment in easybuilders/easybuild-easyconfigs#21309 (comment) by @migueldiascosta)?
I don't see this being set in the PyTorch source. It defaults to OFF, and CI doesn't set it:
https://github.com/pytorch/pytorch/blob/1d16a0978458457dc5c6b50bc19a37359a4bd822/.ci/pytorch/build.sh#L78-L82
Hence I left it disabled here too. It can be set using `custom_opts`, e.g. via hooks.
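For anyone who does want the option, a minimal sketch of such a hook could look like this. It assumes EasyBuild's `parse_hook` entry point and that `custom_opts` is a list of `NAME=VALUE` strings on the easyconfig; it has not been tested against a real PyTorch build:

```python
def parse_hook(ec, *args, **kwargs):
    """Append USE_MKLDNN_CBLAS=ON to custom_opts for PyTorch easyconfigs."""
    if ec.name == 'PyTorch':
        # Assumption: custom_opts holds NAME=VALUE strings for the build command
        opts = list(ec['custom_opts'] or [])
        if 'USE_MKLDNN_CBLAS=ON' not in opts:
            opts.append('USE_MKLDNN_CBLAS=ON')
        ec['custom_opts'] = opts
```

Keeping this in a site-specific hooks file leaves the easyblock default (OFF, matching PyTorch's CI) unchanged for everyone else.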
@boegelbot please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details: notification for comment with ID 4124834398 processed. Message to humans: this is just bookkeeping information for me,
On our A100 + AMD Zen3 system, an installation using 12 cores (out of the available 48) and 1 A100 GPU (out of 4 in the system) took ~46h. Let's see if it's any faster using this updated easyblock... Test install kicked off just now.
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 36 hours 6 mins 14 secs) (1 easyconfigs in total)
Does this mean it was ~10hrs faster?
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 56 hours 24 mins 9 secs) (1 easyconfigs in total)
Force-pushed from 4427dab to 7f6e2cf
You can ignore the force-push; I just undid a commit I had just added to this PR.
It actually took 56 hours now instead of 46, see the test report in #4096 (comment). @Flamefire How does that make any sense...
I wish I knew. It could just as well be the filesystem.
PyTorch compiles extensions either JIT or AOT using Ninja. When `MAX_JOBS` isn't set, Ninja will determine the parallelism, likely using all available cores. So limit them similar to the build/install step.
Also support the Arm Compute Library which may greatly enhance performance, see e.g. easybuilders/easybuild-easyconfigs#21309
This doesn't change anything for now as we don't have easyconfigs using it yet.