Add option to allow missing or additional detected PyTorch test suites #4052

Flamefire · 2026-01-22T11:44:52Z

(created using eb --new-pr)

This relaxes the PyTorch test evaluation a bit. The easyblock parses the XML result files and compares them against the summary that the test command prints to stdout. A mismatch falls into one of 2 cases:

1: There are more failures in the XML files than in the summary -> We consider something as failed that PyTorch didn't. This is very weird and might be an issue with the XML parser.
However, this is only a minor issue because we counted more failures (from the XML files) than might actually be present. So if the allowed-test-failure-count check still succeeds, we can ignore this, at least for users.

2: The summary shows a failure we have not found in the XML files -> The XML report might be missing because the test crashed or otherwise didn't write its results.
This is an issue because one test ("suite") might contain hundreds of test cases, many of which could have failed, but we didn't count any of those failures.
Of course there might be only a single failure, but we cannot know for sure, hence we fail.
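
To illustrate the two cases, here is a minimal sketch of the comparison, assuming the failed suites from the XML files and from the stdout summary are each available as a collection of suite names. The function name `compare_failed_suites` and the suite names are hypothetical and not taken from the easyblock.

```python
def compare_failed_suites(xml_failed, summary_failed):
    """Split mismatching suite names into the two cases described above.

    Hypothetical helper, not the easyblock's actual code.
    Returns (extra, missing), each as a sorted list.
    """
    xml_failed = set(xml_failed)
    summary_failed = set(summary_failed)
    # Case 1: failed according to the XML files but not listed in the summary
    extra = sorted(xml_failed - summary_failed)
    # Case 2: listed as failed in the summary but absent from the XML files
    missing = sorted(summary_failed - xml_failed)
    return extra, missing


# Made-up suite names, for illustration only
extra, missing = compare_failed_suites(
    xml_failed=["test_nn", "test_ops"],
    summary_failed=["test_ops", "test_cuda"],
)
print("extra:", extra)
print("missing:", missing)
```

Suites in `missing` are the risky ones, since their individual failed test cases were never counted.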

I added 2 options for those 2 cases: allow_extra_failures and allow_missing_failures.

They can be set to True/False, but also to a maximum number.
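
A value that is either a boolean or a maximum number could be interpreted as sketched below. This is an assumption about the semantics (True allows any number of mismatches, False allows none, an integer allows at most that many) and not the easyblock's actual implementation; the helper name `within_limit` is hypothetical.

```python
def within_limit(option_value, count):
    """Check whether `count` mismatches are acceptable for an option value.

    Assumed semantics (not confirmed by the PR):
      True  -> any number is allowed
      False -> none are allowed
      int   -> at most that many are allowed
    """
    # Booleans are tested by identity first: bool is a subclass of int in
    # Python, so `count <= True` would otherwise mean `count <= 1`.
    if option_value is True:
        return True
    if option_value is False:
        return count == 0
    return count <= option_value


# Example values as they might be set for the two options
allow_extra_failures = True
allow_missing_failures = 2

print(within_limit(allow_extra_failures, 5))
print(within_limit(allow_missing_failures, 3))
```

Checking the booleans explicitly avoids silently treating True as a limit of 1.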

boegel · 2026-01-28T08:34:22Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.7.1-CUDA PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb"

boegelbot · 2026-01-28T08:53:09Z

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4052 EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.7.1-CUDA PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4052 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 9510

Test results coming soon (I hope)...

Details

- notification for comment with ID 3809792319 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegel · 2026-01-28T13:42:03Z

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.6.0 PyTorch-2.6.0-foss-2024a.eb"

boegelbot · 2026-01-28T13:53:08Z

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4052 EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.6.0 PyTorch-2.6.0-foss-2024a.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4052 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 9520

Test results coming soon (I hope)...

Details

- notification for comment with ID 3811375867 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2026-01-30T13:30:03Z

Test report by @boegelbot

Overview of tested easyconfigs (in order)

SUCCESS PyTorch-2.6.0-foss-2024a.eb

Build succeeded for 1 out of 1 (total: 47 hours 36 mins 39 secs) (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.23
See https://gist.github.com/boegelbot/890f3c9807ae1e02967f8c74b4c8d5a8 for a full test report.

boegelbot · 2026-02-01T09:34:36Z

Test report by @boegelbot

Overview of tested easyconfigs (in order)

SUCCESS PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb

Build succeeded for 1 out of 1 (total: 52 hours 4 mins 5 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.44.01, Python 3.9.23
See https://gist.github.com/boegelbot/3833453c591f0f8716185274bcee7d7f for a full test report.

boegel · 2026-02-10T19:17:04Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb"

boegelbot · 2026-02-10T19:33:08Z

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4052 EB_ARGS="PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4052 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 9623

Test results coming soon (I hope)...

Details

- notification for comment with ID 3880190685 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegel · 2026-02-13T15:36:06Z

PyTorch/2.7.1-foss-2024a-CUDA-12.6.0 should almost be back @ jsc-zen3...

[boegelbot@jsczen3l1 ~]$ q
             JOBID PARTITION                                               NAME     USER    STATE       TIME TIME_LIMI  NODES   CPUS  NODELIST(REASON) MIN_MEMORY
...
              9623  jsczen3g                                       test_PR_4052 boegelbo  RUNNING 1-12:24:19 4-04:00:00      1     16         jsczen3g1 3800M

Add option to allow missing or additional detected PyTorch test suites

df40b8c

boegel added the enhancement label Jan 28, 2026

boegel added this to the next release (5.2.1?) milestone Jan 28, 2026

Append instead of overwrite failure_msgs

34a8e4e

boegel mentioned this pull request Feb 10, 2026

{ai}[foss/2024a] WhisperX v3.7.4, ONNX-Runtime v1.23.2, ONNX v1.20.0, pyannote.audio v3.4.0, torchaudio v2.7.1, ... w/ CUDA 12.6.0 easybuilders/easybuild-easyconfigs#24922

Open

boegel mentioned this pull request Feb 12, 2026

{ai,bio}[foss/2024a] scvi-tools v1.4.1, captum v0.8.0 w/ CUDA 12.6.0 easybuilders/easybuild-easyconfigs#25022

Open
