
{tools}[GCCcore/14.3.0] PyTorch v2.9.1, parameterized v0.9.0, pytest-subtests v0.15.0, ... w/ CUDA 12.9.1 #24926

Open

Flamefire wants to merge 23 commits into easybuilders:develop from Flamefire:20251218180340_new_pr_parameterized090

Conversation

@Flamefire (Contributor) commented Dec 18, 2025

(created using eb --new-pr)

Includes:

It makes sense to merge #24365 first, as any changes there need to be reflected here. But this allows testing both in parallel.

…tests-0.15.0-GCCcore-14.3.0.eb, PyTorch-2.9.1-foss-2025b-CUDA-12.9.1.eb, unittest-xml-reporting-3.2.0-GCCcore-14.3.0.eb and patches: PyTorch-1.12.1_add-hypothesis-suppression.patch, PyTorch-1.7.0_disable-dev-shm-test.patch, PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch, PyTorch-2.1.0_remove-test-requiring-online-access.patch, PyTorch-2.6.0_show-test-duration.patch, PyTorch-2.6.0_skip-test_segfault.patch, PyTorch-2.7.0_avoid_caffe2_test_cpp_jit.patch, PyTorch-2.7.1_avoid-caffe2-sandcastle-test-lib.patch, PyTorch-2.7.1_skip-test_data_parallel_rnn.patch, PyTorch-2.7.1_skip-test_gds_fails_in_ci.patch, PyTorch-2.7.1_skip-test_mixed_mm_exhaustive_dtypes.patch, PyTorch-2.7.1_skip-tests-requiring-SM90.patch, PyTorch-2.7.1_suport-64bit-BARs.patch, PyTorch-2.7.1_tolerance-test_partial_flat_weights.patch, PyTorch-2.9.0_disable-test_nan_assert.patch, PyTorch-2.9.0_enable-symbolizer-in-test_workspace_allocation_error.patch, PyTorch-2.9.0_fix-attention-squeeze.patch, PyTorch-2.9.0_fix-FP16-CPU-tests-in-test_torchinductor_opinfo.patch, PyTorch-2.9.0_fix-nccl-test-env.patch, PyTorch-2.9.0_fix-test_exclude_padding.patch, PyTorch-2.9.0_fix-test_version_error.patch, PyTorch-2.9.0_honor-XDG_CACHE_HOME.patch, PyTorch-2.9.0_increase-tolerance-in-test_transformers.patch, PyTorch-2.9.0_remove-faulty-close.patch, PyTorch-2.9.0_revert-pybind11-3-change.patch, PyTorch-2.9.0_skip-test_benchmark_on_non_zero_device.patch, PyTorch-2.9.0_skip-test_convolution1-on-H100.patch, PyTorch-2.9.0_skip-test_inductor_all_gather_into_tensor_coalesced.patch, PyTorch-2.9.0_skip-test_original_aten_preserved_pad_mm.patch, PyTorch-2.9.0_skip-test_override-without-CUDA.patch, PyTorch-2.9.0_skip-test_unbacked_reduction.patch, PyTorch-2.9.0_skip-tests-requiring-CUDA-12.8.patch, PyTorch-2.9.0_skip-unexpected-success-in-test_fake_export.patch, PyTorch-2.9.1_skip-RingFlexAttentionTest.patch
@github-actions bot added the 2025b (issues & PRs related to 2025b common toolchains) and update labels Dec 18, 2025
@github-actions

Diff of new easyconfig(s) against existing ones is too long for a GitHub comment. Use --review-pr (and --review-pr-filter / --review-pr-max) locally.

@Thyre

This comment was marked as outdated.

@Thyre

This comment was marked as resolved.

@Flamefire (Contributor Author)

Test report by @Thyre
FAILED
Build succeeded for 3 out of 4 (total: 55 secs) (4 easyconfigs in total)
jrc0900.jureca - Linux Rocky Linux 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, 580.95.05, Python 3.9.21
See https://gist.github.com/Thyre/576f0dbeceb975733d860d97f16ca3fc for a full test report.

== 2025-12-19 10:19:27,773 build_log.py:233 ERROR EasyBuild encountered an error: Nothing found to replace 'if IS_CI:\n\s+# Add the option to generate XML test report.*' in test/run_test.py (at easybuild/tools/filetools.py:1861 in apply_regex_substitutions)

Are you using the latest easyblock? It looks like your easyblock installation is missing this commit from easybuilders/easybuild-easyblocks#3803.
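
For reference, a minimal sketch of the kind of call that produces this error (this is not the actual PyTorch easyblock code; the regex is copied from the log above and the replacement is a placeholder):

# Minimal sketch, assuming the easyblock edits test/run_test.py via EasyBuild's
# apply_regex_substitutions; the real easyblock may pass additional options
# (e.g. to match across lines).
from easybuild.tools.filetools import apply_regex_substitutions

regex_subs = [
    # In PyTorch 2.9.x this block no longer exists in test/run_test.py, so the
    # pattern matches nothing and EasyBuild raises the error shown above.
    (r"if IS_CI:\n\s+# Add the option to generate XML test report.*", ''),
]
apply_regex_substitutions('test/run_test.py', regex_subs)

The commit from easybuilders/easybuild-easyblocks#3803 presumably adjusts this substitution for the newer run_test.py layout.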

@Thyre

This comment was marked as outdated.

@Flamefire (Contributor Author)

2025b uses GCC 14, which introduces new warnings; see pytorch/pytorch#166873

Patch added. It seems to only affect ARM.

@Thyre

This comment was marked as outdated.

@Flamefire (Contributor Author)

Oh, it is a C file. Updated the patch to also add it to the C flags.

@Thyre

This comment was marked as outdated.

@Flamefire (Contributor Author)

Looks like I need to set those values earlier. Can you try again?

@Thyre (Collaborator) commented Dec 19, 2025

The actual failure was an internal GCC compiler error:

In file included from /dev/shm/reuter1/easybuild/build/PyTorch/2.9.1/foss-2025b-CUDA-12.9.1/pytorch-v2.9.1/build/aten/src/ATen/native/cpu/Unfold2d.cpp.SVE256.cpp:1:
/dev/shm/reuter1/easybuild/build/PyTorch/2.9.1/foss-2025b-CUDA-12.9.1/pytorch-v2.9.1/aten/src/ATen/native/cpu/Unfold2d.cpp: In function ‘void at::native::{anonymous}::unfolded2d_acc_kernel(c10::ScalarType, void*, void*, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, bool)’:
/dev/shm/reuter1/easybuild/build/PyTorch/2.9.1/foss-2025b-CUDA-12.9.1/pytorch-v2.9.1/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: error: unrecognizable insn:
  225 | }
      | ^
(insn 1375 1374 1376 99 (set (reg:VNx16BI 3253)
        (unspec:VNx16BI [
                (reg:VNx16BI 3250)
                (reg:VNx8BI 3252)
                (const_vector:VNx4BI [
                        (const_int 0 [0]) repeated x8
                    ])
            ] UNSPEC_TRN1_CONV)) "/dev/shm/reuter1/easybuild/build/PyTorch/2.9.1/foss-2025b-CUDA-12.9.1/pytorch-v2.9.1/torch/headeronly/util/bit_cast.h":40:14 -1
     (nil))
during RTL pass: vregs
/dev/shm/reuter1/easybuild/build/PyTorch/2.9.1/foss-2025b-CUDA-12.9.1/pytorch-v2.9.1/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: internal compiler error: in extract_insn, at recog.cc:2812
0x7d30df _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	../../gcc/rtl-error.cc:108
0x7d3113 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	../../gcc/rtl-error.cc:116
0xec1d17 extract_insn(rtx_insn*)
	../../gcc/recog.cc:2812
0xc2a28b instantiate_virtual_regs_in_insn
	../../gcc/function.cc:1612
0xc2a28b instantiate_virtual_regs
	../../gcc/function.cc:1995
0xc2a28b execute
	../../gcc/function.cc:2042
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Test report by @Thyre
FAILED
Build succeeded for 3 out of 4 (total: 17 mins 7 secs) (4 easyconfigs in total)
jrc0900.jureca - Linux Rocky Linux 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, 580.95.05, Python 3.9.21
See https://gist.github.com/Thyre/bdc1ee06d4f8b430f52f9c220b66e11f for a full test report.

@Thyre (Collaborator) commented Dec 19, 2025

@Flamefire (Contributor Author) commented Dec 19, 2025

The failure may be caused by this GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027

There was a PR that should have worked around this, but apparently the fix doesn't take effect? See also:

* https://github.com/pytorch/pytorch/blob/f026b098e4319413db7d3fc1dbcb39dda69fcf0c/aten/src/ATen/native/cpu/Unfold2d.cpp#L172

* [Build error: unrecognizable insn with using gcc-14 on aarch64 pytorch/pytorch#157842](https://github.com/pytorch/pytorch/issues/157842)

That fix is not included in this (or any) release yet. I'll add it to the patch list.

Maybe we need to patch GCCcore/14.3.0 with this change? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027#c9

That would be an option, though I'm not sure it is worth it: that easyconfig has shipped since EasyBuild 5.1.0, although we have done such changes in the past.
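
If we went that route, it would just mean carrying the backported GCC fix as an extra patch in the GCCcore easyconfig, roughly like this (the patch file name is hypothetical):

# Hypothetical addition to the existing patches list in GCCcore-14.3.0.eb; the patch
# file name is made up and would carry the backported change from GCC PR 121027, comment 9.
patches = [
    'GCCcore-14.3.0_fix-aarch64-sve-unrecognizable-insn.patch',
]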

@Flamefire changed the title from "{tools}[GCCcore/14.3.0] parameterized v0.9.0, pytest-subtests v0.15.0, PyTorch v2.9.1, ... w/ CUDA 12.9.1" to "{tools}[GCCcore/14.3.0] PyTorch v2.9.1, parameterized v0.9.0, pytest-subtests v0.15.0, ... w/ CUDA 12.9.1" Dec 19, 2025
@boegel added this to the next release (5.2.1?) milestone Dec 31, 2025
@github-actions bot added the 2024a (issues & PRs related to 2024a common toolchains) label Jan 15, 2026
@github-actions bot removed the 2024a (issues & PRs related to 2024a common toolchains) label Jan 15, 2026
@Thyre (Collaborator) commented Jan 19, 2026

We might be able to work around the ICE on aarch64 by just fixing a bit of broken code in PyTorch.
This macro code always evaluates to 0, because neither <version> nor <bit> is included before checking the feature macro:

https://github.com/pytorch/pytorch/blob/d38164a545b4a4e4e0cf73ce67173f70574890b6/torch/headeronly/util/bit_cast.h#L7C1-L14C74

This is the part that causes the ICE, so if we just use the GCC implementation, which is available in GCC 14, we might get further. I haven't run the full build after including the header (using the GCC from #25090), so I cannot fully confirm whether that's sufficient.

Edit: Unfortunately not, as this would require C++20. PyTorch uses C++17 😕

@Thyre (Collaborator) commented Feb 10, 2026

@Flamefire, can you add pytorch/pytorch@8fd5093 to this PR? Hopefully we get a bit further on aarch64 then...

@Flamefire (Contributor Author)

Done. Added to all 3 PRs

@Flamefire (Contributor Author)

Test report by @Flamefire
FAILED
Build succeeded for 3 out of 4 (total: 26 hours 31 mins 47 secs) (4 easyconfigs in total)
c49 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 580.65.06, Python 3.9.21
See https://gist.github.com/Flamefire/ecd32fab1a260714517c94e22d14d6af for a full test report.

@Thyre (Collaborator) commented Feb 11, 2026

Test report by @Thyre
FAILED
Build succeeded for 3 out of 4 (total: 21 hours 56 mins 37 secs) (4 easyconfigs in total)
jrc0901.jureca - Linux Rocky Linux 9.7, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, 590.48.01, Python 3.9.25
See https://gist.github.com/Thyre/74b4698312b58c90063c9a54a892c21e for a full test report.

@Thyre (Collaborator) commented Feb 11, 2026

A significant number of float32 tests failed on aarch64 / Neoverse V2. Something like this PR might be related: pytorch/pytorch#169937

Still not merged though 😕

@Flamefire (Contributor Author)

@Thyre

export/test_export_opinfo (431 failed, 201 passed, 40 skipped, 0 errors)
inductor/test_cpu_repro (4 failed, 211 passed, 526 skipped, 0 errors)
inductor/test_cpu_select_algorithm (58 failed, 31 passed, 1621 skipped, 0 errors)
inductor/test_cutlass_backend (8 failed, 142 passed, 2 skipped, 0 errors)
inductor/test_flex_attention (258 failed, 12 passed, 306 skipped, 0 errors)

I've seen issues with test_flex_attention including segfaults.
And @boegel has seen most of the failures in one report in export/test_export_opinfo too.

A significant number of float32 tests failed on aarch64 / Neoverse V2. Something like this PR might be related: pytorch/pytorch#169937

Can you show a couple of such failures, possibly grouped if they look the same?

@Thyre (Collaborator) commented Feb 11, 2026

I caught this from the snippet in the output log here: https://gist.github.com/Thyre/25e6d77bf7117f14c118bf0dd1a3e70f#file-pytorch-2-9-1-foss-2025b-cuda-12-9-1_partial-log-L449

I'll need to check if I still have the full log available. In the worst case, I'll have to re-run the build 🙈

@Flamefire (Contributor Author) commented Feb 11, 2026

I see, yes, the float32 issue sticks out. However, I don't think that PyTorch PR is related: it deals with invalid conversions such as static_cast<uint8_t>(float(-2)), which are undefined behavior and basically a usage error.

I do hope those failures have a single, common cause, though. If we find it, that would fix >700 tests at once :-)

Maybe cross-check with the PyPI package:

  • Load the same Python module
  • Create a virtual env
  • pip install torch==2.9.1 expecttest numpy
  • Extract pytorch 2.9.1 archive and cd to test folder
  • python run_test.py --pipe-logs --verbose --continue-through-error -i inductor/test_torchinductor_opinfo inductor/test_flex_attention inductor/test_cpu_cpp_wrapper

@Thyre (Collaborator) commented Feb 12, 2026

/usr/bin/bash: line 1: /tmp/eb-qpd30m2u/files_pr24926/p/PyTorch/PyTorch-check-cutlass.py: Permission denied

Argh, just wanted to have a test installation (skipping the tests) to better inspect the failing tests 😕


Test report by @Thyre
FAILED
Build succeeded for 3 out of 4 (total: 1 hour 10 mins 21 secs) (4 easyconfigs in total)
jrc0900.jureca - Linux Rocky Linux 9.7, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, 590.48.01, Python 3.9.25
See https://gist.github.com/Thyre/b0650190581947115ed40fa4d92f73f7 for a full test report.

@Thyre

This comment was marked as off-topic.

@Thyre (Collaborator) commented Feb 12, 2026

The float32 failures on GH200 seem to occur because of missing platform support:

  File "/p/project1/cswmanage/reuter1/EasyBuild/jedi/apps/software/PyTorch/2.9.1-foss-2025b-CUDA-12.9.1/lib/python3.13/site-packages/torch/_inductor/kernel/flex/flex_cpu.py", line 76, in lower_cpu
    raise NotImplementedError(
        "torch.compile on current platform is not supported for CPU."
    )
torch._inductor.exc.InductorError: LoweringException: NotImplementedError: torch.compile on current platform is not supported for CPU.

Great that they do not skip those tests ...
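
A rough stand-alone reproducer for that lowering failure, outside the test suite (only a sketch, assuming that compiling flex attention for CPU tensors goes through the same lower_cpu path):

# Minimal sketch: compile flex attention for CPU tensors. On platforms where inductor's
# CPU lowering for flex attention is unsupported (as in the log above), this raises
# LoweringException/NotImplementedError; on supported x86 CPUs it should compile.
import torch
from torch.nn.attention.flex_attention import flex_attention

q = torch.randn(1, 1, 128, 64)  # (batch, heads, seq_len, head_dim), float32 on CPU
k = torch.randn(1, 1, 128, 64)
v = torch.randn(1, 1, 128, 64)

compiled_flex = torch.compile(flex_attention)
out = compiled_flex(q, k, v)
print(out.shape)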

@Thyre (Collaborator) commented Feb 12, 2026

For CUTLASS, I see some tests failing with e.g.:

FAILED [0.7841s] inductor/test_cutlass_backend.py::TestCutlassBackend::test_compilation_time_use_aoti_False - torch._inductor.exc.InductorError: NoValidChoicesError:

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"


To execute this test, run the following from the base repo dir:
    python test/inductor/test_cutlass_backend.py TestCutlassBackend.test_compilation_time_use_aoti_False

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

@Thyre (Collaborator) commented Feb 12, 2026

test_cpu_select_algorithm gives us actual failures with incorrect results, especially with bfloat16, all in the form of:

FAILED [0.5882s] inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_int8_woq_mm_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cpu_bfloat16 - AssertionError: Scalars are not equal!

Expected 1 but got 0.
Absolute difference: 1
Relative difference: 1.0

To execute this test, run the following from the base repo dir:
    python test/inductor/test_cpu_select_algorithm.py TestSelectAlgorithmCPU.test_int8_woq_mm_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cpu_bfloat16

@Flamefire (Contributor Author)

/usr/bin/bash: line 1: /tmp/eb-qpd30m2u/files_pr24926/p/PyTorch/PyTorch-check-cutlass.py: Permission denied

Argh, just wanted to have a test installation (skipping the tests) to better inspect the failing tests 😕

I guess we want to make test cases executable: easybuilders/easybuild-framework#5118

test_cpu_select_algorithm gives us actual failures with incorrect results, especially with bfloat16, all in the form of:

That might be checking counters for code paths that are not taken on non-AVX2 hardware, e.g. at self.assertEqual(counters["inductor"]["cpp_templated_kernel_counter"], 1).
Can you attach/send the log for me to have a closer look? I remember they had some skip-markers for that in some places.
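
To illustrate that hypothesis, the suspected check looks roughly like this (a paraphrase, not the exact code from test_cpu_select_algorithm.py):

# Paraphrased sketch of the suspected assertion: the test counts how often inductor
# selected the C++ templated GEMM kernel. On hardware where that template is never
# chosen (e.g. non-AVX2 / aarch64), the counter stays at 0 and the check fails with
# "Expected 1 but got 0", matching the log above.
from torch._dynamo.utils import counters

def assert_templated_kernel_used(test_case):
    test_case.assertEqual(counters["inductor"]["cpp_templated_kernel_counter"], 1)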

@Thyre (Collaborator) commented Feb 12, 2026

Can you attach/send the log for me to have a closer look? I remember they had some skip-markers for that in some places

I unfortunately stopped the tests before the run fully finished, but I can redo them. Hopefully I manage to do that before my vacation.

@Flamefire (Contributor Author)

I've seen issues with test_flex_attention including segfaults.
And @boegel has seen most of the failures in one report in export/test_export_opinfo too.

test_export_opinfo fails when there is exactly 1 GPU.
Both tests should be fixed with the latest patches.

@Flamefire (Contributor Author)

Test report by @Flamefire
FAILED
Build succeeded for 3 out of 4 (total: 10 hours 23 mins 14 secs) (4 easyconfigs in total)
n1026.barnard.hpc.tu-dresden.de - Linux RHEL 9.6, x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.21
See https://gist.github.com/Flamefire/fa1ca5d23ad4e6c2566fcd58abc3fefc for a full test report.
