{ai}[foss/2023b] PyTorch v2.3.0 w/ CUDA 12.4.0 #23553

Merged
boegel merged 6 commits into easybuilders:develop from Flamefire:20250731181938_new_pr_PyTorch230
Feb 11, 2026

Flamefire (Contributor) commented Jul 31, 2025

(created using eb --new-pr)

Requires:

…es: PyTorch-1.7.0_disable-dev-shm-test.patch, PyTorch-1.12.1_add-hypothesis-suppression.patch, PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch, PyTorch-1.12.1_fix-TestTorch.test_to.patch, PyTorch-1.12.1_skip-test_round_robin.patch, PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch, PyTorch-1.13.1_fix-protobuf-dependency.patch, PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch, PyTorch-1.13.1_skip-failing-singular-grad-test.patch, PyTorch-1.13.1_skip-tests-without-fbgemm.patch, PyTorch-2.0.1_avoid-test_quantization-failures.patch, PyTorch-2.0.1_fix-skip-decorators.patch, PyTorch-2.0.1_fix-vsx-loadu.patch, PyTorch-2.0.1_skip-failing-gradtest.patch, PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch, PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch, PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch, PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch, PyTorch-2.1.0_remove-test-requiring-online-access.patch, PyTorch-2.1.0_skip-diff-test-on-ppc.patch, PyTorch-2.1.0_skip-dynamo-test_predispatch.patch, PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch, PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch, PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch, PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch, PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch, PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch, PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch, PyTorch-2.3.0_skip-test_init_from_local_shards.patch, PyTorch-2.3.0_no-cuda-stubs-rpath.patch, PyTorch-2.3.0_disable-gcc12-warning.patch, PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch, PyTorch-2.3.0_fix-test_fine_tuning.patch, PyTorch-2.3.0_disable_tests_which_need_network_download.patch, PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch, PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch, PyTorch-2.3.0_relax-test_unbacked_reduction.patch, 
PyTorch-2.3.0_remove-fsspec-test.patch, PyTorch-2.3.0_skip_test_var_mean_differentiable.patch, PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch, PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch, PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch, PyTorch-2.6.0_show-test-duration.patch, PyTorch-2.7.1_suport-64bit-BARs.patch

github-actions bot commented Jul 31, 2025

Updated software cuDNN-9.0.0.312-CUDA-12.4.0.eb

Diff against cuDNN-9.10.1.4-CUDA-12.8.0.eb

easybuild/easyconfigs/c/cuDNN/cuDNN-9.10.1.4-CUDA-12.8.0.eb

diff --git a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.10.1.4-CUDA-12.8.0.eb b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
index de7683e3f5..c1f3b9f096 100644
--- a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.10.1.4-CUDA-12.8.0.eb
+++ b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
@@ -1,5 +1,5 @@
 name = 'cuDNN'
-version = '9.10.1.4'
+version = '9.0.0.312'
 versionsuffix = '-CUDA-%(cudaver)s'
 homepage = 'https://developer.nvidia.com/cudnn'
 description = """The NVIDIA CUDA Deep Neural Network library (cuDNN) is
@@ -16,22 +16,30 @@ source_urls = [
 ]
 sources = ['%%(namelower)s-linux-%%(cudnnarch)s-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major]
 checksums = [{
-    f'%(namelower)s-linux-aarch64-%(version)s_cuda{local_cuda_major}-archive.tar.xz':
-        'e8752e3708b54bb0cea0efc4a083218d9d1fc35a443b50343bab1d902f27ec34',
-    f'%(namelower)s-linux-sbsa-%(version)s_cuda{local_cuda_major}-archive.tar.xz':
-        'd5cd68d4d09a151ad839a352f6fa01c3f86ccfb498704456892b992c3d8e4c88',
-    f'%(namelower)s-linux-x86_64-%(version)s_cuda{local_cuda_major}-archive.tar.xz':
-        'be759754e5bd1fcd9b490e224796c87f093c1e92b2b6357854d5371b6aeeb8be',
+    '%%(namelower)s-linux-ppc64le-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
+        'b8ef6f249128e1985893a8787a21de35cb83ec47c6dc6fd1809061dd9a3ffb20',
+    '%%(namelower)s-linux-sbsa-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
+        '430fbf5b513c69e989b3a3a5a572369778ce0c214ce1259af6b935f9cab7dd54',
+    '%%(namelower)s-linux-x86_64-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
+        'd3890e609d6530ee5b88ff95b60c8e6b1c1ec7fa966ec533925f20f896fcc630',
 }]
 
-dependencies = [('CUDA', '12.8.0')]
+dependencies = [('CUDA', '12.4.0')]
 
+local_static_libs = [
+    'libcudnn_adv_static_v9.a',
+    'libcudnn_cnn_static_v9.a',
+    'libcudnn_engines_precompiled_static_v9.a',
+    'libcudnn_engines_runtime_compiled_static_v9.a',
+    'libcudnn_graph_static_v9.a',
+    'libcudnn_heuristic_static_v9.a',
+    'libcudnn_ops_static_v9.a',
+]
 sanity_check_paths = {
     'files': [
-        'include/cudnn.h', 'lib64/libcudnn_adv_static.a', 'lib64/libcudnn_cnn_static.a',
-        'lib64/libcudnn_engines_precompiled_static.a', 'lib64/libcudnn_engines_runtime_compiled_static.a',
-        'lib64/libcudnn_graph_static.a', 'lib64/libcudnn_heuristic_static.a', 'lib64/libcudnn_ops_static.a',
-    ],
+        'include/cudnn.h',
+        'lib64/libcudnn.%s' % SHLIB_EXT
+    ] + ['lib64/' + i for i in local_static_libs],
     'dirs': ['include', 'lib64'],
 }
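
The checksum keys in the diff above switch from f-strings to `%`-formatting with doubled percent signs. The following is a minimal Python sketch of why both spellings should produce the same key, assuming that `%(namelower)s` and `%(version)s` are left in place for EasyBuild to fill in later. The `template_values` dict is a hypothetical stand-in for that later step, not EasyBuild's actual resolver.

```python
local_cuda_major = '12'

# Stage 1a: printf-style formatting. '%%' collapses to a literal '%',
# and the single '%s' is replaced by the CUDA major version.
percent_style = '%%(namelower)s-linux-x86_64-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major

# Stage 1b: f-string spelling. Only braces are special, so '%(...)s' passes through.
f_style = f'%(namelower)s-linux-x86_64-%(version)s_cuda{local_cuda_major}-archive.tar.xz'

# Stage 2 (assumption): emulate template resolution with a plain dict lookup.
template_values = {'namelower': 'cudnn', 'version': '9.0.0.312'}
resolved = percent_style % template_values

print(percent_style)
print(resolved)
```

Both stage 1 spellings leave the named placeholders intact, so the choice between them looks like a matter of style rather than behaviour.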
 
Diff against cuDNN-9.5.0.50-CUDA-12.6.0.eb

easybuild/easyconfigs/c/cuDNN/cuDNN-9.5.0.50-CUDA-12.6.0.eb

diff --git a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.5.0.50-CUDA-12.6.0.eb b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
index 76340a4e65..c1f3b9f096 100644
--- a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.5.0.50-CUDA-12.6.0.eb
+++ b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
@@ -1,5 +1,5 @@
 name = 'cuDNN'
-version = '9.5.0.50'
+version = '9.0.0.312'
 versionsuffix = '-CUDA-%(cudaver)s'
 homepage = 'https://developer.nvidia.com/cudnn'
 description = """The NVIDIA CUDA Deep Neural Network library (cuDNN) is
@@ -9,7 +9,6 @@ toolchain = SYSTEM
 
 # note: cuDNN is tied to specific to CUDA versions,
 # see also https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#cudnn-cuda-hardware-versions
-local_short_ver = '.'.join(version.split('.')[:3])
 local_cuda_major = '12'
 
 source_urls = [
@@ -17,20 +16,30 @@ source_urls = [
 ]
 sources = ['%%(namelower)s-linux-%%(cudnnarch)s-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major]
 checksums = [{
+    '%%(namelower)s-linux-ppc64le-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
+        'b8ef6f249128e1985893a8787a21de35cb83ec47c6dc6fd1809061dd9a3ffb20',
     '%%(namelower)s-linux-sbsa-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
-        '494b640a69feb40ce806a726aa63a1de6b2ec459acbe6a116ef6fe3e6b27877d',
+        '430fbf5b513c69e989b3a3a5a572369778ce0c214ce1259af6b935f9cab7dd54',
     '%%(namelower)s-linux-x86_64-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
-        '86e4e4f4c09b31d3850b402d94ea52741a2f94c2f717ddc8899a14aca96e032d',
+        'd3890e609d6530ee5b88ff95b60c8e6b1c1ec7fa966ec533925f20f896fcc630',
 }]
 
-dependencies = [('CUDA', '12.6.0')]
+dependencies = [('CUDA', '12.4.0')]
 
+local_static_libs = [
+    'libcudnn_adv_static_v9.a',
+    'libcudnn_cnn_static_v9.a',
+    'libcudnn_engines_precompiled_static_v9.a',
+    'libcudnn_engines_runtime_compiled_static_v9.a',
+    'libcudnn_graph_static_v9.a',
+    'libcudnn_heuristic_static_v9.a',
+    'libcudnn_ops_static_v9.a',
+]
 sanity_check_paths = {
     'files': [
-        'include/cudnn.h', 'lib64/libcudnn_adv_static.a', 'lib64/libcudnn_cnn_static.a',
-        'lib64/libcudnn_engines_precompiled_static.a', 'lib64/libcudnn_engines_runtime_compiled_static.a',
-        'lib64/libcudnn_graph_static.a', 'lib64/libcudnn_heuristic_static.a', 'lib64/libcudnn_ops_static.a',
-    ],
+        'include/cudnn.h',
+        'lib64/libcudnn.%s' % SHLIB_EXT
+    ] + ['lib64/' + i for i in local_static_libs],
     'dirs': ['include', 'lib64'],
 }
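
The new `sanity_check_paths` builds its file list from `local_static_libs` instead of spelling out every path. A small Python sketch of what that expression expands to, assuming `SHLIB_EXT` is `'so'` (the usual value on Linux; EasyBuild provides the real constant):

```python
# Assumption: on Linux the shared library extension is 'so'.
SHLIB_EXT = 'so'

local_static_libs = [
    'libcudnn_adv_static_v9.a',
    'libcudnn_cnn_static_v9.a',
    'libcudnn_engines_precompiled_static_v9.a',
    'libcudnn_engines_runtime_compiled_static_v9.a',
    'libcudnn_graph_static_v9.a',
    'libcudnn_heuristic_static_v9.a',
    'libcudnn_ops_static_v9.a',
]

sanity_check_paths = {
    'files': [
        'include/cudnn.h',
        'lib64/libcudnn.%s' % SHLIB_EXT,
    ] + ['lib64/' + i for i in local_static_libs],
    'dirs': ['include', 'lib64'],
}

# Print the fully expanded list of files the sanity check would look for.
for path in sanity_check_paths['files']:
    print(path)
```

Compared with the removed lines, the static libraries now carry a `_v9` suffix and the shared library `libcudnn` is checked as well.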
 
Diff against cuDNN-9.1.1.17-CUDA-12.4.0.eb

easybuild/easyconfigs/c/cuDNN/cuDNN-9.1.1.17-CUDA-12.4.0.eb

diff --git a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.1.1.17-CUDA-12.4.0.eb b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
index c2f8e74e21..c1f3b9f096 100644
--- a/easybuild/easyconfigs/c/cuDNN/cuDNN-9.1.1.17-CUDA-12.4.0.eb
+++ b/easybuild/easyconfigs/c/cuDNN/cuDNN-9.0.0.312-CUDA-12.4.0.eb
@@ -1,5 +1,5 @@
 name = 'cuDNN'
-version = '9.1.1.17'
+version = '9.0.0.312'
 versionsuffix = '-CUDA-%(cudaver)s'
 homepage = 'https://developer.nvidia.com/cudnn'
 description = """The NVIDIA CUDA Deep Neural Network library (cuDNN) is
@@ -9,7 +9,6 @@ toolchain = SYSTEM
 
 # note: cuDNN is tied to specific to CUDA versions,
 # see also https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#cudnn-cuda-hardware-versions
-local_short_ver = '.'.join(version.split('.')[:3])
 local_cuda_major = '12'
 
 source_urls = [
@@ -17,22 +16,30 @@ source_urls = [
 ]
 sources = ['%%(namelower)s-linux-%%(cudnnarch)s-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major]
 checksums = [{
+    '%%(namelower)s-linux-ppc64le-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
+        'b8ef6f249128e1985893a8787a21de35cb83ec47c6dc6fd1809061dd9a3ffb20',
     '%%(namelower)s-linux-sbsa-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
-        '19bd66ee9fb30348f18801a398d0bec98b4663866efa244ca122825b3429526c',
+        '430fbf5b513c69e989b3a3a5a572369778ce0c214ce1259af6b935f9cab7dd54',
     '%%(namelower)s-linux-x86_64-%%(version)s_cuda%s-archive.tar.xz' % local_cuda_major:
-        '992b4be26899cc4c618bb1f6989261df7d0a9f9032b2217bf1fce9dd3228c904',
+        'd3890e609d6530ee5b88ff95b60c8e6b1c1ec7fa966ec533925f20f896fcc630',
 }]
 
 dependencies = [('CUDA', '12.4.0')]
 
+local_static_libs = [
+    'libcudnn_adv_static_v9.a',
+    'libcudnn_cnn_static_v9.a',
+    'libcudnn_engines_precompiled_static_v9.a',
+    'libcudnn_engines_runtime_compiled_static_v9.a',
+    'libcudnn_graph_static_v9.a',
+    'libcudnn_heuristic_static_v9.a',
+    'libcudnn_ops_static_v9.a',
+]
 sanity_check_paths = {
     'files': [
-        'include/cudnn.h', 'lib64/libcudnn_adv_static.a',
-        'lib64/libcudnn_cnn_static.a', 'lib64/libcudnn_engines_runtime_compiled_static.a',
-        'lib64/libcudnn_engines_precompiled_static.a', 'lib64/libcudnn_graph_static.a',
-        'lib64/libcudnn_ops_static.a', 'lib64/libcudnn_heuristic_static.a',
+        'include/cudnn.h',
         'lib64/libcudnn.%s' % SHLIB_EXT
-    ],
+    ] + ['lib64/' + i for i in local_static_libs],
     'dirs': ['include', 'lib64'],
 }
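
Each entry in `checksums` maps a source filename to a SHA-256 digest. A hedged Python sketch of that kind of check, using only the standard library; `sha256_of` and `verify` are hypothetical helper names, the payload is a placeholder rather than a real cuDNN archive, and this is not EasyBuild's own verification code:

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, checksums):
    """Check a file against a {filename: sha256} dict; unknown files fail."""
    expected = checksums.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected


with tempfile.TemporaryDirectory() as tmpdir:
    archive = os.path.join(tmpdir, 'example-archive.tar.xz')
    with open(archive, 'wb') as handle:
        handle.write(b'placeholder payload, not a real cuDNN archive')

    good = verify(archive, {'example-archive.tar.xz': sha256_of(archive)})
    bad = verify(archive, {'example-archive.tar.xz': '0' * 64})
    missing = verify(archive, {})

print(good, bad, missing)
```

Keying the dict by filename is what lets one easyconfig carry separate digests for the ppc64le, sbsa and x86_64 archives.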
 

Updated software PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb

Diff against PyTorch-2.6.0-foss-2024a.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.6.0-foss-2024a.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.6.0-foss-2024a.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
index 975a779408..bb1f766f48 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.6.0-foss-2024a.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
@@ -1,194 +1,177 @@
 name = 'PyTorch'
-version = '2.6.0'
+version = '2.3.0'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
 PyTorch is a deep learning framework that puts Python first."""
 
-toolchain = {'name': 'foss', 'version': '2024a'}
+toolchain = {'name': 'foss', 'version': '2023b'}
 
 source_urls = [GITHUB_RELEASE]
 sources = ['%(namelower)s-v%(version)s.tar.gz']
 patches = [
     'PyTorch-1.7.0_disable-dev-shm-test.patch',
     'PyTorch-1.12.1_add-hypothesis-suppression.patch',
+    'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch',
     'PyTorch-1.12.1_fix-TestTorch.test_to.patch',
+    'PyTorch-1.12.1_skip-test_round_robin.patch',
     'PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch',
+    'PyTorch-1.13.1_fix-protobuf-dependency.patch',
+    'PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch',
     'PyTorch-1.13.1_skip-failing-singular-grad-test.patch',
+    'PyTorch-1.13.1_skip-tests-without-fbgemm.patch',
     'PyTorch-2.0.1_avoid-test_quantization-failures.patch',
+    'PyTorch-2.0.1_fix-skip-decorators.patch',
+    'PyTorch-2.0.1_fix-vsx-loadu.patch',
     'PyTorch-2.0.1_skip-failing-gradtest.patch',
     'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch',
     'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch',
+    'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch',
+    'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch',
     'PyTorch-2.1.0_remove-test-requiring-online-access.patch',
+    'PyTorch-2.1.0_skip-diff-test-on-ppc.patch',
     'PyTorch-2.1.0_skip-dynamo-test_predispatch.patch',
+    'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch',
+    'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch',
     'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch',
     'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch',
-    'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch',
+    'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch',
+    'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch',
+    'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch',
+    'PyTorch-2.3.0_skip-test_init_from_local_shards.patch',
+    'PyTorch-2.3.0_no-cuda-stubs-rpath.patch',
+    'PyTorch-2.3.0_disable-gcc12-warning.patch',
+    'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch',
+    'PyTorch-2.3.0_fix-test_fine_tuning.patch',
+    'PyTorch-2.3.0_disable_tests_which_need_network_download.patch',
+    'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch',
+    'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch',
+    'PyTorch-2.3.0_relax-test_unbacked_reduction.patch',
+    'PyTorch-2.3.0_remove-fsspec-test.patch',
     'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch',
-    'PyTorch-2.6.0_add-checkfunctionexists-include.patch',
-    'PyTorch-2.6.0_allow-sympy-1.13.3.patch',
-    'PyTorch-2.6.0_avoid_caffe2_test_cpp_jit.patch',
-    'PyTorch-2.6.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch',
-    'PyTorch-2.6.0_disable_tests_which_need_network_download.patch',
-    'PyTorch-2.6.0_disable-gcc12-warnings.patch',
-    'PyTorch-2.6.0_fix-accuracy-issues-in-linalg_solve.patch',
-    'PyTorch-2.6.0_fix-cpuinfo-bug-with-smt.patch',
-    'PyTorch-2.6.0_fix-distributed-tests-without-gpus.patch',
-    'PyTorch-2.6.0_fix-edge-case-causing-test_trigger_bisect_on_error-failure.patch',
-    'PyTorch-2.6.0_fix-ExcTests.test_trigger_on_error.patch',
-    'PyTorch-2.6.0_fix-flaky-test_aot_export_with_torch_cond.patch',
-    'PyTorch-2.6.0_fix-inductor-device-interface.patch',
-    'PyTorch-2.6.0_fix-server-in-test_control_plane.patch',
-    'PyTorch-2.6.0_fix-skip-decorators.patch',
-    'PyTorch-2.6.0_fix-sympy-1.13-compat.patch',
-    'PyTorch-2.6.0_fix-test_autograd_cpp_node_saved_float.patch',
-    'PyTorch-2.6.0_fix-test_linear_with_embedding.patch',
-    'PyTorch-2.6.0_fix-test_linear_with_in_out_buffer-without-mkl.patch',
-    'PyTorch-2.6.0_fix-test_public_bindings.patch',
-    'PyTorch-2.6.0_fix-test_unbacked_bindings_for_divisible_u_symint.patch',
-    'PyTorch-2.6.0_fix-vsx-vector-shift-functions.patch',
-    'PyTorch-2.6.0_fix-xnnpack-float16-convert.patch',
-    'PyTorch-2.6.0_increase-tolerance-test_aotdispatch-matmul.patch',
-    'PyTorch-2.6.0_increase-tolerance-test_quick-baddbmm.patch',
-    'PyTorch-2.6.0_increase-tolerance-test_vmap_autograd_grad.patch',
-    'PyTorch-2.6.0_remove-test_slice_with_floordiv.patch',
+    'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch',
+    'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch',
+    'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch',
     'PyTorch-2.6.0_show-test-duration.patch',
-    'PyTorch-2.6.0_skip-diff-test-on-ppc.patch',
-    'PyTorch-2.6.0_skip-test_checkpoint_wrapper_parity-on-cpu.patch',
-    'PyTorch-2.6.0_skip-test_init_from_local_shards.patch',
-    'PyTorch-2.6.0_skip-test_jvp_linalg_det_singular.patch',
-    'PyTorch-2.6.0_skip-test-requiring-MKL.patch',
-    'PyTorch-2.6.0_skip-test_segfault.patch',
-    'PyTorch-2.6.0_skip-tests-without-fbgemm.patch',
+    'PyTorch-2.7.1_suport-64bit-BARs.patch',
 ]
 checksums = [
-    {'pytorch-v2.6.0.tar.gz': '3005690eb7b083c443a38c7657938af63902f524ad87a6c83f1aca38c77e3b57'},
+    {'pytorch-v2.3.0.tar.gz': '69579513b26261bbab32e13b7efc99ad287fcf3103087f2d4fdf1adacd25316f'},
     {'PyTorch-1.7.0_disable-dev-shm-test.patch': '622cb1eaeadc06e13128a862d9946bcc1f1edd3d02b259c56a9aecc4d5406b8a'},
     {'PyTorch-1.12.1_add-hypothesis-suppression.patch':
      'e71ffb94ebe69f580fa70e0de84017058325fdff944866d6bd03463626edc32c'},
+    {'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch':
+     '1efc9850c431d702e9117d4766277d3f88c5c8b3870997c9974971bce7f2ab83'},
     {'PyTorch-1.12.1_fix-TestTorch.test_to.patch': '75f27987c3f25c501e719bd2b1c70a029ae0ee28514a97fe447516aee02b1535'},
+    {'PyTorch-1.12.1_skip-test_round_robin.patch': '63d4849b78605aa088fdff695637d9473ea60dee603a3ff7f788690d70c55349'},
     {'PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch':
      '5c7be91a6096083a0b1315efe0001537499c600f1f569953c6a2c7f4cc1d0910'},
+    {'PyTorch-1.13.1_fix-protobuf-dependency.patch':
+     '8bd755a0cab7233a243bc65ca57c9630dfccdc9bf8c9792f0de4e07a644fcb00'},
+    {'PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch':
+     'bdde0f2105215c95a54de64ec4b1a4520528510663174fef6d5b900eb1db3937'},
     {'PyTorch-1.13.1_skip-failing-singular-grad-test.patch':
      '72688a57b2bb617665ad1a1d5e362c5111ae912c10936bb38a089c0204729f48'},
+    {'PyTorch-1.13.1_skip-tests-without-fbgemm.patch':
+     '481e595f673baf8ae58b41697a6792b83048b0264aa79b422f48cd8c22948bb7'},
     {'PyTorch-2.0.1_avoid-test_quantization-failures.patch':
      '02e3f47e4ed1d7d6077e26f1ae50073dc2b20426269930b505f4aefe5d2f33cd'},
+    {'PyTorch-2.0.1_fix-skip-decorators.patch': '2039012cef45446065e1a2097839fe20bb29fe3c1dcc926c3695ebf29832e920'},
+    {'PyTorch-2.0.1_fix-vsx-loadu.patch': 'a0ffa61da2d47c6acd09aaf6d4791e527d8919a6f4f1aa7ed38454cdcadb1f72'},
     {'PyTorch-2.0.1_skip-failing-gradtest.patch': '8030bdec6ba49b057ab232d19a7f1a5e542e47e2ec340653a246ec9ed59f8bc1'},
     {'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch':
      '7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae'},
     {'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch':
      '166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5'},
+    {'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch':
+     '3793b4b878be1abe7791efcbd534774b87862cfe7dc4774ca8729b6cabb39e7e'},
+    {'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch':
+     'aef38adf1210d0c5455e91d7c7a9d9e5caad3ae568301e0ba9fc204309438e7b'},
     {'PyTorch-2.1.0_remove-test-requiring-online-access.patch':
      '35184b8c5a1b10f79e511cc25db3b8a5585a5d58b5d1aa25dd3d250200b14fd7'},
+    {'PyTorch-2.1.0_skip-diff-test-on-ppc.patch': '394157dbe565ffcbc1821cd63d05930957412156cc01e949ef3d3524176a1dda'},
     {'PyTorch-2.1.0_skip-dynamo-test_predispatch.patch':
      '6298daf9ddaa8542850eee9ea005f28594ab65b1f87af43d8aeca1579a8c4354'},
+    {'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch':
+     '5229ca88a71db7667a90ddc0b809b2c817698bd6e9c5aaabd73d3173cf9b99fe'},
+    {'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch':
+     '7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb'},
     {'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch':
      'fb96eefabf394617bbb3fbd3a7a7c1aa5991b3836edc2e5d2a30e708bfe49ba1'},
     {'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch':
      '23416f2d9d5226695ec3fbea0671e3650c655c19deefd3f0f8ddab5afa50f485'},
-    {'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch':
-     'ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976'},
+    {'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch':
+     '0dcbdfde6752c3ff54c5376f521b4a742167669feb7f0f1d4e1d4d55f72b664f'},
+    {'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch':
+     '29fb95d1dba070133b513de050febd328ed36905a73f1ca135dc633f16beafa4'},
+    {'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch':
+     '6f8eba5b546129ea975cda1a8a7098ca3245ad2b040a31a98807ee6d69cad0d4'},
+    {'PyTorch-2.3.0_skip-test_init_from_local_shards.patch':
+     '90ed9c2870f57ee6dc032d00873a37e2217a2b92a13035ded1c25ad5306455f2'},
+    {'PyTorch-2.3.0_no-cuda-stubs-rpath.patch': '7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1'},
+    {'PyTorch-2.3.0_disable-gcc12-warning.patch': 'a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514'},
+    {'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch':
+     '36aa2d5ba175be17f4e996f4fb2d544fe477d4a0bd0644cd59a85063779afc8e'},
+    {'PyTorch-2.3.0_fix-test_fine_tuning.patch': 'daa24801f3b2b5f76b639a14fba9a6ad84fe99ebed53401e217d02f94cfe48bf'},
+    {'PyTorch-2.3.0_disable_tests_which_need_network_download.patch':
+     'b7fd1a5135dfd4098cdc054182f7bf84a23ac98462a00477712182b5442da855'},
+    {'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch':
+     '041adcd91d994b8c2ab57d227f081cd57e572c157117b37171e1eb8eb576f8fc'},
+    {'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch':
+     'aa6ff764f3f7bf84372a8a257fe1b4ae6dc4b9744ad35f0f9015f2696c62a41e'},
+    {'PyTorch-2.3.0_relax-test_unbacked_reduction.patch':
+     'c822f084bd97b6c76bea692e3a4664e227b3aea57c80e576a841943877085b77'},
+    {'PyTorch-2.3.0_remove-fsspec-test.patch': '09be192401013cd8cd66add9d6565ac3e879e004d77e61145f826b768267ff61'},
     {'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch':
      '9703fd0f1fca8916f6d79d83e9a7efe8e3f717362a5fdaa8f5d9da90d0c75018'},
-    {'PyTorch-2.6.0_add-checkfunctionexists-include.patch':
-     '93579e35e946fb06025a50c42f3625ed8b8ac9f503a963cc23767e2c8869f0ea'},
-    {'PyTorch-2.6.0_allow-sympy-1.13.3.patch': 'd17f5c528f64fe5e905c9154e90654e8ed2b7f0c16418ffd84ed3913aeb57eea'},
-    {'PyTorch-2.6.0_avoid_caffe2_test_cpp_jit.patch':
-     '88d03d90359bc1fe3cfa3562624d4fbfd4c6654c9199c556ca912ac55289ce55'},
-    {'PyTorch-2.6.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch':
-     '74db866787f1e666ed3b35db5204f05a0ba8d989fb23057a72dd07928388dc46'},
-    {'PyTorch-2.6.0_disable_tests_which_need_network_download.patch':
-     'fe76129811e4eb24d0e12c397335a4c7971b0c4e48ce9cdb9169f3ef9de7aac4'},
-    {'PyTorch-2.6.0_disable-gcc12-warnings.patch': '892643650788b743106ebe4e70c68be42a756eba797f0f79e31708d6e008a620'},
-    {'PyTorch-2.6.0_fix-accuracy-issues-in-linalg_solve.patch':
-     'a6b1cfe8f03ad5b17437e04e6a0369a25fcc79eed939ce6912ceca1c0ab0f444'},
-    {'PyTorch-2.6.0_fix-cpuinfo-bug-with-smt.patch':
-     '2ecb182802e795ed79b7a5f2ce9459780290b4097e981a737a98d4b47d3e2555'},
-    {'PyTorch-2.6.0_fix-distributed-tests-without-gpus.patch':
-     '011cffc098b6818eb160b6bec2e671dec46cb2a8457ce32144ea01cc9ed4290a'},
-    {'PyTorch-2.6.0_fix-edge-case-causing-test_trigger_bisect_on_error-failure.patch':
-     'fd918fa510bf04c95f3bcc2f4abea417632a0fefb278154ec95207ca0d1719ed'},
-    {'PyTorch-2.6.0_fix-ExcTests.test_trigger_on_error.patch':
-     '445472d43a61523b2ed169023f5f6db197bc2df8408f59e6254e55f5cb1d3a11'},
-    {'PyTorch-2.6.0_fix-flaky-test_aot_export_with_torch_cond.patch':
-     '79cf77a795e06c4c3206a998ce8f4a92072f79736803008ede65e5ec2f204bfc'},
-    {'PyTorch-2.6.0_fix-inductor-device-interface.patch':
-     'e8e6af1ea5f01568c23127d4f83aacb482ec9005ba558b68763748a581bcc5bc'},
-    {'PyTorch-2.6.0_fix-server-in-test_control_plane.patch':
-     '1337689ff28ecaa8d1d0edf60d322bcdd7846fec040925325d357b19eb6e4342'},
-    {'PyTorch-2.6.0_fix-skip-decorators.patch': 'ec1ba1ef2a2b2c6753a0b35d10c6af0457fc90fe98e2f77979745d9f79d79c86'},
-    {'PyTorch-2.6.0_fix-sympy-1.13-compat.patch': 'b801690a5b79ba6e4916ac6f719c36682b2a197582aee5e6f385e808f776920e'},
-    {'PyTorch-2.6.0_fix-test_autograd_cpp_node_saved_float.patch':
-     '928c4b1dc16f3d4a7bec29d8749b89ebd41488845938e2514c7fa8c048950e33'},
-    {'PyTorch-2.6.0_fix-test_linear_with_embedding.patch':
-     '56c053de7cfaa2f9898c3b036a185b499f5d44a7b4cd0442c45a8c94928322bf'},
-    {'PyTorch-2.6.0_fix-test_linear_with_in_out_buffer-without-mkl.patch':
-     '8cf9e5d434eb8d3b81400622ca23714c7002a0b835e7e08b384b84408c7ed085'},
-    {'PyTorch-2.6.0_fix-test_public_bindings.patch':
-     '066d88acd8156ed3f91b6a8e924de57f8aef944aa1bf67dc453b830ee1c26094'},
-    {'PyTorch-2.6.0_fix-test_unbacked_bindings_for_divisible_u_symint.patch':
-     '5f5ce1e275888cd6a057a0769fffaa9e49dde003ba191fd70b0265d8c6259a9b'},
-    {'PyTorch-2.6.0_fix-vsx-vector-shift-functions.patch':
-     '82ce0b48e3b7c3dfd3a2ba915f4675d5c3a6d149646e1e0d6a29eedbbaecc8bd'},
-    {'PyTorch-2.6.0_fix-xnnpack-float16-convert.patch':
-     'a6fcb475040c6fed2c0ec8b3f9c1e9fb964220413e84c8f2ee4092770ee6ac7d'},
-    {'PyTorch-2.6.0_increase-tolerance-test_aotdispatch-matmul.patch':
-     'c1c6ea41504e4479d258225ecefc7e9c5726934601610904ae555501a11e9109'},
-    {'PyTorch-2.6.0_increase-tolerance-test_quick-baddbmm.patch':
-     '9850facdfb5d98451249570788217ede07466cae9ba52cd03afd3ec803ba33c9'},
-    {'PyTorch-2.6.0_increase-tolerance-test_vmap_autograd_grad.patch':
-     '8d5eb53bb0a1456af333ae646c860033d6dd037bd9152601a200ca5c10ebf3cb'},
-    {'PyTorch-2.6.0_remove-test_slice_with_floordiv.patch':
-     '1b7ff59a595b9ebbc042d8ff53e3f6c72a1d3b04fb82228f4433473f28623f9b'},
+    {'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch':
+     '7955f2655db3da18606574fdcbc5990be24098f49ad1db5e86ea756ea1cc506f'},
+    {'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch':
+     'ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976'},
+    {'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch':
+     '6205d8249e7edcce5756e073ab0b11a0496da34eec1a55e3d24437a530d2886b'},
     {'PyTorch-2.6.0_show-test-duration.patch': '5508f2f9619204d9f3c356dbd4000a00d58f452ab2d64ae920eb8bc8b5484d75'},
-    {'PyTorch-2.6.0_skip-diff-test-on-ppc.patch': '6f2f87cad1b0ab8c5a0c7b3f7fbc14e4bdfbe61da26a3934ded9dda7fe368c74'},
-    {'PyTorch-2.6.0_skip-test_checkpoint_wrapper_parity-on-cpu.patch':
-     '600f74de167b6fea4d849229de6d653dc616093b456962729222d6bfa767a8e8'},
-    {'PyTorch-2.6.0_skip-test_init_from_local_shards.patch':
-     '222383195f6a3b7c545ffeadb4dd469b9f3361b42c0866de3d3f0f91f8fbe777'},
-    {'PyTorch-2.6.0_skip-test_jvp_linalg_det_singular.patch':
-     '3bbe8e585765d6db2a77ed0f751eadf924fbbedc95bbd88f447538ceede273fd'},
-    {'PyTorch-2.6.0_skip-test-requiring-MKL.patch':
-     'f1c9b1c77b09d59317fd52d390e7d948a147325b927ad6373c1fa1d1d6ea1ea8'},
-    {'PyTorch-2.6.0_skip-test_segfault.patch': '26806bd62e6b61b56ebaa52d68ca44c415a28124f684bd2fb373557ada68ef52'},
-    {'PyTorch-2.6.0_skip-tests-without-fbgemm.patch':
-     'ed35099de94a14322a879066da048ec9bc565dc81287b4adc4fec46f9afe90cf'},
+    {'PyTorch-2.7.1_suport-64bit-BARs.patch': '317c3d220aa87426d86e137a6c1a8f910adf9580ca0848371e0f6800c05dbde1'},
 ]
 
 osdependencies = [OS_PKG_IBVERBS_DEV]
 
 builddependencies = [
-    ('CMake', '3.29.3'),
-    ('hypothesis', '6.103.1'),
+    ('CMake', '3.27.6'),
+    ('hypothesis', '6.90.0'),
     # For tests
-    ('parameterized', '0.9.0'),
     ('pytest-flakefinder', '1.1.0'),
-    ('pytest-rerunfailures', '15.0'),
+    ('pytest-rerunfailures', '14.0'),
     ('pytest-shard', '0.1.2'),
-    ('pytest-subtests', '0.13.1'),
-    ('tlparse', '0.3.37'),
-    ('optree', '0.14.1'),
+    ('tlparse', '0.3.5'),
+    ('optree', '0.13.0'),
     ('unittest-xml-reporting', '3.1.0'),
 ]
 
 dependencies = [
-    ('Ninja', '1.12.1'),  # Required for JIT compilation of C++ extensions
-    ('Python', '3.12.3'),
-    ('Python-bundle-PyPI', '2024.06'),
-    ('protobuf', '28.0'),
-    ('protobuf-python', '5.28.0'),
-    ('pybind11', '2.12.0'),
-    ('PuLP', '2.8.0'),
-    ('SciPy-bundle', '2024.05'),
-    ('PyYAML', '6.0.2'),
+    ('CUDA', '12.4.0', '', SYSTEM),
+    ('cuDNN', '9.0.0.312', versionsuffix, SYSTEM),
+    ('magma', '2.7.2', versionsuffix),
+    ('NCCL', '2.20.5', versionsuffix),
+    # Version from .ci/docker/triton_version.txt
+    ('Triton', '2.3.1', versionsuffix),
+    ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
+    ('Python', '3.11.5'),
+    ('Python-bundle-PyPI', '2023.10'),
+    ('protobuf', '25.3'),
+    ('protobuf-python', '4.25.3'),
+    ('pybind11', '2.11.1'),
+    ('SciPy-bundle', '2023.11'),
+    ('PyYAML', '6.0.1'),
     ('MPFR', '4.2.1'),
     ('GMP', '6.3.0'),
-    ('numactl', '2.0.18'),
-    ('FFmpeg', '7.0.2'),
-    ('Pillow', '10.4.0'),
+    ('numactl', '2.0.16'),
+    ('FFmpeg', '6.0'),
+    ('Pillow', '10.2.0'),
     ('expecttest', '0.2.1'),
-    ('networkx', '3.4.2'),
-    ('sympy', '1.13.3'),
+    ('networkx', '3.2.1'),
+    ('sympy', '1.12'),
     ('Z3', '4.13.0',),
 ]
 
@@ -198,14 +181,34 @@ excluded_tests = {
     '': [
         # This test seems to take too long on NVIDIA Ampere at least.
         'distributed/test_distributed_spawn',
+        # Broken on CUDA 11.6/11.7: https://github.com/pytorch/pytorch/issues/75375
+        'distributions/test_constraints',
         # no xdoctest
         'doctests',
+        # failing on broadwell
+        # See https://github.com/easybuilders/easybuild-easyconfigs/issues/17712
+        'test_native_mha',
         # intermittent failures on various systems
         # See https://github.com/easybuilders/easybuild-easyconfigs/issues/17712
         'distributed/rpc/test_tensorpipe_agent',
         # This test is expected to fail when run in their CI, but won't in our case.
         # It just checks for a "CI" env variable
         'test_ci_sanity_check_fail',
+        # This fails consistently and is disabled upstream
+        # See https://github.com/pytorch/pytorch/issues/100152 and
+        # https://github.com/pytorch/pytorch/pull/124712
+        'test_cpp_extensions_open_device_registration',
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/pull/124786
+        'distributed/checkpoint/test_save_load_api',
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/issues/122184
+        'distributed/tensor/parallel/test_tp_random_state',
+        # Doesn't find "dist.all_reduce(" in generated code. Known failures, e.g.
+        # https://github.com/pytorch/pytorch/issues/121195
+        'distributed/test_compute_comm_reordering',
+        # Long tests, tested successfully once during creation of EC
+        'inductor/test_aot_inductor',  # ~65min
+        'distributed/fsdp/test_fsdp_state_dict',  # ~202min
+        'distributed/fsdp/test_fsdp_core',  # ~88min
     ]
 }
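
The added `versionsuffix = '-CUDA-%(cudaver)s'` is what distinguishes this easyconfig's filename from the CPU-only `PyTorch-2.3.0-foss-2023b.eb`. A minimal Python sketch of how the filenames in this PR can be composed, assuming `%(cudaver)s` resolves to the CUDA dependency version; `easyconfig_filename` is a hypothetical helper, not an EasyBuild function:

```python
def easyconfig_filename(name, version, toolchain=None, versionsuffix=''):
    """Hypothetical helper: '<name>-<version>[-<tc>-<tcver>]<suffix>.eb'.

    toolchain=None models the SYSTEM toolchain, which adds no name part.
    """
    parts = [name, version]
    if toolchain is not None:
        parts += [toolchain['name'], toolchain['version']]
    return '-'.join(parts) + versionsuffix + '.eb'


# Assumption: the template value comes from the ('CUDA', '12.4.0', '', SYSTEM) dependency.
cuda_version = '12.4.0'
versionsuffix = '-CUDA-%(cudaver)s' % {'cudaver': cuda_version}

pytorch = easyconfig_filename(
    'PyTorch', '2.3.0', {'name': 'foss', 'version': '2023b'}, versionsuffix)
cudnn = easyconfig_filename('cuDNN', '9.0.0.312', None, versionsuffix)

print(pytorch)
print(cudnn)
```

Passing the same `versionsuffix` to the cuDNN, magma, NCCL and Triton dependencies is what ties them to builds made against the same CUDA version.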
 
Diff against PyTorch-2.3.0-foss-2023b.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
index 2b47bc81b4..bb1f766f48 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
@@ -1,5 +1,6 @@
 name = 'PyTorch'
 version = '2.3.0'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
@@ -37,16 +38,23 @@ patches = [
     'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch',
     'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch',
     'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch',
+    'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch',
     'PyTorch-2.3.0_skip-test_init_from_local_shards.patch',
     'PyTorch-2.3.0_no-cuda-stubs-rpath.patch',
     'PyTorch-2.3.0_disable-gcc12-warning.patch',
     'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch',
+    'PyTorch-2.3.0_fix-test_fine_tuning.patch',
     'PyTorch-2.3.0_disable_tests_which_need_network_download.patch',
     'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch',
     'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch',
+    'PyTorch-2.3.0_relax-test_unbacked_reduction.patch',
+    'PyTorch-2.3.0_remove-fsspec-test.patch',
     'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch',
     'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch',
     'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch',
+    'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch',
+    'PyTorch-2.6.0_show-test-duration.patch',
+    'PyTorch-2.7.1_suport-64bit-BARs.patch',
 ]
 checksums = [
     {'pytorch-v2.3.0.tar.gz': '69579513b26261bbab32e13b7efc99ad287fcf3103087f2d4fdf1adacd25316f'},
@@ -97,26 +105,34 @@ checksums = [
      '0dcbdfde6752c3ff54c5376f521b4a742167669feb7f0f1d4e1d4d55f72b664f'},
     {'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch':
      '29fb95d1dba070133b513de050febd328ed36905a73f1ca135dc633f16beafa4'},
+    {'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch':
+     '6f8eba5b546129ea975cda1a8a7098ca3245ad2b040a31a98807ee6d69cad0d4'},
     {'PyTorch-2.3.0_skip-test_init_from_local_shards.patch':
      '90ed9c2870f57ee6dc032d00873a37e2217a2b92a13035ded1c25ad5306455f2'},
-    {'PyTorch-2.3.0_no-cuda-stubs-rpath.patch':
-     '7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1'},
-    {'PyTorch-2.3.0_disable-gcc12-warning.patch':
-     'a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514'},
+    {'PyTorch-2.3.0_no-cuda-stubs-rpath.patch': '7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1'},
+    {'PyTorch-2.3.0_disable-gcc12-warning.patch': 'a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514'},
     {'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch':
      '36aa2d5ba175be17f4e996f4fb2d544fe477d4a0bd0644cd59a85063779afc8e'},
+    {'PyTorch-2.3.0_fix-test_fine_tuning.patch': 'daa24801f3b2b5f76b639a14fba9a6ad84fe99ebed53401e217d02f94cfe48bf'},
     {'PyTorch-2.3.0_disable_tests_which_need_network_download.patch':
      'b7fd1a5135dfd4098cdc054182f7bf84a23ac98462a00477712182b5442da855'},
     {'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch':
      '041adcd91d994b8c2ab57d227f081cd57e572c157117b37171e1eb8eb576f8fc'},
     {'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch':
      'aa6ff764f3f7bf84372a8a257fe1b4ae6dc4b9744ad35f0f9015f2696c62a41e'},
+    {'PyTorch-2.3.0_relax-test_unbacked_reduction.patch':
+     'c822f084bd97b6c76bea692e3a4664e227b3aea57c80e576a841943877085b77'},
+    {'PyTorch-2.3.0_remove-fsspec-test.patch': '09be192401013cd8cd66add9d6565ac3e879e004d77e61145f826b768267ff61'},
     {'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch':
      '9703fd0f1fca8916f6d79d83e9a7efe8e3f717362a5fdaa8f5d9da90d0c75018'},
     {'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch':
      '7955f2655db3da18606574fdcbc5990be24098f49ad1db5e86ea756ea1cc506f'},
     {'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch':
      'ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976'},
+    {'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch':
+     '6205d8249e7edcce5756e073ab0b11a0496da34eec1a55e3d24437a530d2886b'},
+    {'PyTorch-2.6.0_show-test-duration.patch': '5508f2f9619204d9f3c356dbd4000a00d58f452ab2d64ae920eb8bc8b5484d75'},
+    {'PyTorch-2.7.1_suport-64bit-BARs.patch': '317c3d220aa87426d86e137a6c1a8f910adf9580ca0848371e0f6800c05dbde1'},
 ]
 
 osdependencies = [OS_PKG_IBVERBS_DEV]
@@ -134,6 +150,12 @@ builddependencies = [
 ]
 
 dependencies = [
+    ('CUDA', '12.4.0', '', SYSTEM),
+    ('cuDNN', '9.0.0.312', versionsuffix, SYSTEM),
+    ('magma', '2.7.2', versionsuffix),
+    ('NCCL', '2.20.5', versionsuffix),
+    # Version from .ci/docker/triton_version.txt
+    ('Triton', '2.3.1', versionsuffix),
     ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
     ('Python', '3.11.5'),
     ('Python-bundle-PyPI', '2023.10'),
@@ -176,13 +198,28 @@ excluded_tests = {
         # See https://github.com/pytorch/pytorch/issues/100152 and
         # https://github.com/pytorch/pytorch/pull/124712
         'test_cpp_extensions_open_device_registration',
-
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/pull/124786
+        'distributed/checkpoint/test_save_load_api',
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/issues/122184
+        'distributed/tensor/parallel/test_tp_random_state',
+        # Doesn't find "dist.all_reduce(" in generated code. Known failures, e.g.
+        # https://github.com/pytorch/pytorch/issues/121195
+        'distributed/test_compute_comm_reordering',
+        # Long tests, tested successfully once during creation of EC
+        'inductor/test_aot_inductor',  # ~65min
+        'distributed/fsdp/test_fsdp_state_dict',  # ~202min
+        'distributed/fsdp/test_fsdp_core',  # ~88min
     ]
 }
 
 local_test_opts = '--continue-through-error --pipe-logs --verbose %(excluded_tests)s'
 runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py ' + local_test_opts
 
+# Especially test_quantization has a few corner cases that are triggered by the random input values,
+# those cannot be easily avoided, see https://github.com/pytorch/pytorch/issues/107030
+# So allow a low number of tests to fail as the tests "usually" succeed
+max_failed_tests = 16
+
 tests = ['PyTorch-check-cpp-extension.py']
 
 moduleclass = 'ai'
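
The `max_failed_tests = 16` setting in the diff above tolerates a small number of failing tests. A minimal sketch of that kind of threshold check follows; `check_test_result` and its arguments are hypothetical names chosen for illustration, not the easyblock's actual implementation, which has to parse the real `run_test.py` output first.

```python
def check_test_result(failed_tests, max_failed_tests):
    """Return (ok, message) for a list of failed test names and a tolerance."""
    count = len(failed_tests)
    if count == 0:
        return True, 'all tests passed'
    if count <= max_failed_tests:
        return True, '%d failed test(s) tolerated (limit %d)' % (count, max_failed_tests)
    return False, '%d failed test(s) exceed the limit of %d' % (count, max_failed_tests)


# Example: three sporadic failures stay below the limit of 16.
ok, msg = check_test_result(['test_quantization'] * 3, max_failed_tests=16)
print(ok, msg)
```

The limit is inclusive in this sketch: exactly 16 failures would still pass, and 17 would not.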
Diff against PyTorch-2.1.2-foss-2023b.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
index 2206da7c2f..bb1f766f48 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b-CUDA-12.4.0.eb
@@ -1,5 +1,6 @@
 name = 'PyTorch'
-version = '2.1.2'
+version = '2.3.0'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
@@ -11,7 +12,6 @@ source_urls = [GITHUB_RELEASE]
 sources = ['%(namelower)s-v%(version)s.tar.gz']
 patches = [
     'PyTorch-1.7.0_disable-dev-shm-test.patch',
-    'PyTorch-1.11.1_skip-test_init_from_local_shards.patch',
     'PyTorch-1.12.1_add-hypothesis-suppression.patch',
     'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch',
     'PyTorch-1.12.1_fix-TestTorch.test_to.patch',
@@ -23,39 +23,42 @@ patches = [
     'PyTorch-1.13.1_skip-tests-without-fbgemm.patch',
     'PyTorch-2.0.1_avoid-test_quantization-failures.patch',
     'PyTorch-2.0.1_fix-skip-decorators.patch',
-    'PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch',
     'PyTorch-2.0.1_fix-vsx-loadu.patch',
-    'PyTorch-2.0.1_no-cuda-stubs-rpath.patch',
     'PyTorch-2.0.1_skip-failing-gradtest.patch',
     'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch',
     'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch',
-    'PyTorch-2.1.0_disable-gcc12-warning.patch',
-    'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch',
-    'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch',
-    'PyTorch-2.1.0_fix-validationError-output-test.patch',
     'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch',
     'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch',
-    'PyTorch-2.1.0_remove-sparse-csr-nnz-overflow-test.patch',
     'PyTorch-2.1.0_remove-test-requiring-online-access.patch',
     'PyTorch-2.1.0_skip-diff-test-on-ppc.patch',
     'PyTorch-2.1.0_skip-dynamo-test_predispatch.patch',
     'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch',
-    'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch',
-    'PyTorch-2.1.0_skip-test_wrap_bad.patch',
-    'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch',
-    'PyTorch-2.1.2_fix-test_memory_profiler.patch',
-    'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch',
-    'PyTorch-2.1.2_fix-vsx-vector-abs.patch',
-    'PyTorch-2.1.2_fix-vsx-vector-div.patch',
     'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch',
-    'PyTorch-2.1.2_skip-memory-leak-test.patch',
     'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch',
+    'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch',
+    'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch',
+    'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch',
+    'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch',
+    'PyTorch-2.3.0_skip-test_init_from_local_shards.patch',
+    'PyTorch-2.3.0_no-cuda-stubs-rpath.patch',
+    'PyTorch-2.3.0_disable-gcc12-warning.patch',
+    'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch',
+    'PyTorch-2.3.0_fix-test_fine_tuning.patch',
+    'PyTorch-2.3.0_disable_tests_which_need_network_download.patch',
+    'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch',
+    'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch',
+    'PyTorch-2.3.0_relax-test_unbacked_reduction.patch',
+    'PyTorch-2.3.0_remove-fsspec-test.patch',
+    'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch',
+    'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch',
+    'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch',
+    'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch',
+    'PyTorch-2.6.0_show-test-duration.patch',
+    'PyTorch-2.7.1_suport-64bit-BARs.patch',
 ]
 checksums = [
-    {'pytorch-v2.1.2.tar.gz': '85effbcce037bffa290aea775c9a4bad5f769cb229583450c40055501ee1acd7'},
+    {'pytorch-v2.3.0.tar.gz': '69579513b26261bbab32e13b7efc99ad287fcf3103087f2d4fdf1adacd25316f'},
     {'PyTorch-1.7.0_disable-dev-shm-test.patch': '622cb1eaeadc06e13128a862d9946bcc1f1edd3d02b259c56a9aecc4d5406b8a'},
-    {'PyTorch-1.11.1_skip-test_init_from_local_shards.patch':
-     '4aeb1b0bc863d4801b0095cbce69f8794066748f0df27c6aaaf729c5ecba04b7'},
     {'PyTorch-1.12.1_add-hypothesis-suppression.patch':
      'e71ffb94ebe69f580fa70e0de84017058325fdff944866d6bd03463626edc32c'},
     {'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch':
@@ -75,28 +78,16 @@ checksums = [
     {'PyTorch-2.0.1_avoid-test_quantization-failures.patch':
      '02e3f47e4ed1d7d6077e26f1ae50073dc2b20426269930b505f4aefe5d2f33cd'},
     {'PyTorch-2.0.1_fix-skip-decorators.patch': '2039012cef45446065e1a2097839fe20bb29fe3c1dcc926c3695ebf29832e920'},
-    {'PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch':
-     '1b37194f55ae678f3657b8728dfb896c18ffe8babe90987ce468c4fa9274f357'},
     {'PyTorch-2.0.1_fix-vsx-loadu.patch': 'a0ffa61da2d47c6acd09aaf6d4791e527d8919a6f4f1aa7ed38454cdcadb1f72'},
-    {'PyTorch-2.0.1_no-cuda-stubs-rpath.patch': '8902e58a762240f24cdbf0182e99ccdfc2a93492869352fcb4ca0ec7e407f83a'},
     {'PyTorch-2.0.1_skip-failing-gradtest.patch': '8030bdec6ba49b057ab232d19a7f1a5e542e47e2ec340653a246ec9ed59f8bc1'},
     {'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch':
      '7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae'},
     {'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch':
      '166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5'},
-    {'PyTorch-2.1.0_disable-gcc12-warning.patch': 'c858b8db0010f41005dc06f9a50768d0d3dc2d2d499ccbdd5faf8a518869a421'},
-    {'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch':
-     'b15b1291a3c37bf6a4982cfbb3483f693acb46a67bc0912b383fd98baf540ccf'},
-    {'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch':
-     '84bb51a719abc677031a7a3dfe4382ff098b0cbd8b39b8bed2a7fa03f80ac1e9'},
-    {'PyTorch-2.1.0_fix-validationError-output-test.patch':
-     '7eba0942afb121ed92fac30d1529447d892a89eb3d53c565f8e9d480e95f692b'},
     {'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch':
      '3793b4b878be1abe7791efcbd534774b87862cfe7dc4774ca8729b6cabb39e7e'},
     {'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch':
      'aef38adf1210d0c5455e91d7c7a9d9e5caad3ae568301e0ba9fc204309438e7b'},
-    {'PyTorch-2.1.0_remove-sparse-csr-nnz-overflow-test.patch':
-     '0ac36411e76506b3354c85a8a1260987f66af947ee52ffc64230aee1fa02ea8b'},
     {'PyTorch-2.1.0_remove-test-requiring-online-access.patch':
      '35184b8c5a1b10f79e511cc25db3b8a5585a5d58b5d1aa25dd3d250200b14fd7'},
     {'PyTorch-2.1.0_skip-diff-test-on-ppc.patch': '394157dbe565ffcbc1821cd63d05930957412156cc01e949ef3d3524176a1dda'},
@@ -104,22 +95,44 @@ checksums = [
      '6298daf9ddaa8542850eee9ea005f28594ab65b1f87af43d8aeca1579a8c4354'},
     {'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch':
      '5229ca88a71db7667a90ddc0b809b2c817698bd6e9c5aaabd73d3173cf9b99fe'},
-    {'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch':
-     '5dcc79883b6e3ec0a281a8e110db5e0a5880de843bb05653589891f16473ead5'},
-    {'PyTorch-2.1.0_skip-test_wrap_bad.patch': 'b8583125ee94e553b6f77c4ab4bfa812b89416175dc7e9b7390919f3b485cb63'},
-    {'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch':
-     'cd1455495886a7d6b2d30d48736eb0103fded21e2e36de6baac719b9c52a1c92'},
-    {'PyTorch-2.1.2_fix-test_memory_profiler.patch':
-     '30b0c9355636c0ab3dedae02399789053825dc3835b4d7dac6e696767772b1ce'},
-    {'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch':
-     'a0ef99192ee2ad1509c78a8377023d5be2b5fddb16f84063b7c9a0b53d979090'},
-    {'PyTorch-2.1.2_fix-vsx-vector-abs.patch': 'd67d32407faed7dc1dbab4bba0e2f7de36c3db04560ced35c94caf8d84ade886'},
-    {'PyTorch-2.1.2_fix-vsx-vector-div.patch': '11f497a6892eb49b249a15320e4218e0d7ac8ae4ce67de39e4a018a064ca1acc'},
     {'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch':
      '7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb'},
-    {'PyTorch-2.1.2_skip-memory-leak-test.patch': '8d9841208e8a00a498295018aead380c360cf56e500ef23ca740adb5b36de142'},
     {'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch':
      'fb96eefabf394617bbb3fbd3a7a7c1aa5991b3836edc2e5d2a30e708bfe49ba1'},
+    {'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch':
+     '23416f2d9d5226695ec3fbea0671e3650c655c19deefd3f0f8ddab5afa50f485'},
+    {'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch':
+     '0dcbdfde6752c3ff54c5376f521b4a742167669feb7f0f1d4e1d4d55f72b664f'},
+    {'PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch':
+     '29fb95d1dba070133b513de050febd328ed36905a73f1ca135dc633f16beafa4'},
+    {'PyTorch-2.3.0_increase-tolerance-test_jit-test_freeze_conv_relu_fusion.patch':
+     '6f8eba5b546129ea975cda1a8a7098ca3245ad2b040a31a98807ee6d69cad0d4'},
+    {'PyTorch-2.3.0_skip-test_init_from_local_shards.patch':
+     '90ed9c2870f57ee6dc032d00873a37e2217a2b92a13035ded1c25ad5306455f2'},
+    {'PyTorch-2.3.0_no-cuda-stubs-rpath.patch': '7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1'},
+    {'PyTorch-2.3.0_disable-gcc12-warning.patch': 'a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514'},
+    {'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch':
+     '36aa2d5ba175be17f4e996f4fb2d544fe477d4a0bd0644cd59a85063779afc8e'},
+    {'PyTorch-2.3.0_fix-test_fine_tuning.patch': 'daa24801f3b2b5f76b639a14fba9a6ad84fe99ebed53401e217d02f94cfe48bf'},
+    {'PyTorch-2.3.0_disable_tests_which_need_network_download.patch':
+     'b7fd1a5135dfd4098cdc054182f7bf84a23ac98462a00477712182b5442da855'},
+    {'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch':
+     '041adcd91d994b8c2ab57d227f081cd57e572c157117b37171e1eb8eb576f8fc'},
+    {'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch':
+     'aa6ff764f3f7bf84372a8a257fe1b4ae6dc4b9744ad35f0f9015f2696c62a41e'},
+    {'PyTorch-2.3.0_relax-test_unbacked_reduction.patch':
+     'c822f084bd97b6c76bea692e3a4664e227b3aea57c80e576a841943877085b77'},
+    {'PyTorch-2.3.0_remove-fsspec-test.patch': '09be192401013cd8cd66add9d6565ac3e879e004d77e61145f826b768267ff61'},
+    {'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch':
+     '9703fd0f1fca8916f6d79d83e9a7efe8e3f717362a5fdaa8f5d9da90d0c75018'},
+    {'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch':
+     '7955f2655db3da18606574fdcbc5990be24098f49ad1db5e86ea756ea1cc506f'},
+    {'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch':
+     'ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976'},
+    {'PyTorch-2.3.0_fix-unboxing-template-CUDA-12.4.patch':
+     '6205d8249e7edcce5756e073ab0b11a0496da34eec1a55e3d24437a530d2886b'},
+    {'PyTorch-2.6.0_show-test-duration.patch': '5508f2f9619204d9f3c356dbd4000a00d58f452ab2d64ae920eb8bc8b5484d75'},
+    {'PyTorch-2.7.1_suport-64bit-BARs.patch': '317c3d220aa87426d86e137a6c1a8f910adf9580ca0848371e0f6800c05dbde1'},
 ]
 
 osdependencies = [OS_PKG_IBVERBS_DEV]
@@ -131,9 +144,18 @@ builddependencies = [
     ('pytest-flakefinder', '1.1.0'),
     ('pytest-rerunfailures', '14.0'),
     ('pytest-shard', '0.1.2'),
+    ('tlparse', '0.3.5'),
+    ('optree', '0.13.0'),
+    ('unittest-xml-reporting', '3.1.0'),
 ]
 
 dependencies = [
+    ('CUDA', '12.4.0', '', SYSTEM),
+    ('cuDNN', '9.0.0.312', versionsuffix, SYSTEM),
+    ('magma', '2.7.2', versionsuffix),
+    ('NCCL', '2.20.5', versionsuffix),
+    # Version from .ci/docker/triton_version.txt
+    ('Triton', '2.3.1', versionsuffix),
     ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
     ('Python', '3.11.5'),
     ('Python-bundle-PyPI', '2023.10'),
@@ -169,10 +191,34 @@ excluded_tests = {
         # intermittent failures on various systems
         # See https://github.com/easybuilders/easybuild-easyconfigs/issues/17712
         'distributed/rpc/test_tensorpipe_agent',
+        # This test is expected to fail when run in their CI, but won't in our case.
+        # It just checks for a "CI" env variable
+        'test_ci_sanity_check_fail',
+        # This fails consistently and is disabled upstream
+        # See https://github.com/pytorch/pytorch/issues/100152 and
+        # https://github.com/pytorch/pytorch/pull/124712
+        'test_cpp_extensions_open_device_registration',
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/pull/124786
+        'distributed/checkpoint/test_save_load_api',
+        # Test broken until 2.4: https://github.com/pytorch/pytorch/issues/122184
+        'distributed/tensor/parallel/test_tp_random_state',
+        # Doesn't find "dist.all_reduce(" in generated code. Known failures, e.g.
+        # https://github.com/pytorch/pytorch/issues/121195
+        'distributed/test_compute_comm_reordering',
+        # Long tests, tested successfully once during creation of EC
+        'inductor/test_aot_inductor',  # ~65min
+        'distributed/fsdp/test_fsdp_state_dict',  # ~202min
+        'distributed/fsdp/test_fsdp_core',  # ~88min
     ]
 }
 
-runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py --continue-through-error  --verbose %(excluded_tests)s'
+local_test_opts = '--continue-through-error --pipe-logs --verbose %(excluded_tests)s'
+runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py ' + local_test_opts
+
+# Especially test_quantization has a few corner cases that are triggered by the random input values,
+# those cannot be easily avoided, see https://github.com/pytorch/pytorch/issues/107030
+# So allow a low number of tests to fail as the tests "usually" succeed
+max_failed_tests = 16
 
 tests = ['PyTorch-check-cpp-extension.py']
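
Each `checksums` entry in the diff pairs a file name with a SHA-256 digest. The sketch below shows one way such an entry could be checked with Python's standard `hashlib`; the `sha256_of` and `verify_checksum` helpers and the demo file are assumptions for illustration, and EasyBuild's own checksum verification is not reproduced here.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, entry):
    """Check a file against a {filename: sha256} dict as used in the easyconfig."""
    expected = entry.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected


# Demo with a throwaway file instead of a real patch.
with tempfile.TemporaryDirectory() as tmpdir:
    demo = os.path.join(tmpdir, 'demo.patch')
    with open(demo, 'wb') as handle:
        handle.write(b'example patch content\n')
    good_entry = {'demo.patch': sha256_of(demo)}
    bad_entry = {'demo.patch': '0' * 64}
    print(verify_checksum(demo, good_entry))
    print(verify_checksum(demo, bad_entry))
```

Keying the digest by file name, as the easyconfig does, makes a mismatch easy to attribute to a specific patch when the list is long.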
 

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
c41 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/43b2f5c13d49c46a10f70213bbf71513 for a full test report.

@Thyre Thyre added the 2023b label Aug 4, 2025
@verdurin
Member

verdurin commented Aug 5, 2025

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

@boegelbot
Collaborator

@verdurin: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23553 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23553 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7494

Test results coming soon (I hope)...

Details

- notification for comment with ID 3154777311 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 575.57.08, Python 3.9.21
See https://gist.github.com/boegelbot/c99d92e66ed43042fbc78e5bb463727c for a full test report.

@Flamefire Flamefire closed this Aug 5, 2025
@Flamefire Flamefire reopened this Aug 5, 2025
@Flamefire
Contributor Author

@verdurin I also have Triton in a separate PR: #23119
It also requires an update to the CMake easyblock. This might fix the issue (hard to tell from the limited output)

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
c85 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/17d7e5a839db5e7348dc880adfb7df9d for a full test report.

@boegel boegel added this to the 5.x milestone Aug 12, 2025
@boegel
Member

boegel commented Aug 12, 2025

@Flamefire needs a sync with develop since #23119 is merged?

@boegel
Member

boegel commented Aug 19, 2025

Test report by @boegel
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3305.joltik.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 570.133.20, Python 3.9.18
See https://gist.github.com/boegel/184cb665e830ea752c41f2ab9f72c6ef for a full test report.

@Flamefire
Contributor Author

@boegel Try with easybuilders/easybuild-easyblocks#3803 and maybe get that merged soon please

@boegel
Member

boegel commented Sep 10, 2025

> @boegel Try with easybuilders/easybuild-easyblocks#3803 and maybe get that merged soon please

I've kick-started another test with the updated easyblock; the result should pop up in ~22 hours...

@boegel
Member

boegel commented Sep 11, 2025

Test report by @boegel
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3803
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3307.joltik.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 570.133.20, Python 3.9.18
See https://gist.github.com/boegel/2e636f5f68248b8a5e4174386ce412a7 for a full test report.

@Flamefire
Contributor Author

ping

@boegel
Member

boegel commented Jan 28, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23553 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23553 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9511

Test results coming soon (I hope)...

Details

- notification for comment with ID 3809886012 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Jan 30, 2026

Test report by @boegel
SUCCESS
Build succeeded for 25 out of 25 (total: 18 hours 0 mins 16 secs) (2 easyconfigs in total)
node3904.accelgor.os - Linux RHEL 9.6, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 590.48.01, Python 3.9.21
See https://gist.github.com/boegel/6d2a28e0979ecd1b023eafd764eacd20 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (total: 21 hours 16 mins 21 secs) (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.44.01, Python 3.9.23
See https://gist.github.com/boegelbot/a18c40c7c422b3e61a1b373d4afca576 for a full test report.

@boegel boegel modified the milestones: 5.x, next release (5.2.1?) Feb 11, 2026
Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Feb 11, 2026

Going in, thanks @Flamefire!

@boegel boegel merged commit ae22b48 into easybuilders:develop Feb 11, 2026
8 checks passed
@Flamefire Flamefire deleted the 20250731181938_new_pr_PyTorch230 branch February 12, 2026 08:41
5 participants