
Fix warnings with -O3 in jax-0.6.2/CUDA #25208

Merged
branfosj merged 1 commit into easybuilders:develop from Flamefire:20260204105319_new_pr_jax062 on Feb 6, 2026

@Flamefire
Contributor

(created using eb --new-pr)

github-actions bot added the 2024a (issues & PRs related to 2024a common toolchains) and change labels on Feb 4, 2026
@jfgrimm
Member

jfgrimm commented Feb 4, 2026

Test report by @jfgrimm
FAILED
Build succeeded for 0 out of 1 (total: 33 mins 23 secs) (1 easyconfigs in total)
gpu23.viking2.yor.alces.network - Linux Rocky Linux 8.10, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA H100 PCIe, 580.105.08, Python 3.6.8
See https://gist.github.com/jfgrimm/92cd63089b5e2378ab3da7605ce5bf8e for a full test report.

@branfosj
Member

branfosj commented Feb 4, 2026

Test report by @branfosj
FAILED
Build succeeded for 4 out of 5 (total: 4 hours 40 mins 54 secs) (1 easyconfigs in total)
bear-pg0208u35a - Linux RHEL 8.10, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-40GB, 580.95.05, Python 3.6.8
See https://gist.github.com/branfosj/e4a7e70447a25adb489700c9781f67bc for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (total: 7 hours 32 mins 26 secs) (1 easyconfigs in total)
i8021 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 580.65.06, Python 3.9.21
See https://gist.github.com/Flamefire/e942c0d14d11b62102e3331cd615b10a for a full test report.

@Flamefire
Contributor Author

Both builds fail with:

external/xla/xla/tsl/cuda/cupti_stub.cc:16:10: fatal error: 'third_party/gpus/cuda/extras/CUPTI/include/cupti.h' file not found
   16 | #include "third_party/gpus/cuda/extras/CUPTI/include/cupti.h"

I can't tell why that happens, as I don't see it in my build.
And it surely cannot be caused by this change to compiler warnings.
Any ideas?

@branfosj
Member

branfosj commented Feb 5, 2026

I am about to rebuild CUDA, as my install predates easybuilders/easybuild-easyblocks#3791, and then try building jax again.

@jfgrimm
Member

jfgrimm commented Feb 5, 2026

Test report by @jfgrimm
FAILED
Build succeeded for 0 out of 1 (total: 1 hour 8 mins 24 secs) (1 easyconfigs in total)
node001.viking2.yor.alces.network - Linux Rocky Linux 8.10, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/81372dea0644985e5f3892b77e5ce4e3 for a full test report.

@Flamefire
Contributor Author

@jfgrimm Same issue.

In one environment where it did fail for me, rebuilding CUDA-12.6.0.eb with easybuilders/easybuild-easyblocks#3791 made it pass afterwards.
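
For anyone hitting the same error, a quick check of the CUDA install before starting a multi-hour jax build may help. The sketch below only tests whether cupti.h exists at the usual toolkit location (extras/CUPTI/include) under the loaded CUDA module. It assumes EasyBuild's $EBROOTCUDA variable is set by that module; the helper name find_cupti_header is made up for illustration. It does not reproduce whatever easybuilders/easybuild-easyblocks#3791 changes, and a present header is no guarantee that the build will pass.

```python
import os
from pathlib import Path
from typing import Optional

# Relative location of the CUPTI header inside a standard CUDA toolkit tree.
CUPTI_HEADER = Path("extras") / "CUPTI" / "include" / "cupti.h"


def find_cupti_header(cuda_root: str) -> Optional[Path]:
    """Return the path to cupti.h under cuda_root, or None if it is missing.

    Hypothetical helper for illustration; not part of EasyBuild.
    """
    candidate = Path(cuda_root) / CUPTI_HEADER
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Assumption: the CUDA module is loaded, so $EBROOTCUDA points at its prefix.
    root = os.environ.get("EBROOTCUDA")
    if root is None:
        print("EBROOTCUDA is not set; load the CUDA module first")
    else:
        header = find_cupti_header(root)
        print(header if header else f"cupti.h not found under {root}")
```

If the header is reported missing, rebuilding the CUDA module as described above is the step that resolved it in this thread.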

@jfgrimm
Member

jfgrimm commented Feb 5, 2026

I thought I did, but I accidentally rebuilt CUDA 11.6, not 12.6 🤦

@branfosj
Member

branfosj commented Feb 5, 2026

Test report by @branfosj
FAILED
Build succeeded for 0 out of 1 (total: 3 mins 46 secs) (1 easyconfigs in total)
bear-pg0210u03a - Linux RHEL 8.10, x86_64, Intel(R) Xeon(R) Platinum 8480CL (sapphirerapids), Python 3.6.8
See https://gist.github.com/branfosj/90e7a15a56880334f087acbb30b1d36f for a full test report.

Clang / LLVM bug on SapphireRapids:

UNREACHABLE executed at /dev/shm/branfosj/build-up-EL8/Clang/18.1.8/GCCcore-13.3.0-CUDA-12.6.0/llvm-project-18.1.8.src/llvm/lib/Target/X86/X86InstrInfo.cpp:8220!

@branfosj
Member

branfosj commented Feb 5, 2026

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (total: 6 hours 9 mins 44 secs) (1 easyconfigs in total)
bear-pg0208u35a - Linux RHEL 8.10, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-40GB, 580.95.05, Python 3.6.8
See https://gist.github.com/branfosj/6867b336f15732c011c21687462e30a2 for a full test report.

@jfgrimm
Member

jfgrimm commented Feb 5, 2026

Test report by @jfgrimm
SUCCESS
Build succeeded for 1 out of 1 (total: 3 hours 39 mins 52 secs) (1 easyconfigs in total)
gpu22.viking2.yor.alces.network - Linux Rocky Linux 8.10, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA H100 PCIe, 580.105.08, Python 3.6.8
See https://gist.github.com/jfgrimm/6c13860df0909435900eb99d643b3195 for a full test report.

branfosj added this to the next release (5.2.1?) milestone on Feb 6, 2026
@branfosj
Member

branfosj commented Feb 6, 2026

Going in, thanks @Flamefire!

branfosj merged commit 5ba6b01 into easybuilders:develop on Feb 6, 2026
8 checks passed
Flamefire deleted the 20260204105319_new_pr_jax062 branch on February 6, 2026 at 11:34
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (total: 3 hours 8 mins 19 secs) (1 easyconfigs in total)
c112 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 580.65.06, Python 3.9.21
See https://gist.github.com/Flamefire/296cd5024168a5d4a43db4a2408df200 for a full test report.

boegel added the bug fix label and removed the change label on Feb 20, 2026
