
Use LLVM easyblock for Clang 18 easyconfigs #23055

Merged
Crivella merged 9 commits into easybuilders:develop from Flamefire:clang-llvm-18
Aug 4, 2025

@Flamefire
Contributor

@Flamefire Flamefire commented Jun 6, 2025

Pulled out of #23028

Requires:

Comparing a build using this change with one using the previous Clang easyblock shows that this fixes files being installed for every single GPU architecture instead of only the intended ones; see easybuilders/easybuild-easyblocks#3755 (comment).

Also enabling testing as suggested in Slack and related issues.

@boegel
Member

boegel commented Jul 3, 2025

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--include-easyblocks-from-pr 3755"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23055 EB_ARGS="--include-easyblocks-from-pr 3755" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23055 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7127

Test results coming soon (I hope)...

Details

- notification for comment with ID 3033481563 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegel previously approved these changes Jul 3, 2025
Member

@boegel boegel left a comment

lgtm

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3755
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/4946794639009dea78fac0ef78666115 for a full test report.

@boegel
Member

boegel commented Jul 4, 2025

Test report by @boegel
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3755, easybuilders/easybuild-easyblocks#3746
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
node3527.doduo.os - Linux RHEL 9.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/boegel/5c779b5665424a098bff53c60b52b157 for a full test report.

@boegel boegel modified the milestones: 5.1.1, release after 5.1.1 Jul 4, 2025
@Flamefire
Contributor Author

@Crivella It does not. It simply downloads the easyblock as-is and includes the path to the downloaded location. So only one of them is used; which one depends on the order in which we add the new paths.

I don't think merging is even possible: How would you handle conflicts (when you download diffs) and building from a merged PR?
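A minimal sketch of why only one copy ends up being used when two included locations provide an easyblock file with the same name. This is not EasyBuild's actual lookup code: `resolve_easyblock`, the `pr_a`/`pr_b` directories and the "first match wins" rule are assumptions made purely for illustration. The only point it carries over from the comment above is that the result depends on the order of the paths.

```python
from pathlib import Path
import tempfile


def resolve_easyblock(name, search_paths):
    """Return the first file called `name` found in `search_paths`.

    Hypothetical helper: models an ordered path lookup, not EasyBuild itself.
    """
    for directory in search_paths:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def demo():
    """Create two hypothetical PR download directories that both ship llvm.py
    and resolve the file with the paths in both possible orders."""
    with tempfile.TemporaryDirectory() as tmp:
        pr_a = Path(tmp) / "pr_a"
        pr_b = Path(tmp) / "pr_b"
        for directory, marker in ((pr_a, "A"), (pr_b, "B")):
            directory.mkdir()
            (directory / "llvm.py").write_text(f"# easyblock from hypothetical PR {marker}\n")
        first = resolve_easyblock("llvm.py", [pr_a, pr_b]).parent.name
        second = resolve_easyblock("llvm.py", [pr_b, pr_a]).parent.name
        return first, second
```

Swapping the order of the two directories changes which file is picked, and nothing from the other copy is merged in.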

@Crivella
Contributor

Crivella commented Jul 4, 2025

Yeah, I assumed so, but I also deleted the comment as I noticed it was 2 separate ones (clang/llvm and not llvm/llvm). My bad there.

@Flamefire
Contributor Author

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3825
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
n1266 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.18
See https://gist.github.com/Flamefire/9e4f8344863a69fef0c53b2d1bfa156a for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3825
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
i7184 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/Flamefire/822969cadc704adbe01f255db2851074 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3825
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
n1711 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.18
See https://gist.github.com/Flamefire/9ff51b893436ba678293583f1ac06e4e for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3825
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
c106 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/e01ce4db1580b17b2285d1c3c9c211ec for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3825
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
i7007 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/Flamefire/273c8538a40140ef78b064f719679310 for a full test report.

@Flamefire
Contributor Author

The tests fail mostly with:

error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory

# | : CommandLine Error: Option 'debug-counter' registered more than once!

The 2nd seems to be a known issue with Clang 18; the other is weird: their test tool run.py starts in a clean environment without LD_LIBRARY_PATH.

LLVM 20 works. It looks like it is supposed to add an rpath, but I can't find anything relevant in the logs.

Maybe we should keep skipping the tests?
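A minimal sketch of the clean-environment effect described above, assuming a child process that is started with an environment from which LD_LIBRARY_PATH has been dropped. It does not reproduce LLVM's run.py; `child_sees_ld_library_path` and the `/opt/hypothetical/lib` path are made up for illustration.

```python
import os
import subprocess
import sys


def child_sees_ld_library_path(env):
    """Start a child Python process with `env` and report whether
    LD_LIBRARY_PATH is visible inside it."""
    code = "import os; print('LD_LIBRARY_PATH' in os.environ)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


# "Clean" environment: everything except LD_LIBRARY_PATH is kept.
clean_env = {k: v for k, v in os.environ.items() if k != "LD_LIBRARY_PATH"}

# Same environment with the variable forwarded explicitly (hypothetical path).
forwarded_env = dict(clean_env, LD_LIBRARY_PATH="/opt/hypothetical/lib")
```

A binary that relies on LD_LIBRARY_PATH to locate libatomic.so.1 would only be able to find it in the second case, unless the library directory is recorded in the binary itself (for example via an rpath).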

@Crivella
Contributor

Crivella commented Jul 9, 2025

We are gonna need the same patches that are applied for LLVM to pass all the tests

@Crivella
Contributor

Crivella commented Jul 9, 2025

In particular for the libatomic error

https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LLVM/LLVM-18.1.8_envintest.patch

This is needed to ensure LD_LIBRARY_PATH is passed when building the tests where needed.
This is a problem I had already encountered when building LLVM on the bot, where libatomic is in a non-standard location.

See

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
n1318 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.18
See https://gist.github.com/Flamefire/1506d8fb6d276c9456d52ef40478e9d3 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
c144 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/0333f41a22364944b71ac2a1a674512c for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
c144 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/378f76254e62642ed760fd86a7771f7c for a full test report.

@Flamefire
Contributor Author

Ok, now I can't reproduce the failures anymore.

@Crivella
Contributor

> Ok, now I can't reproduce the failures anymore.

The sanitizer tests can be flaky... Considering you had 11/10 failures with 3 from the sanitizers, I am not too surprised.

@Flamefire
Contributor Author

Flamefire commented Jul 25, 2025

So add "stack-overflow-with-asan.test" to the ignore list and call it a day?

The error from omp_host_pinned_memory is

"CUDA" error: Failure to alloc memory: Error in cuMemAlloc[Host|Managed]: invalid device context
"PluginInterface" error: Failure to allocate device memory: Failed to allocate from device allocator

and offloading/ctor_dtor.cpp just returns 2 without any output.

@Crivella
Contributor

Considering that in the newer ECs we set skip_sanitizer_tests = True, I would probably do it here as well.
I do not think the failing sanitizer tests will always be the same...

EG:

@Crivella
Contributor

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr23055"

@boegelbot
Collaborator

@Crivella: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23055 EB_ARGS="--installpath /tmp/$USER/pr23055" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23055 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7384

Test results coming soon (I hope)...

Details

- notification for comment with ID 3132130925 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
c113 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/a4f958bcc1649ffb4e5f6b2d824ca051 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
i7009 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.21
See https://gist.github.com/Flamefire/1848dd59efb69ef085de7c1f0c7cfa75 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/1d93f56eedd9fc7067a96c388dc9e053 for a full test report.

@Crivella
Contributor

Crivella commented Jul 30, 2025

Hmm, we might also want to disable the fuzzer tests when we set skip_sanitizer_tests

https://github.com/llvm/llvm-project/blob/36961202fbf45968cc273fa78fe3479409f5a9c7/compiler-rt/test/CMakeLists.txt#L66-L90

Right now, through regex_subs = [(r'compiler_rt_test_runtime.*san.*', '')], we are only removing lsan, ubsan, sanitizer_common and the ones from the loop.

Concerning the failing CUDA tests, I am not sure how relevant they are or whether we want to add them to the ignored tests in the EC ...
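A minimal sketch of what the regex quoted above does and does not remove. The sample lines are illustrative stand-ins loosely modelled on the linked compiler-rt/test/CMakeLists.txt, not a verbatim copy, and applying the pattern with a plain `re.sub` is an assumption about how the substitution is carried out.

```python
import re

# Pattern taken from the regex_subs entry quoted in the comment above.
PATTERN = r'compiler_rt_test_runtime.*san.*'

# Illustrative sample lines (assumed, not copied verbatim from LLVM).
SAMPLE = "\n".join([
    "compiler_rt_test_runtime(interception)",
    "compiler_rt_test_runtime(lsan)",
    "compiler_rt_test_runtime(ubsan)",
    "compiler_rt_test_runtime(sanitizer_common)",
    "compiler_rt_test_runtime(${sanitizer})",
    "compiler_rt_test_runtime(fuzzer)",
])


def surviving_runtimes(text, pattern=PATTERN):
    """Blank out every match of `pattern` and return the non-empty lines left."""
    stripped = re.sub(pattern, '', text)
    return [line for line in stripped.splitlines() if line.strip()]
```

Because the pattern requires the substring "san" after compiler_rt_test_runtime, a line that registers the fuzzer tests does not match and is left in place, which is consistent with the suggestion to disable those tests separately.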

@Flamefire
Contributor Author

The "Failure to alloc memory" errors sound serious, but they don't seem to be too consistent, so maybe not. I'm currently doing some rebuilds and will check if they always fail.

I also tried enabling the tests for the original EasyConfig (using Clang easyblock) but there it couldn't even finish compiling. Looks like a concurrency issue.

@Crivella
Contributor

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr23055"

@boegelbot
Collaborator

@Crivella: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23055 EB_ARGS="--installpath /tmp/$USER/pr23055" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23055 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7427

Test results coming soon (I hope)...

Details

- notification for comment with ID 3139036781 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

I think those failures aren't serious after all: Calling cuMemAlloc before any other CUDA stuff that creates a context is supposed to fail.
So I guess it is a bug in LLVM.

Comparing the EC built with Clang and LLVM and running the simple example:

  1. With Clang the allocation succeeds but the free causes a ""CUDA" error: Failure to free memory: Error in cuMemFree[Host]: invalid argument"
  2. With LLVM I get the same as in the test step ""CUDA" error: Failure to alloc memory: Error in cuMemAlloc[Host|Managed]: invalid device context"

So good to ignore I guess

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
c40 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/099eec666d1403ea002be59606f5f449 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 575.57.08, Python 3.9.21
See https://gist.github.com/boegelbot/eae48834eb468f50e6e1a09f16af9cdc for a full test report.

@Crivella
Contributor

Crivella commented Aug 4, 2025

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr23055 Clang-18.1.8-GCCcore-13.3.0.eb python-igraph-0.11.9-foss-2024a.eb PySide6-6.7.2-GCCcore-13.3.0.eb pocl-6.0-GCC-13.3.0.eb"

@Crivella
Contributor

Crivella commented Aug 4, 2025

Doing one final test building all the software that currently exists in EB that depends on Clang-18.1.8-GCCcore-13.3.0

@boegelbot
Collaborator

@Crivella: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23055 EB_ARGS="--installpath /tmp/$USER/pr23055 Clang-18.1.8-GCCcore-13.3.0.eb python-igraph-0.11.9-foss-2024a.eb PySide6-6.7.2-GCCcore-13.3.0.eb pocl-6.0-GCC-13.3.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23055 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7477

Test results coming soon (I hope)...

Details

- notification for comment with ID 3149606457 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/d5036fea41bb090bdc95c7593cb14399 for a full test report.

Contributor

@Crivella Crivella left a comment

LGTM

Passes both the CPU and GPU builds; I also tested building software that depends on Clang-18.1.8-GCCcore-13.3.0, both as a build dependency and as a normal dependency.

@Crivella
Contributor

Crivella commented Aug 4, 2025

Going in, thanks @Flamefire!

@Crivella Crivella merged commit e04ee46 into easybuilders:develop Aug 4, 2025
8 checks passed
@Flamefire Flamefire deleted the clang-llvm-18 branch August 4, 2025 14:50
4 participants