Skip to content

enhance custom easyblock for LLVM to run tests in parallel#3875

Merged
boegel merged 2 commits intoeasybuilders:developfrom
Flamefire:20250808123536_new_pr_llvm
Aug 12, 2025
Merged

enhance custom easyblock for LLVM to run tests in parallel#3875
boegel merged 2 commits intoeasybuilders:developfrom
Flamefire:20250808123536_new_pr_llvm

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Aug 8, 2025

(created using eb --new-pr)

We were currently running the test step (make check-all) sequentially (-j1) because there were concurrency issues caused by how LLVM uses CMake together with a particularity of CMake "fixed" in 3.19 guarded by a policy which defaults to OLD (non-fixed behavior)

I added patches to the CMakeLists for all LLVM versions:

  • Old versions just get code added that sets the policy to NEW if it exists, no-op for CMake < 3.19
  • LLVM 15 introduced code to set the policy to OLD, which I change to NEW
  • LLVM 16 Moved this code to a new file
    --> 15 & 16+ can be handled the same just with different paths
  • LLVM 20 removed this setting so it is set to NEW as the minimum CMake version required is 3.20

I intentionally set it for all LLVM versions to ease testing. With CMake < 3.19 it won't have any effect and tests will be run serially as before

@boegel
Copy link
Member

boegel commented Aug 8, 2025

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-14.0.6-GCCcore-13.2.0-llvmlite.eb
  • SUCCESS LLVM-16.0.6-GCCcore-13.2.0.eb
  • SUCCESS LLVM-19.1.7-GCCcore-13.3.0.eb
  • SUCCESS LLVM-20.1.5-GCCcore-13.3.0.eb

Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3534.doduo.os - Linux RHEL 9.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/boegel/8b7dbaba555cf83ea95daaf0a6617d31 for a full test report.

Copy link
Collaborator

@Thyre Thyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and this is well tested.
I was a bit confused by the regex substitution for LLVM <15, but this looks to be good as well. So just my knowledge is limited here 😅

if(POLICY CMP0114)
  cmake_policy(SET CMP0114 NEW)
endif()
project(LLVM
  VERSION ${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}
  LANGUAGES C CXX ASM)

@Flamefire
Copy link
Contributor Author

Looks good to me, and this is well tested. I was a bit confused by the regex substitution for LLVM <15, but this looks to be good as well. So just my knowledge is limited here 😅

if(POLICY CMP0114)
  cmake_policy(SET CMP0114 NEW)
endif()
project(LLVM
  VERSION ${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}
  LANGUAGES C CXX ASM)

Yes that's the intended result.

I tried to come up with something that would always work so I needed something that is always there so a typo/mistake would be immediately obvious even without a recent enough CMake version used.

@boegel boegel changed the title Run LLVM tests in parallel enhance custom easyblock for LLVM to run tests in parallel Aug 12, 2025
Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boegel boegel merged commit f625f03 into easybuilders:develop Aug 12, 2025
17 checks passed
@Flamefire Flamefire deleted the 20250808123536_new_pr_llvm branch August 13, 2025 07:58
@Crivella
Copy link
Contributor

Crivella commented Oct 8, 2025

Just saw this PR.
How does this play with

parallel = self.cfg.parallel
if not build_option('mpi_tests'):
parallel = 1
lit_args = [f'-j {parallel}']
if self.cfg['debug_tests']:
lit_args += ['-v']
timeout_single = self.cfg['test_suite_timeout_single']
if timeout_single:
lit_args += ['--timeout', str(timeout_single)]
timeout_total = self.cfg['test_suite_timeout_total']
if timeout_total:
lit_args += ['--max-time', str(timeout_total)]
self._cmakeopts['LLVM_LIT_ARGS'] = '"%s"' % ' '.join(lit_args)

We already had parallelization in lit, are we not running parallel ^ 2 like this?
Furthermore, we should also use here a check for mpi_tests

@Flamefire
Copy link
Contributor Author

Should be fine as they are using that upstream too: https://github.com/llvm/llvm-project/blob/100db538565c80164b05b1c3a5bebeaa0e772fc4/offload/cmake/caches/AMDGPUBot.cmake#L18 but can't tell for sure, it is possible that multiple check-all target dependencies do get run in parallel by that.

Are you seeing that in practice? Maybe we should limit this to something low to be safe?

I'm not sure why mpi_tests is used here, it doesn't seem to be the correct option.

@Crivella
Copy link
Contributor

Crivella commented Oct 8, 2025

I'm not sure why mpi_tests is used here, it doesn't seem to be the correct option.

I think i've seen it used also to disable running tests in parallel and not only for proper MPI tests, but it is possible we might not want here.

Does that upstream also run the tests with check-all -j (XXX>1).?

Atleast to my understanding every make target will run lit under the hood and lit will run tests in parallel with its own -j

@Flamefire
Copy link
Contributor Author

Does that upstream also run the tests with check-all -j (XXX>1).?

Atleast to my understanding every make target will run lit under the hood and lit will run tests in parallel with its own -j

They use Ninja by default which IIRC defaults to running in max-parallel, so it likely does run the test targets, hence lit, in parallel unless there are some dependencies among the targets to avoid that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants