
{compiler}[GCCcore/13.3.0] dpcpp v6.0.0 #22418

Merged
Crivella merged 9 commits into easybuilders:develop from rafbiels:dpcpp-6.0.0 on Jul 31, 2025

Conversation

@rafbiels
Contributor

Add a new easyconfig for the open-source DPC++ compiler version 6.0.0 (https://github.com/intel/llvm/releases/tag/v6.0.0). This build supports SYCL compilation for the OpenCL, Level Zero and CUDA backends, making it possible to run SYCL code on x86 CPUs, Intel GPUs and NVIDIA GPUs.

A patch file and configuration options are in place to ensure compatibility with a wide range of CUDA releases by removing the dependency on CUPTI. This build of DPC++ can be used with any CUDA version from 11.7 onwards, or without CUDA at all (to target only the OpenCL and Level Zero backends).
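
For context, this is roughly how the resulting compiler is used once the module is loaded. This is only an illustrative sketch (hello.cpp is a placeholder); by default -fsycl emits SPIR-V device code for the OpenCL / Level Zero backends, while the NVIDIA target is requested explicitly:

# device code as SPIR-V: runs on x86 CPUs and Intel GPUs via OpenCL / Level Zero
clang++ -fsycl hello.cpp -o hello
# additionally target NVIDIA GPUs via the CUDA backend
clang++ -fsycl -fsycl-targets=spir64,nvptx64-nvidia-cuda hello.cpp -o hello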

@github-actions bot added the "new" label Feb 27, 2025
@Thyre
Collaborator

Thyre commented Feb 28, 2025

We typically only have a single CUDA version being used per toolchain generation, as far as I am aware. For GCCcore 13.3.0, this is normally CUDA 12.6.0. Personally, I would prefer having a fixed CUDA version over having the tracing feature disabled. Having CUDA as an "optional" dependency (i.e. a build dependency) sounds fine to me though.

I'm wondering how much the build parameters created by buildbot/configure.py differ from the ones we pass in our Clang EasyBlock (or the new LLVM one in the works)...
That shouldn't be an issue though.

@rafbiels
Contributor Author

Thank you for the feedback; these are very good points. I'll try to explain the reasoning behind this setup; please let me know what you think.

We typically only have a single CUDA version being used per toolchain generation, as far as I am aware. For GCCcore 13.3.0, this is normally CUDA 12.6.0. Personally, I would prefer having a fixed CUDA version over having the tracing feature disabled. Having CUDA as an "optional" dependency (i.e. a build dependency) sounds fine to me though.

My idea was for this to work on HPC systems with their own system installation of CUDA, regardless of the version, rather than pulling a specific version from EasyBuild (which still also works in this setup). This, in my experience, works best for full support of the matching driver and tools like profilers. Since CUDA is backwards but not forwards compatible, we want to build DPC++ with the oldest CUDA version supported by us (11.7 for this release), and then users can use it with any newer version.

The build also aims to be as close as possible to Intel's and Codeplay's binary releases (oneAPI 2025.0.0) while remaining fully open-source. We actually disable CUPTI tracing in our binary release to allow this portability, and advise users that all CUDA versions 11.7+ work with our backend library (see https://developer.codeplay.com/products/oneapi/nvidia/2025.0.0/guides/get-started-guide-nvidia).

I'm wondering how much the build parameters created by buildbot/configure.py differ from the ones we pass in our Clang EasyBlock (or the new LLVM one in the works)... That shouldn't be an issue though.

I believe the options are considerably different, and using the provided buildbot script makes the easyconfig more concise and easier to maintain. Any option changes between versions won't need to be mapped onto the easyconfig, and we will be able to reuse the same solution for future versions. Since the DPC++ project is kept in close sync with upstream LLVM, with regular pull-downs and occasional upstreaming of features, we didn't modify the LLVM default options but instead rely on the buildbot, which puts together the DPC++ defaults.
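
As a sketch, the easyconfig essentially drives the upstream script rather than invoking CMake itself, along these lines (the exact flags used by the easyconfig may differ; --cuda enables the CUDA backend and -o sets the build directory in buildbot/configure.py):

python buildbot/configure.py -t Release --cuda -o <builddir>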

For reference, these are the CMake options created by this easyconfig calling the buildbot:

cmake 
-G Ninja 
-DCMAKE_BUILD_TYPE=Release 
-DLLVM_ENABLE_ASSERTIONS=ON 
-DLLVM_TARGETS_TO_BUILD=host;NVPTX 
-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv;opencl;xpti;xptifw;libdevice;sycl-jit;openmp
-DLLVM_EXTERNAL_SYCL_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/sycl 
-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/llvm-spirv 
-DLLVM_EXTERNAL_XPTI_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/xpti
-DXPTI_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/xpti
-DLLVM_EXTERNAL_XPTIFW_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/xptifw
-DLLVM_EXTERNAL_LIBDEVICE_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/libdevice
-DLLVM_EXTERNAL_SYCL_JIT_SOURCE_DIR=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/sycl-jit
-DLLVM_ENABLE_PROJECTS=clang;sycl;llvm-spirv;opencl;xpti;xptifw;libdevice;sycl-jit;openmp;libclc
-DSYCL_BUILD_PI_HIP_PLATFORM=AMD
-DLLVM_BUILD_TOOLS=ON
-DSYCL_ENABLE_WERROR=OFF
-DCMAKE_INSTALL_PREFIX=${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/build/install
-DSYCL_INCLUDE_TESTS=ON
-DLLVM_ENABLE_DOXYGEN=OFF
-DLLVM_ENABLE_SPHINX=OFF
-DBUILD_SHARED_LIBS=OFF
-DSYCL_ENABLE_XPTI_TRACING=ON
-DLLVM_ENABLE_LLD=OFF
-DXPTI_ENABLE_WERROR=OFF
-DSYCL_CLANG_EXTRA_FLAGS='-DSYCL_ENABLE_PLUGINS=opencl;level_zero;cuda'
-DSYCL_ENABLE_EXTENSION_JIT=ON
-DSYCL_ENABLE_MAJOR_RELEASE_PREVIEW_LIB=ON
-DBUG_REPORT_URL=https://github.com/intel/llvm/issues
-DLIBCLC_TARGETS_TO_BUILD=;nvptx64--nvidiacl
-DLIBCLC_GENERATE_REMANGLED_VARIANTS=ON
-DLIBCLC_NATIVECPU_HOST_TARGET=OFF
-DCMAKE_INSTALL_PREFIX=${HOME}/.local/easybuild/software/dpcpp/6.0.0-GCCcore-13.3.0
-DCUDA_TOOLKIT_ROOT_DIR=${HOME}/.local/easybuild/software/CUDA/11.7.0
-DSYCL_ENABLE_XPTI_TRACING=OFF ${HOME}/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/llvm

@Thyre
Collaborator

Thyre commented Feb 28, 2025

Thanks a lot for the extensive comment. This clears things up a lot.

My idea was for this to work on HPC systems with their own system installation of CUDA, regardless of the version, rather than pulling a specific version from EasyBuild (which still also works in this setup). This, in my experience, works best for full support of the matching driver and tools like profilers.

From my experience, users of EasyBuild typically also install CUDA via EasyBuild and don't have an external version lying around. This may be different on HPE Cray machines, I'm not sure. However, I completely understand your reasoning here. Maybe other people can give input on how we would want to handle this in EasyBuild. For the actual (non-open-source) oneAPI compilers, I always chose a CUDA version close to what is used in the toolchain, see #21582.

We actually disable CUPTI tracing in our binary release to allow this portability and advise users that all CUDA versions 11.7+ work with our backend library [...]

I didn't know that, thanks for the information! This makes total sense after thinking a bit more about it, especially as only one CUPTI instance can be attached, which could cause issues with other profiling tools like Nsight Systems or Score-P. For profiling / tracing on NVIDIA GPUs, people would probably use these tools anyway, though I assume some SYCL information might get lost.
Generally though, disabling the profiling and staying close to oneAPI is a good thing.

I believe the options are considerably different and using the provided buildbot script makes the easyconfig more concise and easier to maintain. Any option changes between versions won't need to be mapped onto easyconfig and we will be able to reuse the same solution for future versions.

I agree, the buildbot script defines a lot of things one would need to maintain manually otherwise, e.g. in an EasyBlock.
Maybe @Crivella can give some input here, if we need to define something particular for EasyBuild, as he's working on the new LLVM EasyBlock (see easybuilders/easybuild-easyblocks#3373).


I also started a local build of your PR, and will do some testing once that is finished.
Might take some time to complete though, as LLVM is a monster 😄

@Thyre
Collaborator

Thyre commented Feb 28, 2025

Test report by @Thyre
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
Linux - Linux Arch Linux UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 570.86.16, Python 3.13.2
See https://gist.github.com/Thyre/e0b93b46b4ce6b44ae18fda3b6676275 for a full test report.

@Thyre
Collaborator

Thyre commented Feb 28, 2025

Test report by @Thyre FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) Linux - Linux Arch Linux UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 570.86.16, Python 3.13.2 See https://gist.github.com/Thyre/e0b93b46b4ce6b44ae18fda3b6676275 for a full test report.

Hm, the build failed with the following error:

[3780/3890] Generating ../../lib/libsycl-fallback-complex.spv
FAILED: lib/libsycl-fallback-complex.spv /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/build/lib/libsycl-fallback-complex.spv
cd /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/build/tools/libdevice && /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/build/bin/clang-19 -fsycl-device-only -fsycl-device-obj=spirv -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2020 -fno-sycl-libspirv -fno-bundle-offload-arch -nocudalib --cuda-gpu-arch=sm_50 /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/libdevice/fallback-complex.cpp -o /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/build/./lib/libsycl-fallback-complex.spv
clang-19: warning: ignoring '-fno-sycl-libspirv' option as it is not currently supported for target 'spir64-unknown-unknown' [-Woption-ignored]
clang-19: warning: argument unused during compilation: '-fno-bundle-offload-arch' [-Wunused-command-line-argument]
clang-19: warning: argument unused during compilation: '--cuda-gpu-arch=sm_50' [-Wunused-command-line-argument]
In file included from /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/libdevice/fallback-complex.cpp:12:
In file included from /data/EasyBuild-develop/build/dpcpp/6.0.0/GCCcore-13.3.0/build/bin/../include/sycl/stl_wrappers/cmath:7:
In file included from /usr/lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/cmath:47:
In file included from /usr/include/math.h:43:
/usr/include/bits/floatn.h:83:52: error: unsupported machine mode '__TC__'
   83 | typedef _Complex float __cfloat128 __attribute__ ((__mode__ (__TC__)));
      |                                                    ^
1 error generated.

Looks like my glibc is once again causing issues, see also intel/llvm#16903.
Let me check another system.


My system already has the glibc patch mentioned in the issue, but compilation still fails...

@Thyre
Collaborator

Thyre commented Feb 28, 2025

Building on a system without a GPU fails shortly after starting to build with:

ninja: Entering directory `/dev/shm/reuter1/dpcpp/6.0.0/GCCcore-13.3.0/build'
[0/2] Re-checking globbed directories...
ninja: error: 'CUDA_CUDA_LIBRARY-NOTFOUND', needed by 'lib/libur_adapter_cuda.so.0.10.8', missing and no known rule to make it

We should probably also set CUDA_CUDA_LIBRARY to use the CUDA stubs in $EBROOTCUDA/lib64/stubs/, as is done in unified-runtime-src for GitHub workflows.

_deps/unified-runtime-src/.github/workflows/coverity.yml:58:          -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so
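
In easyconfig terms, that would amount to an extra CMake option along these lines (a sketch; $EBROOTCUDA is the root of the loaded CUDA module):

-DCUDA_CUDA_LIBRARY=$EBROOTCUDA/lib64/stubs/libcuda.so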

@rafbiels
Contributor Author

Thank you for testing thoroughly! My setup clearly missed those issues.

I haven't yet come back to testing that glibc patch; many thanks for sharing your findings in the ticket. I hope this can be fixed soon as well. Is there any EasyBuild module that provides glibc? Perhaps that could be more reliable (and reproducible!) than taking the system one.

For the libcuda.so (stub) it looks like we should improve the DPC++ CMake config to find it, but your workaround is the right approach for this release. I'll update the config accordingly.

@Thyre
Collaborator

Thyre commented Feb 28, 2025

Thank you for testing thoroughly! My setup clearly missed those issues.

No problem 😄

I haven't yet come back to testing that glibc patch, many thanks for sharing your findings in the ticket. I hope this can be fixed soon as well. Is there any EasyBuild module that provides glibc? Perhaps that could be more reliable (and reproducible!) than taking the system one.

As far as I know, this is basically out of our control. Most host systems only have one glibc installed, and mixing them causes all kinds of issues, especially with regard to missing symbols. There are very old EasyConfigs for some glibc versions, but they haven't been updated in ages. Unfortunately, my knowledge of glibc is quite limited.

For the libcuda.so (stub) it looks like we should improve the DPC++ CMake config to find it, but your workaround is the right approach for this release. I'll update the config accordingly.

Thanks!


I've also noticed a few (small) things after managing to build the EasyConfig on a system with an older glibc.
The good news: running SYCL on a CUDA device works. I noticed, though, that these lines appear every time I run the program (or sycl-ls); maybe this is related to some config option:

ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
ZE_LOADER_DEBUG_TRACE:0 Drivers Discovered

and while the loader finds adapters for Level Zero, OpenCL and CUDA, the x86 one seems to be missing. Maybe because this is an AMD CPU?

<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory

@Crivella
Contributor

Crivella commented Mar 3, 2025

@Thyre

I agree, the buildbot script defines a lot of things one would need to maintain manually otherwise, e.g. in an EasyBlock. Maybe @Crivella can give some input here, if we need to define something particular for EasyBuild, as he's working on the new LLVM EasyBlock (see easybuilders/easybuild-easyblocks#3373).

Since the project maintains its own way of configuring LLVM, I would say the only thing that would be nice to check is whether RPATHing is being properly enforced on all binaries/libraries. I had some trouble with this in the runtimes builds, as those are built using the compilers produced in the project stage, which I had to solve in a slightly hackish way: easybuilders/easybuild-easyblocks@c2b7c9d.

It would also be nice to check that --sysroot works properly.
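
A quick way to check the latter (an illustrative sketch; the sysroot path and source file are placeholders): compile verbosely against an alternative sysroot and inspect the include/library search paths the driver reports:

clang++ --sysroot=/path/to/alt/sysroot -v -c hello.cpp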

I also started a local build of your PR, and will do some testing once that is finished. Might take some time to complete though, as LLVM is a monster 😄

I think my workstation has spent more time building LLVM than anything else over its lifetime 😅

@rafbiels

I believe the options are considerably different and using the provided buildbot script makes the easyconfig more concise and easier to maintain.

In general, when a build requires many options and non-trivial logic, those would be implemented in an EasyBlock, leaving only a few configure options relevant for the respective EasyConfig files.
In this case I guess buildbot is doing something similar to what an EasyBlock would normally do, so there is no point in duplicating that effort. Though, as stated above, it would be nice to check that the build is compliant with some of the general EasyBuild options like RPATHing (relevant for 5.x, as it will be on by default) and setting a different sysroot (relevant for EESSI builds).

… command

Add find_library for libcuda.so in the patch, as the logic is missing
in this release. Add --gcc-toolchain config for clang in the same way
as for clang++.
@rafbiels
Contributor Author

rafbiels commented Mar 3, 2025

@Thyre I added a find_library call for libcuda.so in the cmake patch rather than specifying it in the configure command. On your extra points:

  1. This is a known bug in this version of a dependency project: ZE_LOADER_DEBUG_TRACE always printing for arbitrary SYCL execution (oneapi-src/level-zero#201). Unfortunately, there is no patch version of level-zero with just this fix, and pulling a newer tag would bring in too many unrelated changes. This is fixed in the next release (hopefully coming soon).
  2. The Native CPU backend is a new feature under development and is not yet feature-complete, so it is disabled in this release (see https://github.com/intel/llvm/blob/sycl/sycl/doc/design/SYCLNativeCPU.md). x86 support is available through the OpenCL backend. That backend supports x86 CPUs, Intel GPUs and Intel FPGAs, but each of these options has some silent dependencies. For x86 CPU support, the proprietary Intel OpenCL runtime library (libintelocl.so) and the Intel TBB library need to be findable at runtime. They need to come from another module or from a system installation, similarly to the CUDA driver dependency for the CUDA backend.

@Crivella many thanks for your insights! I tested the build with eb --rpath --filter-env-vars=LD_LIBRARY_PATH and the SYCL runtime libs look to be rpath'ed correctly:

$ echo $LD_LIBRARY_PATH

$ readelf -d libsycl.so | grep RPATH
 0x000000000000000f (RPATH)              Library rpath: [/home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib:/home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/home/eb/work/software/hwloc/2.10.0-GCCcore-13.3.0/lib64:/home/eb/work/software/binutils/2.42-GCCcore-13.3.0/lib64:/home/eb/work/software/CUDA/11.7.0/lib64:/home/eb/work/software/Python/3.12.3-GCCcore-13.3.0/lib64:/home/eb/work/software/GCCcore/13.3.0/lib64:/home/eb/work/software/GCCcore/13.3.0/lib:/home/eb/work/software/libpciaccess/0.18.1-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/libxml2/2.12.7-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/numactl/2.0.18-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/libffi/3.4.5-GCCcore-13.3.0/lib64/../lib64:/home/eb/work/software/SQLite/3.45.3-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/Tcl/8.6.14-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/libreadline/8.2-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/libarchive/3.7.4-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/XZ/5.4.5-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/cURL/8.7.1-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/OpenSSL/3/lib/../lib64:/home/eb/work/software/bzip2/1.0.8-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/zlib/1.3.1-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/ncurses/6.5-GCCcore-13.3.0/lib/../lib64:/home/eb/work/software/GCCcore/13.3.0/lib/gcc/x86_64-pc-linux-gnu/13.3.0:/home/eb/work/software/libffi/3.4.5-GCCcore-13.3.0/lib:/home/eb/work/software/libpciaccess/0.18.1-GCCcore-13.3.0/lib:/home/eb/work/software/libxml2/2.12.7-GCCcore-13.3.0/lib:/home/eb/work/software/numactl/2.0.18-GCCcore-13.3.0/lib:/home/eb/work/software/libffi/3.4.5-GCCcore-13.3.0/lib64:/home/eb/work/software/SQLite/3.45.3-GCCcore-13.3.0/lib:/home/eb/work/software/Tcl/8.6.14-GCCcore-13.3.0/lib:/home/eb/work/software/libreadline/8.2-GCCcore-13.3.0/lib:/home/eb/work/software/libarchive/3.7.4-GCCcore-13.3.0/lib:/home/eb/work/software/XZ/5.4.5-GCCcore-13.3.0/lib:/home/eb/work/software/cURL/8.7.1-GCCcore-13.3.0/lib:/home/eb/work/software/OpenSSL/3/lib:/home/eb/work/software/bzip2/1.0.8-GCCcore-13.3.0/lib:/home/eb/work/software/zlib/1.3.1-GCCcore-13.3.0/lib:/home/eb/work/software/ncurses/6.5-GCCcore-13.3.0/lib]

$ ldd libsycl.so
	linux-vdso.so.1 (0x00007ffebfbfb000)
	libur_loader.so.0 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libur_loader.so.0 (0x00007fa6cc200000)
	libgcc_s.so.1 => /home/eb/work/software/GCCcore/13.3.0/lib64/libgcc_s.so.1 (0x00007fa6cd61e000)
	libstdc++.so.6 => /home/eb/work/software/GCCcore/13.3.0/lib64/libstdc++.so.6 (0x00007fa6cbe00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa6cd117000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa6cbbee000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa6cd647000)
	libz.so.1 => /home/eb/work/software/zlib/1.3.1-GCCcore-13.3.0/lib/../lib64/libz.so.1 (0x00007fa6cd5fe000)
$ ldd libur_loader.so
	linux-vdso.so.1 (0x00007fff429da000)
	libz.so.1 => /home/eb/work/software/zlib/1.3.1-GCCcore-13.3.0/lib/../lib64/libz.so.1 (0x00007f2e6094c000)
	libstdc++.so.6 => /home/eb/work/software/GCCcore/13.3.0/lib64/libstdc++.so.6 (0x00007f2e5f600000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2e5f917000)
	libgcc_s.so.1 => /home/eb/work/software/GCCcore/13.3.0/lib64/libgcc_s.so.1 (0x00007f2e60924000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2e5f3ee000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2e6096b000)
$ ldd libur_adapter_level_zero.so
	linux-vdso.so.1 (0x00007ffd046ca000)
	libumf.so.0 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libumf.so.0 (0x00007f80fe85f000)
	libze_loader.so.1 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libze_loader.so.1 (0x00007f80fe792000)
	libstdc++.so.6 => /home/eb/work/software/GCCcore/13.3.0/lib64/libstdc++.so.6 (0x00007f80fe400000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f80fe6a6000)
	libgcc_s.so.1 => /home/eb/work/software/GCCcore/13.3.0/lib64/libgcc_s.so.1 (0x00007f80fe67f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f80fe1ee000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f80fe9bf000)
$ ldd libur_adapter_opencl.so    
	linux-vdso.so.1 (0x00007ffc23bf9000)
	libOpenCL.so.1 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libOpenCL.so.1 (0x00007f0d18419000)
	libumf.so.0 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libumf.so.0 (0x00007f0d18407000)
	libstdc++.so.6 => /home/eb/work/software/GCCcore/13.3.0/lib64/libstdc++.so.6 (0x00007f0d18000000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0d1831b000)
	libgcc_s.so.1 => /home/eb/work/software/GCCcore/13.3.0/lib64/libgcc_s.so.1 (0x00007f0d182f4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0d17dee000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0d18482000)
$ ldd libur_adapter_cuda.so  
	linux-vdso.so.1 (0x00007ffdb368e000)
	libcuda.so.1 => not found
	libumf.so.0 => /home/eb/work/software/dpcpp/6.0.0-GCCcore-13.3.0/lib/libumf.so.0 (0x00007fbbe9f79000)
	libstdc++.so.6 => /home/eb/work/software/GCCcore/13.3.0/lib64/libstdc++.so.6 (0x00007fbbe9c00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbbe9e90000)
	libgcc_s.so.1 => /home/eb/work/software/GCCcore/13.3.0/lib64/libgcc_s.so.1 (0x00007fbbe9e69000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbbe99ee000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbbea003000)

Only libcuda.so is not found, which is correct because I built this in a container without a CUDA driver installed and only used the stub library from the toolkit at build time. When using the DPC++ compiler and runtime, libcuda.so should come from the system installation of the CUDA driver (if available - otherwise the CUDA backend is simply not available, but the others still work fine).

Is there anything else you'd recommend to test here? How could I test that --sysroot works correctly?

@Thyre
Copy link
Collaborator

Thyre commented Mar 3, 2025

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22418 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22418 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5850

Test results coming soon (I hope)...

Details

- notification for comment with ID 2695490690 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/f9d6e7d61c2899472831e157f430dce4 for a full test report.

@Thyre
Collaborator

Thyre commented Mar 3, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22418 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22418 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5851

Test results coming soon (I hope)...

Details

- notification for comment with ID 2695610334 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/8492c76a3b3115e5580cd19e08e3f5e0 for a full test report.

@Crivella
Contributor

Crivella commented Mar 4, 2025

@Crivella many thanks for your insights! I tested the build with eb --rpath --filter-env-vars=LD_LIBRARY_PATH and the SYCL runtime libs look to be rpath'ed correctly:

I have not checked myself, but in case some parts of your build are built as LLVM runtimes (either custom to this project or native LLVM), that is where you should be checking the rpath.
When using --rpath, EasyBuild creates a wrapper that is invoked instead of the actual compiler command; it takes everything in LD_LIBRARY_PATH at command invocation and changes it into an equivalent series of rpath options.
The problem arises during the build of the runtimes, where by default the CMake logic will use the compilers produced during the project step as the new compilers (and hence will not go through the wrappers generated by EasyBuild).

EDIT: In my experience, most if not all of the runtime stuff ends up in `<install_dir>/lib/<target_triple>`.

To give an example, you should check libc++.so and libomptarget.so if you are building them:

~/.local/easybuild/software/LLVM/19.1.7/lib/x86_64-unknown-linux-gnu$ readelf -d libc++.so.1 | grep rpath
 0x000000000000000f (RPATH)              Library rpath: [/home/crivella/.local/easybuild/software/LLVM/19.1.7/lib:/home/crivella/.local/easybuild/software/LLVM/19.1.7/lib64:/home/crivella/.local/easybuild/software/LLVM/19.1.7/lib/x86_64-unknown-linux-gnu:/home/crivella/.local/easybuild/software/zlib/1.3.1/lib:/home/crivella/.local/easybuild/software/ncurses/6.5/lib:/home/crivella/.local/easybuild/software/Perl/5.38.2-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/gettext/0.22.5-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/libiconv/1.17-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/libxml2/2.12.7-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/expat/2.6.2-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/lit/18.1.8-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/Python/3.12.3-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/psutil/6.0.0-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/libffi/3.4.5-GCCcore-13.3.0/lib64:/home/crivella/.local/easybuild/software/libffi/3.4.5-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/SQLite/3.45.3-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/Tcl/8.6.14-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/libreadline/8.2-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/binutils/2.42-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/libarchive/3.7.4-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/XZ/5.4.5-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/cURL/8.7.1-GCCcore-13.3.0/lib:/home/crivella/.local/easybuild/software/OpenSSL/3/lib:/home/crivella/.local/easybuild/software/bzip2/1.0.8-GCCcore-13.3.0/lib]

Is there anything else you'd recommend to test here? How could I test that --sysroot works correctly?

As a baseline, you can check the way software is built for EESSI or on top of EESSI:
https://www.eessi.io/docs/using_eessi/building_on_eessi/

@rafbiels
Contributor Author

rafbiels commented Mar 4, 2025

Thank you @Crivella. This build doesn't include libc++, since DPC++ is only tested and shipped with libstdc++ (though it should of course work fine with libc++). I'm also only building the host runtime for OpenMP, and that (libomp) looks to be RPATHed correctly as well.

I went through the build.ninja configuration, and the only files compiled with the freshly-built compiler here are the bitcode libraries, i.e. the device code in .bc files. That includes libclc as well as various SYCL device libraries like libsycl-cmath, libsycl-complex, etc. Since this is all exclusively device code, it is only ever statically linked and there are no dynamic sections in these binaries. I think we're good then.

I also tested the build with EESSI and it succeeded, again with RPATH in runtime libraries looking good.

While testing the EESSI build, which is actually where I'd also like to add DPC++, I realised they don't support the 2024a compiler toolchain yet (GCC-13.3.0). I guess that's coming later this year based on https://gitlab.com/eessi/support/-/issues/56, but likely with glibc 2.41, so we also need to fix intel/llvm#16903 for that. This made me wonder whether I should downgrade the toolchain in this PR to GCC 13.2.0, or whether it would be better to create a second easyconfig for the other version. Do you have any experience / suggestions in this regard?

@Thyre
Collaborator

Thyre commented Mar 4, 2025

This made me wonder if I should downgrade the toolchain in this PR to GCC 13.2.0, or maybe better to create a second easyconfig with the other version? Do you have any experience / suggestions in this aspect?

Having the same software version in multiple toolchains is typically not a problem, as long as only a single version is used per toolchain for EasyConfigs depending on this software. The EasyConfig test suite typically fails if this is not the case.
Score-P, as an example, is available in version 8.4 for all gompi toolchains since 2022a, with some toolchains even having multiple versions (which I assume is an exception).

@rafbiels
Contributor Author

rafbiels commented Mar 5, 2025

Thanks, in this case I'll submit a second PR adding a GCC-13.2.0 based version after this one gets in.

@Thyre
Collaborator

Thyre commented Mar 7, 2025

Test report by @Thyre
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
ZAM054 - Linux Zorin OS 17, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, 1 x NVIDIA NVIDIA GeForce MX550, 565.57.01, Python 3.10.12
See https://gist.github.com/Thyre/89e3a62d0f2fdfcc12f31c9196d019e2 for a full test report.

@Thyre
Collaborator

Thyre commented Mar 7, 2025

I would recommend adding sanity checks to the EasyConfig. Both sanity_check_paths and sanity_check_commands can help verify that the installation looks the way we would expect.

Maybe checking for bin/clang, bin/clang++ and clang --version / clang++ --version might be a good idea?
On the EasyBlock side, most compilers have a rather large sanity check, e.g. AOCC here. Those checks are certainly too much for an EasyConfig.

@rafbiels
Contributor Author

Good point! I added relatively extensive sanity checks in 5f6b764, including a test command which compiles the simplest SYCL file possible (just the header include) for all supported target types.
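
For reference, such a test command boils down to something like the following sketch (not the exact commands from 5f6b764; the target list depends on the build):

echo '#include <sycl/sycl.hpp>' > test.cpp
# compile-only, repeated for each supported target
clang++ -fsycl -c test.cpp
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -c test.cpp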

@Thyre
Collaborator

Thyre commented Mar 10, 2025

@boegelbot please test @ jsc-zen3

@jfgrimm
Member

jfgrimm commented Apr 25, 2025

Test report by @jfgrimm
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node020.viking2.yor.alces.network - Linux Rocky Linux 8.9, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/595f0b3d6b49e939abb65c07cbea70f3 for a full test report.

@Crivella
Contributor

@rafbiels Do you think there would be significant improvements in using a newer CUDA version on systems where it is supported, e.g. if some instructions that are not present in 11.7 could be used for newer GPUs?

My idea is that if 11.7 always gives the best performance, then this could go in as a general install without the CUDA suffix; otherwise we would probably need CUDA-version-specific EC files.

@rafbiels
Contributor Author

Hi @Crivella,
as far as I'm aware, CUDA is only used to build the SYCL runtime library libur_adapter_cuda.so when building DPC++. This only links against the CUDA driver library libcuda.so, so all we use CUDA for is to find the right symbols for linking.

When using DPC++ to compile SYCL for NVIDIA GPUs, all the PTX generation is internal to DPC++ (the Clang CUDA toolchain) and no CUDA toolkit or driver dependency is used. Then the PTX is lowered to SASS by directly calling ptxas (the CUDA assembler) from PATH. The CUDA version used to build DPC++ has no influence on the device code compilation; only the currently-available one matters.
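
For illustration, an ahead-of-time build for an NVIDIA target looks roughly like this (sm_80 and app.cpp are just example values); DPC++ emits the PTX itself and only shells out to whichever ptxas is first in PATH:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 app.cpp -o app
which ptxas   # resolved from the CUDA currently on PATH, not the build-time one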

The only place where the initial CUDA version matters is when executing SYCL applications on NVIDIA GPUs, where the runtime library calls the CUDA driver using the libcuda.so symbols selected when DPC++ is built. The library is responsible for launching kernels and managing memory. It has no influence on the kernel performance and only affects the host-side overheads of the offloading. Here, when we compile with CUDA 11.7, we do use older symbols (still supported in newer versions because CUDA is backwards compatible) where we could be using newer ones. I compiled the library with both CUDA 11.7 and 12.8 and diff'ed the symbols - you can see the output below. There are very few differences and I believe they mostly reflect the evolution of CUDA API. I'm not expecting any performance implications of this difference.

The only functional difference is that the urEnqueueKernelLaunchCustomExp function is only supported starting from CUDA 11.8. It's part of the work-in-progress proposed extension to utilise thread block clusters in SM 9.0+ GPUs: sycl_ext_codeplay_cuda_cluster_group. Since we're shipping the oneAPI 2025.0 binary release of the SYCL CUDA runtime also compiled with CUDA 11.7, the lack of support for this extension in this EasyBuild module is compatible with our tested and "blessed" binaries.

The bottom line is that, in my view, using a newer CUDA to compile DPC++ brings no performance benefits but breaks compatibility with older versions. It is also important to note that this "compatibility" is with the CUDA driver library, which is always provided by the system. It is not available from EasyBuild, which only provides the CUDA toolkit.

$ diff -U0 <(nm work/dpcpp-cuda11.7-lib/libur_adapter_cuda.so.0.10.8 | cut -d ' ' -f 2-) <(nm work/dpcpp-cuda12.8-lib/libur_adapter_cuda.so.0.10.8 | cut -d ' ' -f 2-)
--- /dev/fd/63	2025-04-25 16:45:43.040231648 +0000
+++ /dev/fd/62	2025-04-25 16:45:43.040231648 +0000
@@ -297,3 +297,3 @@
-t _ZN39ur_exp_command_buffer_command_handle_t_C1EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_
-t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_
-t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_.localalias
+t _ZN39ur_exp_command_buffer_command_handle_t_C1EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_
+t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_
+t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_.localalias
@@ -941 +941 @@
-                U cuGraphAddKernelNode
+                U cuGraphAddKernelNode_v2
@@ -947 +947 @@
-                U cuGraphExecKernelNodeSetParams
+                U cuGraphExecKernelNodeSetParams_v2
@@ -953,0 +954 @@
+                U cuLaunchKernelEx
@@ -1188,0 +1190,2 @@
+t urEnqueueKernelLaunchCustomExp.cold
+t urEnqueueKernelLaunchCustomExp.localalias

@Crivella
Contributor

For me this would be fine to go in without the suffix; gonna ping someone else more involved with GPU software to get a final approval.

@ocaisa previously requested changes Apr 29, 2025
@Crivella
Contributor

Test report by @Crivella
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
crivella-desktop - Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish), x86_64, 13th Gen Intel(R) Core(TM) i9-13900K (skylake), Python 3.11.13
See https://gist.github.com/Crivella/e30bb137ae55a62d8965cb4951423b1d for a full test report.

@Thyre
Collaborator

Thyre commented Jun 30, 2025

Test report by @Crivella FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) crivella-desktop - Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish), x86_64, 13th Gen Intel(R) Core(TM) i9-13900K (skylake), Python 3.11.13 See https://gist.github.com/Crivella/e30bb137ae55a62d8965cb4951423b1d for a full test report.

/home/crivella/.local/easybuild/build/dpcpp/6.0.0/GCCcore-13.3.0/llvm-6.0.0/libdevice/imf/../imf_impl_utils.hpp:12:10: fatal error: 'cstddef' file not found
   12 | #include <cstddef>
      |          ^~~~~~~~~
1 error generated.

Looks like the same, or at least a very similar, error to the one you hit with LLVM at some point, right?

@Crivella
Contributor

Most likely. I have to remember how I fixed it and/or what was causing it...

@Crivella
Contributor

Crivella commented Jun 30, 2025

Basically, the internal clang is not aware of --gcc-install-dir=$EBROOTGCCCORE/lib/gcc/x86_64-pc-linux-gnu/13.3.0, which causes it to miss this library if it is not available on the system.

(If I add it manually to the failing compile line, it works.)

I think here, too, we might have to add a .cfg file next to clang to resolve this.
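
A minimal sketch of that fix, assuming a config file dropped next to the installed clang binary (the install path and file name are illustrative):

# make the installed clang pick up the EasyBuild GCC automatically
cat > <installdir>/bin/clang.cfg << EOF
--gcc-install-dir=$EBROOTGCCCORE/lib/gcc/x86_64-pc-linux-gnu/13.3.0
EOF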

@Crivella
Contributor

Crivella commented Jul 3, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@Crivella: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22418 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22418 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7118

Test results coming soon (I hope)...

Details

- notification for comment with ID 3032511801 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Crivella
Contributor

Crivella commented Jul 3, 2025

Test report by @Crivella
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
crivella-desktop - Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish), x86_64, 13th Gen Intel(R) Core(TM) i9-13900K (skylake), Python 3.11.13
See https://gist.github.com/Crivella/48bd4b7019273641e574bdd2e2aa67ee for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/975f15a2a9d6425b523e32b267fa6b98 for a full test report.

@Crivella
Contributor

LGTM

@ocaisa Do you think it is worth keeping this open until the moduleclass discussion is resolved, or should we merge this and fix it later if needed?

@ocaisa
Member

ocaisa commented Jul 31, 2025

@Crivella I resolved that discussion, we can resurrect it later if there is a problem to solve.

@Crivella Crivella dismissed ocaisa’s stale review July 31, 2025 13:27

Discussion resolved

@Crivella Crivella added this to the release after 5.1.1 milestone Jul 31, 2025
@Crivella
Contributor

Going in, thanks @rafbiels!

@Crivella Crivella merged commit 3fb3141 into easybuilders:develop Jul 31, 2025
8 checks passed