{compiler}[GCCcore/13.3.0] dpcpp v6.0.0 #22418
Conversation
Add new easyconfig for the open-source DPC++ compiler version 6.0.0 (https://github.com/intel/llvm/releases/tag/v6.0.0), supporting SYCL compilation for the OpenCL, Level Zero and CUDA backends, which allows SYCL code to run on x86 CPUs, Intel GPUs and NVIDIA GPUs. A patch file and configuration options are in place to ensure compatibility with a wide range of CUDA releases by removing the dependency on CUPTI. This build of DPC++ can be used with any CUDA version from 11.7 onwards, or without CUDA at all (to target only the OpenCL and Level Zero backends).
|
We typically only have a single CUDA version being used per toolchain generation, as far as I am aware. For GCCcore 13.3.0, this normally is CUDA 12.6.0. Personally, I would prefer having a fixed CUDA version over having the tracing feature disabled. Having CUDA as an "optional" dependency (i.e. build dependency) sounds fine to me though. I'm wondering how much the build parameters created by the buildbot script differ from a standard LLVM configuration.
|
Thank you for the feedback; these are very good points. I'll try to explain the reasoning behind this setup; please let me know what you think.
My idea was for this to work on HPC systems with their own system installation of CUDA, regardless of the version, rather than pulling a specific version from EasyBuild (which still also works in this setup). In my experience, this works best for full support of the matching driver and tools like profilers. Since CUDA is backwards but not forwards compatible, we want to build DPC++ with the oldest CUDA version we support (11.7 for this release), so that users can then use it with any newer version. The build also aims to be as close as possible to Intel's and Codeplay's binary releases (oneAPI 2025.0.0) while remaining fully open-source. We actually disable CUPTI tracing in our binary release to allow this portability and advise users that all CUDA versions 11.7+ work with our backend library (see https://developer.codeplay.com/products/oneapi/nvidia/2025.0.0/guides/get-started-guide-nvidia).
I believe the options are considerably different, and using the provided buildbot script makes the easyconfig more concise and easier to maintain. Any option changes between versions won't need to be mapped onto the easyconfig, and we will be able to reuse the same solution for future versions. Since the DPC++ project is kept in close sync with upstream LLVM, with regular pull-downs and occasional upstreaming of features, we didn't modify the default LLVM options but instead rely on the buildbot script, which puts together the DPC++ defaults. For reference, these are the CMake options created by this easyconfig calling the buildbot: [...]
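For readers not familiar with it, a rough sketch of how the buildbot script is driven (flag names are taken from the upstream intel/llvm buildbot scripts as I understand them; this is not the exact invocation used by this easyconfig):

```bash
# Rough sketch only; flags and paths are assumptions, not the easyconfig's exact call.
git clone --branch v6.0.0 https://github.com/intel/llvm.git
cd llvm
python3 buildbot/configure.py -o build --cuda \
    --cmake-opt="-DCMAKE_INSTALL_PREFIX=/opt/dpcpp/6.0.0"   # install prefix is a placeholder
python3 buildbot/compile.py -o build
```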
|
Thanks a lot for the extensive comment. This clears things up a lot.
From my experience, users of EasyBuild typically also install CUDA via EasyBuild and don't have an external version lying around. This may be different on HPE Cray machines, I'm not sure. However, I completely understand your reasoning here. Maybe other people can give input here on how we would want to handle this in EasyBuild. For the actual (non Open Source) oneAPI compilers, I always chose a CUDA version close to what is used in the toolchain, see #21582.
I didn't know that, thanks for the information! This totally makes sense after thinking a bit more about it, especially as only one CUPTI instance can be attached, which could cause issues with other profiling tools like Nsight Systems or Score-P. For profiling / tracing on NVIDIA GPUs, people would probably use these tools anyway, though some SYCL information might get lost I assume.
I agree. I also started a local build of your PR, and will do some testing once that is finished.
|
Test report by @Thyre |
Hm, the build failed with the following error: [...] Looks like my [...] My system already has the [...]
|
Building on a system without a GPU fails shortly after starting to build with: [...] We should probably also set [...]
|
Thank you for testing thoroughly! My setup clearly missed those issues. I haven't yet come back to testing that glibc patch; many thanks for sharing your findings in the ticket. I hope this can be fixed soon as well. Is there any EasyBuild module that provides glibc? Perhaps that could be more reliable (and reproducible!) than taking the system one. For the [...]
No problem 😄
As far as I know, this is basically something out of our control. Most host systems only have one [...]
Thanks! I've also noticed a few (small) things after managing to build the EasyConfig on a system with an older [...]: while the loader finds adapters for Level Zero, OpenCL and CUDA, the x86 one seems to be missing. Maybe because this is an AMD CPU?
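For what it's worth, a quick way to inspect which adapters/devices the runtime discovers is sycl-ls (a hedged illustration; the module name is an assumption and the output format varies between releases):

```bash
# Hedged illustration: module name is an assumption; output lines are examples only.
module load dpcpp/6.0.0-GCCcore-13.3.0
sycl-ls
# Example of the kind of output expected (illustrative, not from this system):
#   [level_zero:gpu] ... Intel GPU via Level Zero
#   [opencl:cpu]     ... CPU device via the OpenCL CPU runtime
#   [cuda:gpu]       ... NVIDIA GPU via the CUDA adapter
```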
Since the project maintains its own way of configuring LLVM, I would say the only thing that would be nice to check is whether RPATHing is being properly enforced on all binaries/libraries. I had some trouble with it in the runtimes builds, as those are built using the compilers produced in the project stage, which I had to solve in a slightly hackish way: easybuilders/easybuild-easyblocks@c2b7c9d. Guess also checking that [...]
I think my WS has spent more time building LLVM than anything else over its lifetime 😅
In general, when a build requires many options and non-trivial logic, those would be implemented in an EasyBlock, which would make only a few configure options relevant for the respective EasyConfig files.
… command Add find_library for libcuda.so in the patch, as the logic is missing in this release. Add --gcc-toolchain config for clang in the same way as for clang++.
|
@Thyre I added a fix in the commit above for the issues you found (the missing find_library for libcuda.so and the --gcc-toolchain config for clang).
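For illustration, one way such a default --gcc-toolchain configuration could look is via clang's default config files next to the binaries (a sketch under assumptions: the exact mechanism used in the commit above is not reproduced here, and $EBROOTGCCCORE and the file locations are placeholders):

```bash
# Sketch only: relies on clang's default config-file lookup next to the binary;
# $EBROOTGCCCORE / $EBROOTDPCPP and the locations are assumptions, not the commit's exact approach.
echo "--gcc-toolchain=$EBROOTGCCCORE" > "$EBROOTDPCPP/bin/clang.cfg"
echo "--gcc-toolchain=$EBROOTGCCCORE" > "$EBROOTDPCPP/bin/clang++.cfg"
```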
@Crivella many thanks for your insights! I tested the build with [...]. Only [...]. Is there anything else you'd recommend to test here? How could I test that [...]?
|
@boegelbot please test @ jsc-zen3 |
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)...
|
Test report by @boegelbot |
|
@boegelbot please test @ jsc-zen3-a100 |
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)...
|
Test report by @boegelbot |
Have not checked myself, but in case some parts of your build are built as LLVM runtimes (either custom to this project or native LLVM), that is where you should be checking the rpath. EDIT: In my experience, most if not all the runtime stuff ends up in `<install_dir>/lib/<target_triple>`; to give an example, you should check [...]
As a base, you can check the way software is built for EESSI or on top of EESSI.
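For reference, a minimal check along those lines might look like this (the module name, $EBROOTDPCPP and the library paths are assumptions, not verified against this easyconfig):

```bash
# Sketch: look for RPATH/RUNPATH entries on the SYCL runtime and on the
# LLVM-runtime libraries under lib/<target_triple>.
module load dpcpp/6.0.0-GCCcore-13.3.0
for lib in "$EBROOTDPCPP"/lib/libsycl.so "$EBROOTDPCPP"/lib/x86_64-unknown-linux-gnu/*.so; do
    echo "== $lib"
    readelf -d "$lib" | grep -E 'R(UN)?PATH' || echo "   no RPATH/RUNPATH set"
done
```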
|
Thank you @Crivella. This build doesn't include libc++ since DPC++ is only tested and shipped with libstdc++ (though it should of course work fine with libc++). I'm also only building the host runtime for OpenMP, and that (libomp) looks to be RPATHed correctly as well. I went through the [...]. I also tested the build with EESSI and it succeeded, again with RPATH in the runtime libraries looking good. While testing the EESSI build, which is actually where I'd also like to add DPC++, I realised they don't support the 2024a compiler toolchain (GCC 13.3.0) yet. I guess that's coming later this year based on https://gitlab.com/eessi/support/-/issues/56, but likely with glibc 2.41, so we also need to fix intel/llvm#16903 for that. This made me wonder whether I should downgrade the toolchain in this PR to GCC 13.2.0, or whether it would be better to create a second easyconfig with the other version. Do you have any experience / suggestions in this regard?
Having the same software version in multiple toolchains is typically not a problem, as long as only a single version is used for EasyConfigs depending on this software per toolchain. The EasyConfig test suite typically fails if this is not the case. |
|
Thanks, in this case I'll submit a second PR adding a GCC-13.2.0 based version after this one gets in. |
|
Test report by @Thyre |
|
I would recommend adding sanity checks to the EasyConfig. Both [...]. Maybe checking for [...]
|
Good point! I added relatively extensive sanity checks in 5f6b764, including a test command which compiles the simplest SYCL file possible (just the header include) for all supported target types.
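For context, the kind of test command meant here is roughly the following (a hedged sketch; the file content and flags are assumptions and are not copied from commit 5f6b764):

```bash
# Sketch of a minimal SYCL compile test; flags/targets are assumptions.
cat > check.cpp << 'EOF'
#include <sycl/sycl.hpp>
int main() { return 0; }
EOF
clang++ -fsycl -fsycl-targets=spir64 check.cpp -o check_spir64   # SPIR-V (OpenCL / Level Zero)
```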
|
@boegelbot please test @ jsc-zen3 |
|
Test report by @jfgrimm |
|
@rafbiels Do you think there would be significant improvements in using a newer CUDA version for systems where it is supported, e.g. if some instructions that are not present in 11.7 could be used for newer GPUs? My idea is that if 11.7 will always give the best performance, then this could go in as a general install without the CUDA suffix; otherwise we would probably need CUDA-version-specific EC files.
|
Hi @Crivella,
When using DPC++ to compile SYCL for NVIDIA GPUs, all the PTX generation is internal to DPC++ (the Clang CUDA toolchain) and no CUDA toolkit or driver dependency is used. Then, the PTX is lowered to SASS by directly calling ptxas. The only place where the initial CUDA version matters is when executing SYCL applications on NVIDIA GPUs, where the runtime library calls the CUDA driver through the driver API (libcuda.so). The only functional difference is that the CUDA adapter built against a newer toolkit references a few newer driver API entry points, as shown in the symbol diff below. The bottom line is that, in my view, using newer CUDA to compile DPC++ brings no performance benefits but breaks compatibility with older versions. It is also important to note that this "compatibility" is with the CUDA driver library, which is always provided by the system. It is not available from EasyBuild, which only provides the CUDA toolkit.
$ diff -U0 <(nm work/dpcpp-cuda11.7-lib/libur_adapter_cuda.so.0.10.8 | cut -d ' ' -f 2-) <(nm work/dpcpp-cuda12.8-lib/libur_adapter_cuda.so.0.10.8 | cut -d ' ' -f 2-)
--- /dev/fd/63 2025-04-25 16:45:43.040231648 +0000
+++ /dev/fd/62 2025-04-25 16:45:43.040231648 +0000
@@ -297,3 +297,3 @@
-t _ZN39ur_exp_command_buffer_command_handle_t_C1EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_
-t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_
-t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st26CUDA_KERNEL_NODE_PARAMS_stjPKmS8_S8_.localalias
+t _ZN39ur_exp_command_buffer_command_handle_t_C1EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_
+t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_
+t _ZN39ur_exp_command_buffer_command_handle_t_C2EP31ur_exp_command_buffer_handle_t_P19ur_kernel_handle_t_P14CUgraphNode_st29CUDA_KERNEL_NODE_PARAMS_v2_stjPKmS8_S8_.localalias
@@ -941 +941 @@
- U cuGraphAddKernelNode
+ U cuGraphAddKernelNode_v2
@@ -947 +947 @@
- U cuGraphExecKernelNodeSetParams
+ U cuGraphExecKernelNodeSetParams_v2
@@ -953,0 +954 @@
+ U cuLaunchKernelEx
@@ -1188,0 +1190,2 @@
+t urEnqueueKernelLaunchCustomExp.cold
+t urEnqueueKernelLaunchCustomExp.localalias
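To illustrate the point about compatibility (paths are assumptions): the CUDA toolkit used when compiling an application can be newer than the 11.7 that DPC++ itself was built against and can be selected explicitly, while at run time only the system's CUDA driver (libcuda.so) matters:

```bash
# Sketch only: compile SYCL for an NVIDIA GPU, pointing clang at a newer system CUDA toolkit.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
        --cuda-path=/usr/local/cuda-12.8 \
        saxpy.cpp -o saxpy
```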
|
For me this would be fine to go in without the suffix, gonna ping someone else more involved with GPU software to get a final approval |
easybuild/easyconfigs/d/dpcpp/dpcpp-6.0.0-cmake-cuda-deps.patch
|
Test report by @Crivella |
Looks like the same, or at least very similar, error you've hit with LLVM at some point, right? |
|
Most likely, I have to remember how I fixed it and/or what was causing it...
|
Basically, the internal clang is not aware of [...] (if I add it manually to the compile line that is failing, it works). I think here we might also have to add a [...]
|
@boegelbot please test @ jsc-zen3-a100 |
|
@Crivella: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)...
|
Test report by @Crivella |
|
Test report by @boegelbot |
|
@Crivella I resolved that discussion; we can resurrect it later if there is a problem to solve.
|
Going in, thanks @rafbiels! |