
[libclc] Initial support for cross-compiling OpenCL libraries#174022

Merged
jhuber6 merged 1 commit into llvm:main from jhuber6:libclc
Jan 7, 2026

Conversation

@jhuber6
Contributor

@jhuber6 jhuber6 commented Dec 30, 2025

Summary:
The other GPU-enabled libraries (openmp, flang-rt, compiler-rt, libc,
libcxx, libcxxabi) all support builds through a runtime cross-build. In
these builds we use a separate CMake build that cross-compiles to a
single target.

This patch provides basic support for this with the libclc libraries.
Changes include adding support for the more standard GPU compute triples
(amdgcn-amd-amdhsa, nvptx64-nvidia-cuda) and building only one target in
this mode.

Some things left to do:

This patch does not change the compiler invocations. Doing so would
allow us to use standard CMake routines, but this keeps it minimal.

The prepare-builtins support is questionable and doesn't fit into this
scheme because it's a host executable; I'm ignoring it for now.

The installed location should, I believe, just use the triple with no
libclc/ subdirectory handling.
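As context for the runtime cross-build described above, a per-target sub-build is typically requested from the top-level configure. A minimal sketch, assuming a standard Ninja setup (only the RUNTIMES_&lt;triple&gt;_ option below appears in this patch's discussion; the rest is generic LLVM configuration):

```shell
# Build clang, then cross-compile libclc for a single GPU target via the
# runtimes build; the RUNTIMES_<triple>_ prefix forwards the option to
# the per-target CMake sub-build.
cmake -S llvm -B build -G Ninja \
  -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
  -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc
```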

@llvmbot llvmbot added the libclc libclc OpenCL library label Dec 30, 2025
Contributor

@arsenm arsenm left a comment

I don't think CMake provides out-of-the-box support for OpenCL as a compile language. At one point we had the boilerplate to add it as a language here

Comment on lines 458 to 459
# FIXME: Is this utility even necessary? The `-mlink-builtin-bitcode`
# option used to link the library in discards the modified linkage.
Contributor

This is probably a leftover from before we did that. Also, I thought there was at least one other hack in prepare-builtins

Contributor Author

The two things it does, from what I can tell, are:

  1. Removes "opencl.ocl.version" metadata strings

  2. Replaces any non-external function with linkonce_odr linkage

The first one, I'm pretty sure llvm-link deduplicates now. The second one is mostly just a cheap way to let the functions link while being eliminated by the compiler if unused. I think the ROCm Device Libs do that as well.
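To illustrate the second point, a hypothetical before/after of the linkage rewrite (the function name is made up for illustration):

```llvm
; Before: an external definition that always survives linking.
define float @__clc_square(float %x) {
  %y = fmul float %x, %x
  ret float %y
}

; After prepare-builtins: linkonce_odr lets the function link normally
; but be discarded by the compiler if nothing ends up referencing it.
define linkonce_odr float @__clc_square(float %x) {
  %y = fmul float %x, %x
  ret float %y
}
```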

Contributor

The version metadata is a workaround for having hundreds of entries. We had an AMDGPU pass to deduplicate them, but that was moved into the IR linker sometime in the last year

Contributor Author

Unfortunately I don't know how people really use libclc. I'd really like to just remove it, since -mlink-builtin-bitcode makes the linkage change unnecessary and the linker deduplication now makes the manual handling unnecessary, but I can imagine someone complaining about it being gone.

Contributor

This is probably a leftover from before we did that.

Thanks for the information. I wasn't aware of it.

Unfortunately I don't know how people really use libclc. I'd really like to just remove it, since -mlink-builtin-bitcode makes the linkage change unnecessary and the linker deduplication now makes the manual handling unnecessary, but I can imagine someone complaining about it being gone.

Our downstream targets also use -mlink-builtin-bitcode to link libclc bitcode files.
I think it is probably fine to drop this prepare_builtins utility given that the linkage change is unnecessary. I have just tried dropping the linkage change in our downstream code and didn't find anything wrong in some basic testing.

There are a few additional changes that prepare_builtins does in our downstream code, e.g.
https://github.com/intel/llvm/blob/4decbf0da29f7daba8a87361456a264a331e2b5d/libclc/utils/prepare-builtins.cpp#L85-L110
https://github.com/intel/llvm/blob/4decbf0da29f7daba8a87361456a264a331e2b5d/libclc/utils/prepare-builtins.cpp#L129-L146
I'll check if they can be removed in the downstream as well.

Contributor Author

The code object version is better done via -Xclang -mcode-object-version=none by the compiler flags. The wchar issue is confounding, that should be a property of the target so I'd imagine it suggests you're mixing incompatible targets.

Removing CPU target features like that is a massive hack, but it's not an unprecedented one since we do similarly weird things in the ROCm Device Libs. Ideally we partition these libraries more intelligently, but does always_inline work?
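For reference, the suggested flag is a CC1 option and would be forwarded through -Xclang, roughly like this (a sketch; the file names and surrounding flags are assumptions):

```shell
# Compile a device-library source to bitcode without embedding a code
# object version module flag.
clang --target=amdgcn-amd-amdhsa -nogpulib \
  -Xclang -mcode-object-version=none \
  -emit-llvm -c builtins.cl -o builtins.bc
```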

Contributor

The code object version is better done via -Xclang -mcode-object-version=none by the compiler flags. The wchar issue is confounding, that should be a property of the target so I'd imagine it suggests you're mixing incompatible targets.

Thanks for the suggestion. Will try it.

Removing CPU target features like that is a massive hack, but it's not an unprecedented one since we do similarly weird things in the ROCm Device Libs. Ideally we partition these libraries more intelligently, but does always_inline work?

always_inline works, but it won't be used in the libclc compile flags, and it is problematic to inline bitcode files with incompatible target-features. The issue is probably that our downstream should build separate compatible bitcodes for supported targets.

@arsenm arsenm requested a review from wenju-he December 31, 2025 18:08
set( amdgcn--_devices tahiti )
set( amdgcn-mesa-mesa3d_devices ${amdgcn--_devices} )
set( amdgcn--amdhsa_devices none )
set( amdgcn-amd-amdhsa_devices none )
Contributor

Is there a difference between amdgcn--amdhsa_devices and amdgcn-amd-amdhsa_devices?

Contributor Author

Nope, but all the other GPU compute targets use amdgcn-amd-amdhsa and I'd like these to go in the same directory eventually.

Contributor

Perhaps we can drop the amdgcn--amdhsa_devices target from libclc since the bitcode will be the same?

Contributor Author

I was planning on trimming this up in a PR tomorrow. We definitely could though I don't know the subtleties here, since maybe someone depends on the specific triple.

I also really want to change how the files are installed; we should use the regular triple and then the specific device as a subdirectory if possible. Right now we have libclc/ in the resource dir, when it should be more like the standard PER_TARGET_RUNTIME_DIR layout, I'd say.

Contributor

The amd vendor only does anything with spirv

set( clspv64--_devices none )
set( nvptx--_devices none )
set( nvptx64--_devices none )
set( nvptx64-nvidia-cuda_devices none )
Contributor

Should the libclc implementation for this triple be unified with nvptx--nvidiacl/nvptx64--nvidiacl? Otherwise nvptx64-nvidia-cuda_devices won't pick up the code under ptx-nvidiacl.
The NV implementations are in the ptx-nvidiacl folders (https://github.com/llvm/llvm-project/tree/main/libclc/clc/lib/ptx-nvidiacl and https://github.com/llvm/llvm-project/tree/main/libclc/opencl/lib/ptx-nvidiacl)
and libclc uses the triple components to select the folder (see

list( APPEND opencl_dirs ${DARCH} ${DARCH}-${OS} ${DARCH}-${VENDOR}-${OS} )
).
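To make that selection concrete, here is a sketch of how the triple components expand for the new triple (the variable values are shown for illustration only):

```cmake
# For the triple nvptx64-nvidia-cuda the components split as follows:
set( DARCH nvptx64 )
set( VENDOR nvidia )
set( OS cuda )
# The existing logic then searches: nvptx64, nvptx64-cuda, and
# nvptx64-nvidia-cuda. Note that none of these matches the historical
# ptx-nvidiacl directory, which is why unification would be needed.
list( APPEND opencl_dirs ${DARCH} ${DARCH}-${OS} ${DARCH}-${VENDOR}-${OS} )
```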

Contributor Author

Yeah, probably. I don't really know anything about OpenCL on NVIDIA but I'm just trying to make things work with the normal compute triples. What would that require?

Contributor
@wenju-he wenju-he Jan 5, 2026

I don't know about the history of using nvidiacl in libclc. If nvptx64-nvidia-cuda is the right choice, we can probably rename the ptx-nvidiacl folders so that nvptx64-nvidia-cuda will pick up files in those folders and drop the nvptx64--nvidiacl target.

set( LIBCLC_STANDALONE_BUILD FALSE )

set( LLVM_PACKAGE_VERSION ${LLVM_VERSION} )
set( PACKAGE_VERSION ${LLVM_PACKAGE_VERSION} )
Contributor

Use LLVM_VERSION_MAJOR? LLVM_PACKAGE_VERSION is not set in an in-tree build.

Contributor Author

Weird, LLVM_PACKAGE_VERSION should be set if LLVM_VERSION is set, from what I know. We need PACKAGE_VERSION set here so that finding the resource directory actually works properly. I suppose I can just set both.

Contributor Author

I'm a little confused about what in-tree means here. If it's in a context like we have here with a runtimes build, it should be set.

Contributor
@wenju-he wenju-he Jan 5, 2026

I'm a little confused about what in-tree means here. If it's in a context like we have here with a runtimes build, it should be set.

Sorry, you're right that LLVM_PACKAGE_VERSION is set in an in-tree build when libclc is in LLVM_ENABLE_RUNTIMES.

It is just that our downstream still uses LLVM_ENABLE_PROJECTS="...,libclc", and in this case LLVM_PACKAGE_VERSION is not set. The reasons for not switching to LLVM_ENABLE_RUNTIMES yet are:

  • prepare_builtins fails to build when libclc is in LLVM_ENABLE_RUNTIMES on Windows when the MSVC generator is used, see Update LLVM google/clspv#1521 (comment) and the analysis at Update LLVM google/clspv#1521 (comment). CMAKE_C_COMPILER set at
    set(compiler_args -DCMAKE_C_COMPILER=${LLVM_RUNTIME_OUTPUT_INTDIR}/clang-cl${CMAKE_EXECUTABLE_SUFFIX}
    doesn't work. This could be a blocking issue for switching libclc to add_library since CMAKE_C_COMPILER will be used for compiling .cl files.
  • An LLVM_ENABLE_RUNTIMES="libclc" build is much slower than an LLVM_ENABLE_PROJECTS="libclc" build in a debug configuration when compiling an execution-time support library which depends on libclc. I have a local workaround for this issue, and the long-term solution might be removing the dependency on libclc.

Contributor Author

Alright, so just setting both for now is the easiest?

Contributor

Alright, so just setting both for now is the easiest?

Yeah. Perhaps this code is better:

if (NOT PACKAGE_VERSION)
  set(PACKAGE_VERSION ${LLVM_VERSION_MAJOR})
endif()

@jhuber6
Contributor Author

jhuber6 commented Jan 4, 2026

@wenju-he @frasercrmck General question: how difficult would it be to port this project to use add_library instead? We would need some custom language support, but that's just CMake boilerplate. Getting the final linked library can easily be done with a custom command that calls llvm-link, since llvm-link on a static library will link everything inside it.
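A rough sketch of what that could look like (the target and output names here are hypothetical, not part of the patch):

```cmake
# Hypothetical: build the .cl sources into a static archive of bitcode
# objects via a custom OpenCL language, then merge the whole archive
# into a single module with llvm-link.
add_library( clc_builtins STATIC ${libclc_sources} )

add_custom_command(
  OUTPUT builtins.link.bc
  COMMAND llvm-link $<TARGET_FILE:clc_builtins> -o builtins.link.bc
  DEPENDS clc_builtins
  COMMENT "Linking libclc bitcode archive into a single module" )
add_custom_target( clc_builtins_bc ALL DEPENDS builtins.link.bc )
```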

@wenju-he
Contributor

wenju-he commented Jan 5, 2026

@wenju-he @frasercrmck General question: how difficult would it be to port this project to use add_library instead? We would need some custom language support, but that's just CMake boilerplate. Getting the final linked library can easily be done with a custom command that calls llvm-link, since llvm-link on a static library will link everything inside it.

Using add_library like other runtime libraries do should be the goal for libclc, see #156778 (review).
Other than the cmake change and llvm-link you mentioned, I see three additional tasks:

  1. Drop [libclc] Override generic symbol using llvm-link --override flag instead of using weak linkage #156778 and refactor to avoid duplicate symbols in two files. I can take this task.
  2. Remove .ll files which don't need clang compilation (the first 3 files will be removed by [libclc] Refine __clc_fp*_subnormals_supported #157633):
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/generic/subnormal_disable.ll
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/generic/subnormal_helper_func.ll
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/generic/subnormal_use_default.ll
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/image/get_image_attributes_impl.ll
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/image/read_image_impl.ll
    https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/image/write_image_impl.ll
  3. Fix that CMAKE_C_COMPILER set during the runtime build stage is ignored by the CMake MSVC generator.

@frasercrmck please advise if something is missing.

@jhuber6
Contributor Author

jhuber6 commented Jan 6, 2026

This is much simpler now that I landed some of the other PRs, including #174611

if( LLVM_RUNTIMES_BUILD AND LLVM_DEFAULT_TARGET_TRIPLE MATCHES "^nvptx|^amdgcn" )
set( LIBCLC_DEFAULT_TARGET ${LLVM_DEFAULT_TARGET_TRIPLE} )
endif()
set( LIBCLC_TARGETS_TO_BUILD ${LIBCLC_DEFAULT_TARGET}
Contributor

Would it be better to pass -DLIBCLC_TARGETS_TO_BUILD=${LLVM_DEFAULT_TARGET_TRIPLE} as a runtime configuration option so that there is no customization here?

Contributor Author

Yeah, I'm not completely sold on how to handle this. The fundamental difference here is that we are expecting to build a per-target toolchain. The libclc project completely breaks the normal CMake project model by building a ton of different architectures through custom commands.

We should only be building a single one; that's the expected way these cross-builds work. I think that's something that should be correct by construction. That said, functionally it's not a major distinction right now because libclc dodges every bit of normal CMake. Normally this would implicitly put --target=amdgcn-amd-amdhsa on all your compiles, and that's how you'd get the target code. Since we're doing it manually here, it's more to fit in with the expected usage. I.e., the following builds the runtime for your target:

-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc

I'm not sure if there's a more elegant solution to this. If I had my way we'd rewrite all of this to use a custom language and do it the normal way with the above mechanism. Until then I'm not sure. This was the easiest way I could think of to do it. I have in the past added cache files for required GPU configs, but since this is basically overriding what the runtimes build above is trying to do I'm not so sure.

TL;DR: -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc builds for a single target only, which is the expected behavior. libclc doesn't get this because we do everything custom.

Contributor

Thanks @jhuber6, the direction looks good to me.

Contributor

@wenju-he wenju-he left a comment

LGTM

@arsenm arsenm added the cmake Build system in general and CMake in particular label Jan 7, 2026
@jhuber6 jhuber6 merged commit 9315747 into llvm:main Jan 7, 2026
11 checks passed