[libclc] Initial support for cross-compiling OpenCL libraries #174022
Conversation
**arsenm** left a comment:
I don't think CMake provides out-of-the-box OpenCL support as a compile language. At one point we had the boilerplate to add it as a language here.
libclc/cmake/modules/AddLibclc.cmake (outdated):
```cmake
# FIXME: Is this utility even necessary? The `-mlink-builtin-bitcode`
# option used to link the library in discards the modified linkage.
```

---
This is probably a leftover from before we did that. Also, I thought there was at least one other hack in prepare-builtins
---
The two things it does, from what I can tell:

- Removes `"opencl.ocl.version"` metadata strings
- Replaces any non-external function with `linkonce_odr` linkage

The first one, I'm pretty sure `llvm-link` deduplicates now. For the second one, this is mostly just a cheap way to let the functions link while being eliminated by the compiler if unused. I think the ROCm Device Libs do that as well.
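For illustration only (a hand-written sketch; the function name is made up, this is not libclc code), the linkage rewrite described above amounts to turning an external definition into a `linkonce_odr` one:

```llvm
; Before the rewrite (sketch): an ordinary external definition that
; must survive linking:
;   define float @my_builtin(float %x) { ret float %x }
;
; After the rewrite: linkonce_odr lets the definition satisfy references
; at link time while being discarded by the optimizer if unused.
define linkonce_odr float @my_builtin(float %x) {
  ret float %x
}
```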
---
The version metadata is a workaround for having hundreds of entries. We had an AMDGPU pass to deduplicate them, but that was moved into the IR linker sometime in the last year
---
Unfortunately I don't know how people really use libclc. I'd really like to just remove it, but I can imagine someone complaining about this being gone, because `-mlink-builtin-bitcode` makes the linkage change unnecessary and the linker deduplication now makes the manual handling unnecessary.

---
> This is probably a leftover from before we did that.
Thanks for the information. I wasn't aware of it.
> Unfortunately I don't know how people really use libclc. I'd really like to just remove it but I can imagine someone complaining about this being gone, because `-mlink-builtin-bitcode` makes the linkage change unnecessary and the linker deduplication now makes the manual handling unnecessary

Our downstream targets also use `-mlink-builtin-bitcode` to link libclc bitcode files.
I think it is probably fine to drop the `prepare_builtins` utility, given that the linkage change is unnecessary. I have just tried dropping the linkage change in our downstream code and didn't find anything wrong in some basic testing.

There are a few additional changes that `prepare_builtins` does in our downstream code, e.g.:
https://github.com/intel/llvm/blob/4decbf0da29f7daba8a87361456a264a331e2b5d/libclc/utils/prepare-builtins.cpp#L85-L110
https://github.com/intel/llvm/blob/4decbf0da29f7daba8a87361456a264a331e2b5d/libclc/utils/prepare-builtins.cpp#L129-L146
I'll check if they can be removed in the downstream as well.
---
The code object version is better done via `-Xclang -mcode-object-version=none` in the compiler flags. The `wchar` issue is confounding; that should be a property of the target, so I'd imagine it suggests you're mixing incompatible targets.

Removing CPU target features like that is a massive hack, but it's not an unprecedented one, since we do similarly weird things in the ROCm Device Libs. Ideally we partition these libraries more intelligently, but does `always_inline` work?
---
> The code object version is better done via `-Xclang -mcode-object-version=none` by the compiler flags. The `wchar` issue is confounding, that should be a property of the target so I'd imagine it suggests you're mixing incompatible targets.
Thanks for the suggestion. Will try it.
> Removing CPU target features like that is a massive hack, but it's not an unprecedented one since we do similarly weird things in the ROCm Device Libs. Ideally we partition these libraries more intelligently, but does `always_inline` work?
`always_inline` works, but it won't be used in libclc compile flags, and it is problematic to inline bitcode files with incompatible target-features. The issue is probably that our downstream should build separate compatible bitcodes for the supported targets.
```cmake
set( amdgcn--_devices tahiti )
set( amdgcn-mesa-mesa3d_devices ${amdgcn--_devices} )
set( amdgcn--amdhsa_devices none )
set( amdgcn-amd-amdhsa_devices none )
```
---

Is there a difference between `amdgcn--amdhsa_devices` and `amdgcn-amd-amdhsa_devices`?
---

Nope, but all the other GPU compute targets use `amdgcn-amd-amdhsa` and I'd like these to go in the same directory eventually.
---

Perhaps we can drop the `amdgcn--amdhsa_devices` target from libclc, since the bitcode will be the same?
---
I was planning on trimming this up in a PR tomorrow. We definitely could though I don't know the subtleties here, since maybe someone depends on the specific triple.
I also really want to change how the files are installed; we should use the regular triple and then use the specific device as a subdirectory if possible. Right now we have something like `libclc/` in the resource dir, when it should be more like the standard `PER_TARGET_RUNTIME_DIR` layout, I'd say.
---

The `amd` vendor component only does anything with SPIR-V.
libclc/CMakeLists.txt (outdated):
```cmake
set( clspv64--_devices none )
set( nvptx--_devices none )
set( nvptx64--_devices none )
set( nvptx64-nvidia-cuda_devices none )
```
---

Should the libclc implementation for this triple be unified with `nvptx--nvidiacl`/`nvptx64--nvidiacl`? Otherwise `nvptx64-nvidia-cuda_devices` won't pick up the code under `ptx-nvidiacl`.

The NVIDIA implementations are in the `ptx-nvidiacl` folder (https://github.com/llvm/llvm-project/tree/main/libclc/clc/lib/ptx-nvidiacl and https://github.com/llvm/llvm-project/tree/main/libclc/opencl/lib/ptx-nvidiacl), and libclc uses the triple components to select the folder (see `libclc/CMakeLists.txt`, line 320 at 7ada892).
---
Yeah, probably. I don't really know anything about OpenCL on NVIDIA but I'm just trying to make things work with the normal compute triples. What would that require?
---

I don't know the history of using nvidiacl in libclc. If `nvptx64-nvidia-cuda` is the right choice, we can probably rename the `ptx-nvidiacl` folders so that `nvptx64-nvidia-cuda` picks up the files in those folders, and drop the `nvptx64--nvidiacl` target.
```cmake
set( LIBCLC_STANDALONE_BUILD FALSE )

set( LLVM_PACKAGE_VERSION ${LLVM_VERSION} )
set( PACKAGE_VERSION ${LLVM_PACKAGE_VERSION} )
```
---

Use `LLVM_VERSION_MAJOR`? `LLVM_PACKAGE_VERSION` is not set in an in-tree build.
---

Weird, `LLVM_PACKAGE_VERSION` should be set if `LLVM_VERSION` is set, from what I know. We need `PACKAGE_VERSION` set here so finding the resource directory actually works properly. I suppose I can just set both.
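A minimal sketch of the "just set both" idea (hypothetical, not the PR's actual code; it assumes `LLVM_VERSION` is the variable reliably available in both build modes, per the discussion above):

```cmake
# Make sure both spellings exist so resource-directory detection works
# whether or not the outer build already defined LLVM_PACKAGE_VERSION.
if( NOT DEFINED LLVM_PACKAGE_VERSION AND DEFINED LLVM_VERSION )
  set( LLVM_PACKAGE_VERSION ${LLVM_VERSION} )
endif()
set( PACKAGE_VERSION ${LLVM_PACKAGE_VERSION} )
```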
---
I'm a little confused what in-tree means here. If it's in the context like we're doing here with a runtimes build it should be set.
---

> I'm a little confused what in-tree means here. If it's in the context like we're doing here with a runtimes build it should be set.
Sorry, you're right that `LLVM_PACKAGE_VERSION` is set in an in-tree build when libclc is in `LLVM_ENABLE_RUNTIMES`.

It is just that our downstream still uses `LLVM_ENABLE_PROJECTS="...,libclc"`, and in this case `LLVM_PACKAGE_VERSION` is not set. The reasons for not switching to `LLVM_ENABLE_RUNTIMES` yet are:
- `prepare_builtins` fails to build when libclc is in `LLVM_ENABLE_RUNTIMES` on Windows when the MSVC generator is used; see Update LLVM google/clspv#1521 (comment) and the analysis at Update LLVM google/clspv#1521 (comment). The `CMAKE_C_COMPILER` set there doesn't work. This could be a blocking issue for switching libclc to `add_library`, since `CMAKE_C_COMPILER` will be used for compiling `.cl` files.
- An `LLVM_ENABLE_RUNTIMES="libclc"` build is much slower than an `LLVM_ENABLE_PROJECTS="libclc"` build in a debug build when compiling an execution-time support library which depends on libclc. I have a local workaround for this issue, and the long-term solution might be removing the dependency on libclc.
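For context, the two configuration styles being compared look roughly like this (illustrative flag fragments only; the full configure command lines are omitted):

```
# In-tree project build (the downstream's current setup):
-DLLVM_ENABLE_PROJECTS="clang;libclc"

# Runtimes build (what this PR targets):
-DLLVM_ENABLE_RUNTIMES="libclc"
```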
---
Alright, so just setting both for now is the easiest?
---

> Alright, so just setting both for now is the easiest?

Yeah. Perhaps this code: `flang/CMakeLists.txt`, lines 335 to 337 at 80b62cb.
---

@wenju-he @frasercrmck General question, how difficult would it be to port this project to use

This is much simpler now that I landed some of the other PRs; includes #174611.

---
```cmake
if( LLVM_RUNTIMES_BUILD AND LLVM_DEFAULT_TARGET_TRIPLE MATCHES "^nvptx|^amdgcn" )
  set( LIBCLC_DEFAULT_TARGET ${LLVM_DEFAULT_TARGET_TRIPLE} )
endif()
set( LIBCLC_TARGETS_TO_BUILD ${LIBCLC_DEFAULT_TARGET}
```
---

Would it be better to pass `-DLIBCLC_TARGETS_TO_BUILD=${LLVM_DEFAULT_TARGET_TRIPLE}` as a runtime configuration option, so that there is no customization here?
---

Yeah, I'm not completely sold on how to handle this. The fundamental difference here is that we are expecting to build a per-target toolchain. The libclc project completely breaks normal CMake projects by building a ton of different architectures through custom commands.

We should only be building a single one; that's the expected way these cross-builds work. I think that's something that should be correct by construction. That being said, functionally it's not a major distinction right now because libclc dodges every bit of normal CMake. Normally this would implicitly put `--target=amdgcn-amd-amdhsa` on all your compiles, and that's how you'd get the target code. Since we're doing it manually here it's more to fit in with the expected usage. I.e. the following builds the runtime for your target:

`-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc`

I'm not sure if there's a more elegant solution to this. If I had my way we'd rewrite all of this to use a custom language and do it the normal way with the above mechanism. Until then I'm not sure; this was the easiest way I could think of to do it. I have in the past added cache files for required GPU configs, but since this is basically overriding what the runtimes build above is trying to do I'm not so sure.

TL;DR: `-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc` builds for a single target only and is the expected behavior. libclc doesn't get this because we do everything custom.
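Putting the pieces together, a hedged sketch of the cache flags for such a runtimes cross-build. Only the `RUNTIMES_..._LLVM_ENABLE_RUNTIMES` flag is quoted from the discussion above; the other variable names are the standard LLVM runtimes-build options, and the exact set needed may differ:

```
-DLLVM_ENABLE_PROJECTS="clang;lld"
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc
```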
Summary:
The other GPU enabled libraries, (openmp, flang-rt, compiler-rt, libc,
libcxx, libcxx-abi) all support builds through a runtime cross-build. In
these builds we use a separate CMake build that cross-compiles to a
single target.
This patch provides basic support for this with the libclc libraries.

Changes include adding support for the more standard GPU compute triples (`amdgcn-amd-amdhsa`, `nvptx64-nvidia-cuda`) and building only one target in this mode.
Some things left to do:
- This patch does not change the compiler invocations; this method would allow us to use standard CMake routines, but this keeps it minimal.
- The prebuild support is questionable and doesn't fit into this scheme because it's a host executable; I'm ignoring it for now.
- The installed location should just use the triple with no `libclc/` subdirectory handling, I believe.