Change in llvmcall module merging breaks GPU codegen relying on globals #48093
Labels
compiler:codegen - Generation of LLVM IR and native code
gpu - Affects running Julia on a GPU
regression - Regression in behavior compared to a previous version
Julia's GPU back-ends need to be able to create variables with threadgroup- and thread-local semantics. Awaiting something like #47569, we currently do so by emitting LLVM IR that defines a global variable, and accessing that memory as an array using `unsafe_wrap`.

Not particularly clean, but this has been working fine for us. Even if we have multiple calls to `shmem()`, we just get multiple instances of the IR, which get duplicated correctly upon module merging.

This, however, changed on 1.9. Bisected to #44440 (cc @pchintalapudi): we now only get a single shmem array, which obviously breaks a lot of things.
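For illustration, here is a minimal sketch of the pattern, not the exact reproducer. It assumes Julia 1.8/1.9 on a 64-bit CPU target, where `Ptr{Int}` is passed as `i64` across `llvmcall` and typed-pointer IR syntax is accepted. The global name `@shmem`, the `[1 x i64]` size, and the caller `two_arrays` are placeholders; a real GPU back-end would additionally place the global in a device-specific address space.

```julia
# Sketch only. Assumptions: Julia 1.8/1.9, 64-bit CPU, typed-pointer IR syntax.
# The module-form llvmcall takes (IR module text, entry function name).
@inline shmem() = Base.llvmcall(("""
    @shmem = internal global [1 x i64] zeroinitializer, align 8

    define i64 @entry() {
        %ptr = ptrtoint [1 x i64]* @shmem to i64
        ret i64 %ptr
    }
    """, "entry"), Ptr{Int}, Tuple{})

# Hypothetical caller: each call to shmem() is meant to get its own global.
function two_arrays()
    a = unsafe_wrap(Array, shmem(), 1)  # wrap the first global as a 1-element array
    b = unsafe_wrap(Array, shmem(), 1)  # wrap what should be a second, distinct global
    a[1] = 1
    b[1] = 2
    return a[1], b[1]
end
```

The intent is that `a` and `b` are backed by distinct globals, so the two stores do not alias. If the two definitions of `@shmem` are collapsed into one during module merging, both arrays point at the same memory.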
Even if I inline the `llvmcall` into the function, I still only get a single shmem array.

Putting this on the milestone because this breaks our GPU back-ends. Happy to adapt those back-ends if another approach is better, although I don't want to go back to the old days where we made `shmem()` a macro so that we could unique the gvar name (which is just a bad UI).