[SPIRV] Do not use OpTypeRuntimeArray in Kernel env.#149522
[SPIRV] Do not use OpTypeRuntimeArray in Kernel env.#149522
Conversation
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-spir-v Author: Marcos Maronas (maarquitos14) ChangesPrior to this patch, when Additionally, the newly added test prior to this patch was generating a module with both Finally, prior to this patch, the newly added test was adding Full diff: https://github.com/llvm/llvm-project/pull/149522.diff 3 Files Affected:
diff --git a/llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp b/llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
index 83fccdc2bdba3..982d48f2a5a76 100644
--- a/llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
@@ -828,9 +828,11 @@ SPIRVType *SPIRVGlobalRegistry::getOpTypeArray(uint32_t NumElems,
"Invalid array element type");
SPIRVType *SpvTypeInt32 = getOrCreateSPIRVIntegerType(32, MIRBuilder);
SPIRVType *ArrayType = nullptr;
- if (NumElems != 0) {
- Register NumElementsVReg =
- buildConstantInt(NumElems, MIRBuilder, SpvTypeInt32, EmitIR);
+ const SPIRVSubtarget &ST =
+ cast<SPIRVSubtarget>(MIRBuilder.getMF().getSubtarget());
+ if (NumElems != 0 || !ST.isShader()) {
+ Register NumElementsVReg = buildConstantInt(
+ NumElems ? NumElems : 1, MIRBuilder, SpvTypeInt32, EmitIR);
ArrayType = createOpType(MIRBuilder, [&](MachineIRBuilder &MIRBuilder) {
return MIRBuilder.buildInstr(SPIRV::OpTypeArray)
.addDef(createTypeVReg(MIRBuilder))
diff --git a/llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp b/llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
index ad976e5288927..07628c6885b81 100644
--- a/llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
@@ -744,8 +744,14 @@ void SPIRV::RequirementHandler::checkSatisfiable(
IsSatisfiable = false;
}
+ AvoidCapabilitiesSet AvoidCaps;
+ if (!ST.isShader())
+ AvoidCaps.S.insert(SPIRV::Capability::Shader);
+ else
+ AvoidCaps.S.insert(SPIRV::Capability::Kernel);
+
for (auto Cap : MinimalCaps) {
- if (AvailableCaps.contains(Cap))
+ if (AvailableCaps.contains(Cap) && !AvoidCaps.S.contains(Cap))
continue;
LLVM_DEBUG(dbgs() << "Capability not supported: "
<< getSymbolicOperandMnemonic(
diff --git a/llvm/test/CodeGen/SPIRV/array_type.ll b/llvm/test/CodeGen/SPIRV/array_type.ll
new file mode 100644
index 0000000000000..0957a7d191922
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/array_type.ll
@@ -0,0 +1,22 @@
+; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s --check-prefix=CHECK-KERNEL
+; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+
+; CHECK-KERNEL: OpCapability Kernel
+; CHECK-KERNEL-NOT: OpCapability Shader
+; CHECK-KERNEL: OpTypeArray
+; CHECK-KERNEL-NOT: OpTypeRuntimeArray
+
+%"class.sycl::_V1::detail::half_impl::half" = type { half }
+
+; Function Attrs: mustprogress norecurse nounwind
+define spir_kernel void @foo(ptr addrspace(3) noundef align 2 %_arg_temp, ptr addrspace(1) noundef align 2 %_arg_acc_a){
+entry:
+ %0 = getelementptr %"class.sycl::_V1::detail::half_impl::half", ptr addrspace(1) %_arg_acc_a, i64 15
+ %add.ptr.i = getelementptr %"class.sycl::_V1::detail::half_impl::half", ptr addrspace(1) %0, i64 10
+ %4 = getelementptr %"class.sycl::_V1::detail::half_impl::half", ptr addrspace(1) %add.ptr.i, i64 20
+ %arrayidx.i5.i = getelementptr %"class.sycl::_V1::detail::half_impl::half", ptr addrspace(1) %4, i64 35
+ %arrayidx7.i = getelementptr inbounds [0 x [32 x %"class.sycl::_V1::detail::half_impl::half"]], ptr addrspace(3) %_arg_temp, i64 1, i64 25, i64 30
+ %5 = load i16, ptr addrspace(1) %arrayidx.i5.i, align 2
+ store i16 %5, ptr addrspace(3) %arrayidx7.i, align 2
+ ret void
+}
|
Keenuts
left a comment
There was a problem hiding this comment.
Thanks for the PR, some comments
| AvoidCapabilitiesSet AvoidCaps; | ||
| if (!ST.isShader()) | ||
| AvoidCaps.S.insert(SPIRV::Capability::Shader); | ||
| else | ||
| AvoidCaps.S.insert(SPIRV::Capability::Kernel); | ||
|
|
||
| for (auto Cap : MinimalCaps) { | ||
| if (AvailableCaps.contains(Cap)) | ||
| if (AvailableCaps.contains(Cap) && !AvoidCaps.S.contains(Cap)) |
There was a problem hiding this comment.
This doesn't seem useful. I recompiled your patch with only the OpRuntimeArray change the the capability is not emitted.
(Since OpTypeRuntimeArray is not added, the capability is not required).
There was a problem hiding this comment.
I don't mind this change. It will help us identify errors of this type earlier. I'm guessing that if you do this change, and not the OpRuntimeArray change, the test case would give an error that the shader capability is not supported. This will be generally useful. We run into error like this with shader that accidentally adding the kernel capability.
There was a problem hiding this comment.
It's not useful because after the patch, there is no conflict anymore, but having it would have helped us caught the error much more easily. Aside from that, it's just honoring --avoid-capabilities flag: my command was explicitly passing --avoid-capabilities=shader, but it was generating shader anyway, which was not complying with the command line flag. With this change, we make sure the flag is acting as expected.
There was a problem hiding this comment.
I'm guessing that if you do this change, and not the OpRuntimeArray change, the test case would give an error that the shader capability is not supported.
We should already have this error surfacing: after all we have a list of available capabilities by environment.
But seems like when !ST.isShader() we add OpenCL capabilities, but some are implicitely adding Shader (StorageImageReadWithoutFormat).
Removing those from the OpenCL list does surface the existing "Unable to meet requirement" error)
| if (NumElems != 0 || !ST.isShader()) { | ||
| Register NumElementsVReg = buildConstantInt( | ||
| NumElems ? NumElems : 1, MIRBuilder, SpvTypeInt32, EmitIR); |
There was a problem hiding this comment.
Is this the correct solution?
I understand OpRuntimeArray is not supported for Kernel, but seems weird to silently allow this and generate a 1-element array instead.
Are runtime arrays allowed in OpenCL?
If not, should the compiler refuse to lower a runtime array to Kernel SPIR-V?
There was a problem hiding this comment.
Is this the correct solution?
+1
I had an offline chat with Marcos. He'll look into changing the gep pattern that is causing issue (the SYCL compiler don't need to emit this type outside some offset computation)
There was a problem hiding this comment.
Is this the correct solution?
That's interesting Q. I lean to say: yes. This replacement will legalize few GEPs that would be UB otherwise, but since they were already UB - such replacement should be fine.
Are runtime arrays allowed in OpenCL?
If not, should the compiler refuse to lower a runtime array to Kernel SPIR-V?
OpenCL allows unbounded arrays as kernel parameters and I believe GEP to unbound array kernel argument happens in the test provided by @maarquitos14 (though I'm having troubles to get a similar IR in my experiments).
There was a problem hiding this comment.
I don't know OpenCL, but is it possible to have a zero sized array in a struct? Will making it an array of size 1 change the layout of the struct? Would it be better for the FE to change the type to an array of size 1 to avoid potential issue like that?
There was a problem hiding this comment.
Here is an example of something I would be worried about: https://godbolt.org/z/zaTT7nTvr. Note that if you optimize the code the layout is different than the unoptimized code.
There was a problem hiding this comment.
BTW, @Naghasan do you happen to know, why OpTypeRuntimeArray is not allowed for Kernel? It's not looking like VLA to me.
There was a problem hiding this comment.
The issue is not the access to i, but the offset of b.
The issue is that when the code is not optimized, accesses to b are done through the struct whose first element i is an array of size 1. So b is considered to be at offset 4 in the struct. Is that right? See https://godbolt.org/z/37vbjGP3r. I removed the UB.
However, if the code is optimized, the optimizer modifies the GEP to access b starting at offset 0 in the struct. From the optimizer's perspective this is correct because i is an array of size 0. See https://godbolt.org/z/fj1hfdeqo.
There was a problem hiding this comment.
But examples like https://godbolt.org/z/zaTT7nTvr are already UB
This isn't UB at all, it is outside the core specs (C++ says 0 is not valid) but it is a valid compiler extension (add -pedantic and you get a warning: zero size arrays are an extension [-Wzero-length-array]). And 0 as first offset makes it perfectly fine as well (you don't move the pointer). And in other languages, there is valid reasons to have this construct.
BTW, @Naghasan do you happen to know, why OpTypeRuntimeArray is not allowed for Kernel? It's not looking like VLA to me.
This isn't linked to VLA, but a Vulkan runtime construct that isn't mappable in OpenCL
There was a problem hiding this comment.
The issue was originally found in a test similar to this --which I reduced for the sake of readability. Indeed the problem comes from an unbound array.
#include <sycl/builtins.hpp>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/experimental/work_group_memory.hpp>
#include <sycl/group_barrier.hpp>
#include <sycl/half_type.hpp>
// #include <cstring>
namespace syclexp = sycl::ext::oneapi::experimental;
template <typename T, size_t N>
void swap_array_2d(T (&a)[N][N], T (&b)[N][N], size_t batch_size) {
sycl::queue q;
const auto size = sycl::range{N, N};
const auto wgsize = sycl::range{batch_size, batch_size};
{
sycl::buffer<T, 2> buf_a{a[0], sycl::range{N, N}};
sycl::buffer<T, 2> buf_b{b[0], sycl::range{N, N}};
q.submit([&](sycl::handler &cgh) {
sycl::accessor acc_a{buf_a, cgh};
sycl::accessor acc_b{buf_b, cgh};
syclexp::work_group_memory<T[][N]> temp{N, cgh};
sycl::nd_range<2> ndr{size, wgsize};
cgh.parallel_for(ndr, [=](sycl::nd_item<2> it) {
const auto i = it.get_global_id()[0];
const auto j = it.get_global_id()[1];
temp[i][j] = acc_a[i][j];
acc_a[i][j] = acc_b[i][j];
syclexp::work_group_memory<T[][N]> temp2{temp};
acc_b[i][j] = temp2[i][j];
});
});
}
}
constexpr size_t N = 32;
template <typename T> void test() {
T intarr1[N][N];
T intarr2[N][N];
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
intarr1[i][j] = T(i) + T(j);
intarr2[i][j] = T(i) * T(j);
}
}
swap_array_2d(intarr1, intarr2, 8);
}
int main() {
sycl::queue q;
if (q.get_device().has(sycl::aspect::fp16))
test<sycl::half>();
return 0;
}
There was a problem hiding this comment.
After multiple offline discussions with @Naghasan, we have added a new rule in simplifyGEPInst to handle this case. SPIRVEmitIntrinsics calls simplifyGEPInst when handling GEP instructions to make sure this pattern is also simplified even if the frontend runs without optimizations.
| %"class.sycl::_V1::detail::half_impl::half" = type { half } | ||
|
|
||
| ; Function Attrs: mustprogress norecurse nounwind | ||
| define spir_kernel void @foo(ptr addrspace(3) noundef align 2 %_arg_temp, ptr addrspace(1) noundef align 2 %_arg_acc_a){ |
There was a problem hiding this comment.
Can we simplify the test, for example just by checking how alloca [0 x i32] is being lowered?
There was a problem hiding this comment.
There are now more test cases, but each of them is simpler.
| if (NumElems != 0 || !ST.isShader()) { | ||
| Register NumElementsVReg = buildConstantInt( | ||
| NumElems ? NumElems : 1, MIRBuilder, SpvTypeInt32, EmitIR); |
There was a problem hiding this comment.
Is this the correct solution?
That's interesting Q. I lean to say: yes. This replacement will legalize few GEPs that would be UB otherwise, but since they were already UB - such replacement should be fine.
Are runtime arrays allowed in OpenCL?
If not, should the compiler refuse to lower a runtime array to Kernel SPIR-V?
OpenCL allows unbounded arrays as kernel parameters and I believe GEP to unbound array kernel argument happens in the test provided by @maarquitos14 (though I'm having troubles to get a similar IR in my experiments).
s-perron
left a comment
There was a problem hiding this comment.
I'm okay with this change. The only potential problem I see is that changing the size of the type could have other unintended consequences. I don't know if it might be more robust to change the type in the FE. This will make sure that any optimizations that rely on knowing the layout and size of a variable will be correct.
| @@ -0,0 +1,22 @@ | |||
| ; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s --check-prefix=CHECK-KERNEL | |||
| ; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %} | |||
There was a problem hiding this comment.
For the shader test cases, I'm trying to get people to add the target environment to the call to spirv-val. That way, you can get more specific validation. Is that worth doing for opencl tests as well?
| ; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %} | |
| ; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val --target-env opencl2.0 %} |
| if (NumElems != 0 || !ST.isShader()) { | ||
| Register NumElementsVReg = buildConstantInt( | ||
| NumElems ? NumElems : 1, MIRBuilder, SpvTypeInt32, EmitIR); |
There was a problem hiding this comment.
I don't know OpenCL, but is it possible to have a zero sized array in a struct? Will making it an array of size 1 change the layout of the struct? Would it be better for the FE to change the type to an array of size 1 to avoid potential issue like that?
|
I moved the GEP simplification to |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
s-perron
left a comment
There was a problem hiding this comment.
LGTM, I will be off for a while, so don't wait for me to merge. My only issue are the mixing of Vulkan and other addressing models. In general, we should not be using Addresses when targeting Vulkan. Don't add tests that expect that to work. Note that is different than creating a general shader, which can use Addresses.
...xtensions/SPV_KHR_bit_instructions/cl_khr_extended_bit_ops_spv-friendly_only_no_extension.ll
Show resolved
Hide resolved
| @@ -1,4 +1,4 @@ | |||
| ; RUN: llc -O0 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s --check-prefix=SPV | |||
| ; RUN: llc -O0 -mtriple=spirv32-unknown-vulkan %s -o - | FileCheck %s --check-prefix=SPV | |||
There was a problem hiding this comment.
Mixing Physical addressing and shaders is fine as long as we are not targeting Vulkan.
Also Simple is deprecated. We might want to start phasing that out, but that is for another PR.
| ; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-vulkan %s -o - | FileCheck %s | ||
| ; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-vulkan %s -o - -filetype=obj | spirv-val %} |
There was a problem hiding this comment.
If you are targeting vulkan, add -target-env vulkan1.3 (or some other version) to make sure we generate code that is valid for Vulkan.
| call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %var) | ||
| call void @llvm.memcpy.p0.p0.i64(ptr align 8 %var, ptr align 8 %_arg, i64 1, i1 false) | ||
| %KernelFunc = getelementptr inbounds i8, ptr %var, i64 0 | ||
| %KernelFunc = getelementptr inbounds i8, ptr %var, i64 1 |
There was a problem hiding this comment.
I had to change index from 0 to 1, because otherwise simplifyGEPInst was optimizing it out. Let me know if that works.
Keenuts
left a comment
There was a problem hiding this comment.
2 run lines to change, otherwise OK to move this forward
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/21027 Here is the relevant piece of the build log for the reference |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/11332 Here is the relevant piece of the build log for the reference |
Prior to this patch, when
NumElemswas 0,OpTypeRuntimeArraywas directly generated, but it requiresShadercapability, so it can only be generated ifShaderenv is being used. We have observed a pattern of using unbound arrays that translate into[0 x ...]types in OpenCL, which impliesKernelcapability, soOpTypeRuntimeArrayshould not be used. To prevent this scenario, this patch simplifies GEP instructions where type is a 0-length array and the first index is also 0. In such scenario, we effectively drop the 0-length array and the first index.Additionally, the newly added test prior to this patch was generating a module with both
ShaderandKernelcapabilities at the same time, but they're incompatible. This patch also fixes that.Finally, prior to this patch, the newly added test was adding
Shadercapability to the module even with the command line flag--avoid-spirv-capabilities=Shader. This patch also has a fix for that.