Conversation
… like functions to whitelist
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
Needs more elaboration on the reason and new test that shows the difference |
|
@arsenm previous comment updated with more explanation, and new LIT test (llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll) was already added. LMK what you think. Thanks. |
| /// Returns true if F looks like a leaf-style trap intrinsic that cannot | ||
| /// possibly callback into user code. |
There was a problem hiding this comment.
Traps can call back into user code. The default legalize action is to call the abort function, which may be defined in the module. The AMDGPU legalization just happens to not use this.
\With the trap-func-name, there can be a trap function which could be defined in the current module. If you set that, you'll get the call to that function still
| case Intrinsic::debugtrap: | ||
| // llvm.debugtrap currently lacks the attributes required for | ||
| // isTrapLikeLeafIntrinsic, so kept explicitly whitelisted. | ||
| return true; |
There was a problem hiding this comment.
This shall not be special cased. The attributes for debug trap should probably be fixed separately
| if (isTrapLikeLeafIntrinsic(*CalledFunc)) | ||
| return true; |
There was a problem hiding this comment.
This should not be special cased
| ; nocallback attribute, so the AMDGPU attributor used to conservatively drop | ||
| ; all implicitly-known inputs and AGPR allocation information. Make sure we | ||
| ; still infer that no implicit inputs are required and that the AGPR allocation | ||
| ; stays at zero. |
There was a problem hiding this comment.
Something else has to be going on here, because the report was that a test was "broken". Treating traps overly more conservatively than necessary should not be able to break anything
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
| @@ -35,7 +35,11 @@ define void @use_assume(i1 %arg) { | |||
|
|
|||
| define void @use_trap() { | |||
| ; CHECK-LABEL: define void @use_trap( | |||
| <<<<<<< HEAD | |||
| if (isTrapLikeLeafIntrinsic(*CalledFunc)) { | ||
| if (CB.hasFnAttr("trap-func-name")) | ||
| return CB.hasFnAttr(Attribute::NoCallback); | ||
| return true; |
There was a problem hiding this comment.
Can you make this a straight reapply and do this new trap stuff separately
There was a problem hiding this comment.
The reapply already landed (#176081). This just adds handling for the trap intrinsics. Changed the name and description to better reflect that.
|
@llvm/pr-subscribers-backend-amdgpu Author: Akash Dutta (akadutta) ChangesThis adds support to whitelist trap intrinsics while handling of intrinsics with !nocallback. This fixes the reasons behind the previous revert of #131759. The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to automatically recognize leaf-like trap intrinsics via their noreturn/inaccessiblemem attributes, so they no longer poison the analysis. Patch is 21.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175230.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 0b2ee6371da06..a257f9db9dfd6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -141,6 +141,26 @@ static bool hasSanitizerAttributes(const Function &F) {
F.hasFnAttribute(Attribute::SanitizeMemTag);
}
+/// Returns true if F looks like a leaf-style trap intrinsic that cannot
+/// possibly callback into user code.
+static bool isTrapLikeLeafIntrinsic(const Function &F) {
+ if (!F.isIntrinsic())
+ return false;
+
+ // If we already know this intrinsic is nocallback, there is nothing left to
+ // classify.
+ if (F.hasFnAttribute(Attribute::NoCallback))
+ return true;
+
+ if (!F.hasFnAttribute(Attribute::NoReturn))
+ return false;
+
+ if (F.doesNotAccessMemory())
+ return true;
+
+ return F.onlyAccessesInaccessibleMemory();
+}
+
namespace {
class AMDGPUInformationCache : public InformationCache {
public:
@@ -496,6 +516,17 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
// The current assumed state used to determine a change.
auto OrigAssumed = getAssumed();
+ SmallPtrSet<const Function *, 4> TrapHandlers;
+ for (Instruction &I : instructions(F)) {
+ auto *CB = dyn_cast<CallBase>(&I);
+ if (!CB || !CB->hasFnAttr("trap-func-name"))
+ continue;
+
+ if (const Function *CalledFunc = CB->getCalledFunction())
+ if (isTrapLikeLeafIntrinsic(*CalledFunc))
+ TrapHandlers.insert(CalledFunc);
+ }
+
// Check for Intrinsics and propagate attributes.
const AACallEdges *AAEdges = A.getAAFor<AACallEdges>(
*this, this->getIRPosition(), DepClassTy::REQUIRED);
@@ -527,6 +558,12 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
intrinsicToAttrMask(IID, NonKernelOnly, NeedsImplicit,
HasApertureRegs, SupportsGetDoorbellID, COV);
+ if (TrapHandlers.contains(Callee)) {
+ if (!Callee->hasFnAttribute(Attribute::NoCallback))
+ return indicatePessimisticFixpoint();
+ continue;
+ }
+
if (AttrMask == UNKNOWN_INTRINSIC) {
// Assume not-nocallback intrinsics may invoke a function which accesses
// implicit arguments.
@@ -536,7 +573,8 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
// of whether it's internal to the module or not.
//
// TODO: Ignoring callsite attributes.
- if (!Callee->hasFnAttribute(Attribute::NoCallback))
+ if (!isTrapLikeLeafIntrinsic(*Callee) &&
+ !Callee->hasFnAttribute(Attribute::NoCallback))
return indicatePessimisticFixpoint();
continue;
}
@@ -1344,6 +1382,14 @@ struct AAAMDGPUMinAGPRAlloc
return true;
}
+ if (const Function *CalledFunc = CB.getCalledFunction()) {
+ if (isTrapLikeLeafIntrinsic(*CalledFunc)) {
+ if (CB.hasFnAttr("trap-func-name"))
+ return CB.hasFnAttr(Attribute::NoCallback);
+ return true;
+ }
+ }
+
switch (CB.getIntrinsicID()) {
case Intrinsic::not_intrinsic:
break;
@@ -1364,7 +1410,6 @@ struct AAAMDGPUMinAGPRAlloc
default:
// Some intrinsics may use AGPRs, but if we have a choice, we are not
// required to use AGPRs.
-
// Assume !nocallback intrinsics may call a function which requires
// AGPRs.
return CB.hasFnAttr(Attribute::NoCallback);
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
index 4db668e05cb21..3684f0df77780 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
@@ -181,7 +181,7 @@ define amdgpu_kernel void @kernel_calls_extern() {
define amdgpu_kernel void @kernel_calls_extern_marked_callsite() {
; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_extern_marked_callsite(
; CHECK-SAME: ) #[[ATTR3]] {
-; CHECK-NEXT: call void @unknown() #[[ATTR29:[0-9]+]]
+; CHECK-NEXT: call void @unknown() #[[ATTR27:[0-9]+]]
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
;
@@ -205,7 +205,7 @@ define amdgpu_kernel void @kernel_calls_indirect(ptr %indirect) {
define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(ptr %indirect) {
; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(
; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT: call void [[INDIRECT]]() #[[ATTR29]]
+; CHECK-NEXT: call void [[INDIRECT]]() #[[ATTR27]]
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
;
@@ -724,7 +724,7 @@ define amdgpu_kernel void @kernel_uses_write_register_v55() {
define amdgpu_kernel void @kernel_uses_write_register_a55_57() {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_write_register_a55_57(
-; CHECK-SAME: ) #[[ATTR18:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR0]] {
; CHECK-NEXT: call void @llvm.write_register.i96(metadata [[META2:![0-9]+]], i96 0)
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -736,7 +736,7 @@ define amdgpu_kernel void @kernel_uses_write_register_a55_57() {
define amdgpu_kernel void @kernel_uses_read_register_a55(ptr addrspace(1) %ptr) {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_register_a55(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR19:[0-9]+]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[REG:%.*]] = call i32 @llvm.read_register.i32(metadata [[META0]])
; CHECK-NEXT: store i32 [[REG]], ptr addrspace(1) [[PTR]], align 4
; CHECK-NEXT: call void @use_most()
@@ -750,7 +750,7 @@ define amdgpu_kernel void @kernel_uses_read_register_a55(ptr addrspace(1) %ptr)
define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(ptr addrspace(1) %ptr) {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR19]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR18:[0-9]+]] {
; CHECK-NEXT: [[REG:%.*]] = call i32 @llvm.read_volatile_register.i32(metadata [[META0]])
; CHECK-NEXT: store i32 [[REG]], ptr addrspace(1) [[PTR]], align 4
; CHECK-NEXT: call void @use_most()
@@ -764,7 +764,7 @@ define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(ptr addrspace(
define amdgpu_kernel void @kernel_uses_read_register_a56_59(ptr addrspace(1) %ptr) {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_register_a56_59(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR20:[0-9]+]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[REG:%.*]] = call i128 @llvm.read_register.i128(metadata [[META3:![0-9]+]])
; CHECK-NEXT: store i128 [[REG]], ptr addrspace(1) [[PTR]], align 8
; CHECK-NEXT: call void @use_most()
@@ -778,7 +778,7 @@ define amdgpu_kernel void @kernel_uses_read_register_a56_59(ptr addrspace(1) %pt
define amdgpu_kernel void @kernel_uses_write_register_out_of_bounds_a256() {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_write_register_out_of_bounds_a256(
-; CHECK-SAME: ) #[[ATTR9]] {
+; CHECK-SAME: ) #[[ATTR0]] {
; CHECK-NEXT: call void @llvm.write_register.i32(metadata [[META4:![0-9]+]], i32 0)
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -897,7 +897,7 @@ define void @kernel_max_callgraph(i1 %cond) {
define amdgpu_kernel void @kernel_uses_all_virtregs() #1 {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_all_virtregs(
-; CHECK-SAME: ) #[[ATTR21:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR19:[0-9]+]] {
; CHECK-NEXT: call void asm sideeffect "
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -909,7 +909,7 @@ define amdgpu_kernel void @kernel_uses_all_virtregs() #1 {
define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1() #1 {
; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1(
-; CHECK-SAME: ) #[[ATTR21]] {
+; CHECK-SAME: ) #[[ATTR19]] {
; CHECK-NEXT: call void asm sideeffect "
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -921,7 +921,7 @@ define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1() #1 {
define void @recursive() {
; CHECK-LABEL: define void @recursive(
-; CHECK-SAME: ) #[[ATTR22:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR20:[0-9]+]] {
; CHECK-NEXT: call void asm sideeffect "
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: call void @recursive()
@@ -935,7 +935,7 @@ define void @recursive() {
define void @indirect_0() {
; CHECK-LABEL: define void @indirect_0(
-; CHECK-SAME: ) #[[ATTR22]] {
+; CHECK-SAME: ) #[[ATTR20]] {
; CHECK-NEXT: call void asm sideeffect "
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -947,7 +947,7 @@ define void @indirect_0() {
define void @indirect_1() {
; CHECK-LABEL: define void @indirect_1(
-; CHECK-SAME: ) #[[ATTR23:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR21:[0-9]+]] {
; CHECK-NEXT: [[TMP1:%.*]] = call <3 x i32> asm sideeffect "
; CHECK-NEXT: call void @use_most()
; CHECK-NEXT: ret void
@@ -959,7 +959,7 @@ define void @indirect_1() {
define amdgpu_kernel void @knowable_indirect_call(i1 %cond) {
; CHECK-LABEL: define amdgpu_kernel void @knowable_indirect_call(
-; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR22]] {
+; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR20]] {
; CHECK-NEXT: [[FPTR:%.*]] = select i1 [[COND]], ptr @indirect_0, ptr @indirect_1
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq ptr [[FPTR]], @indirect_1
; CHECK-NEXT: br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP3:%.*]]
@@ -1044,19 +1044,17 @@ attributes #1 = { "amdgpu-waves-per-eu"="1,1" }
; CHECK: attributes #[[ATTR14]] = { "amdgpu-agpr-alloc"="33" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR15]] = { "amdgpu-agpr-alloc"="8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR16]] = { "amdgpu-agpr-alloc"="13" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR17]] = { "amdgpu-agpr-alloc"="56" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR18]] = { "amdgpu-agpr-alloc"="58" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR19]] = { "amdgpu-agpr-alloc"="56" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR20]] = { "amdgpu-agpr-alloc"="60" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR21]] = { "amdgpu-agpr-alloc"="256" "amdgpu-waves-per-eu"="1,1" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR22]] = { "amdgpu-agpr-alloc"="7" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR23]] = { "amdgpu-agpr-alloc"="3" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR24:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR25:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR26:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR27:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR28:[0-9]+]] = { nocallback nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR29]] = { "amdgpu-agpr-alloc"="0" }
+; CHECK: attributes #[[ATTR17]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR18]] = { "amdgpu-agpr-alloc"="56" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR19]] = { "amdgpu-agpr-alloc"="256" "amdgpu-waves-per-eu"="1,1" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR20]] = { "amdgpu-agpr-alloc"="7" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR21]] = { "amdgpu-agpr-alloc"="3" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR22:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR23:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR24:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR25:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR26:[0-9]+]] = { nocallback nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR27]] = { "amdgpu-agpr-alloc"="0" }
;.
; CHECK: [[META0]] = !{!"a55"}
; CHECK: [[META1]] = !{!"v55"}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
index 71c509afa8e64..bd723f696558b 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
@@ -35,7 +35,7 @@ define void @use_assume(i1 %arg) {
define void @use_trap() {
; CHECK-LABEL: define void @use_trap(
-; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR0]] {
; CHECK-NEXT: call void @llvm.trap()
; CHECK-NEXT: ret void
;
@@ -43,9 +43,19 @@ define void @use_trap() {
ret void
}
+define void @use_trap_with_handler() {
+; CHECK-LABEL: define void @use_trap_with_handler(
+; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT: call void @llvm.trap() #[[ATTR8:[0-9]+]]
+; CHECK-NEXT: ret void
+;
+ call void @llvm.trap() #0
+ ret void
+}
+
define void @use_debugtrap() {
; CHECK-LABEL: define void @use_debugtrap(
-; CHECK-SAME: ) #[[ATTR1]] {
+; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
; CHECK-NEXT: call void @llvm.debugtrap()
; CHECK-NEXT: ret void
;
@@ -55,7 +65,7 @@ define void @use_debugtrap() {
define void @use_ubsantrap() {
; CHECK-LABEL: define void @use_ubsantrap(
-; CHECK-SAME: ) #[[ATTR1]] {
+; CHECK-SAME: ) #[[ATTR0]] {
; CHECK-NEXT: call void @llvm.ubsantrap(i8 0)
; CHECK-NEXT: ret void
;
@@ -63,12 +73,16 @@ define void @use_ubsantrap() {
ret void
}
+
+attributes #0 = { "trap-func-name"="handler" }
;.
; CHECK: attributes #[[ATTR0]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR3:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR6:[0-9]+]] = { cold noreturn nounwind memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR1]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR4:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR6:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR7:[0-9]+]] = { cold noreturn nounwind memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR8]] = { "trap-func-name"="handler" }
;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll
new file mode 100644
index 0000000000000..0010b7ba3e28c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll
@@ -0,0 +1,53 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals all --version 6
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -passes=amdgpu-attributor %s | FileCheck %s
+
+; Trap-like intrinsics such as llvm.trap and llvm.debugtrap do not have the
+; nocallback attribute, so the AMDGPU attributor used to conservatively drop
+; all implicitly-known inputs and AGPR allocation information. Make sure we
+; still infer that no implicit inputs are required and that the AGPR allocation
+; stays at zero.
+
+declare void @llvm.trap()
+
+declare void @llvm.debugtrap()
+
+define amdgpu_kernel void @trap_kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @trap_kernel(
+; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT: call void @llvm.trap()
+; CHECK-NEXT: ret void
+;
+ call void @llvm.trap()
+ ret void
+}
+
+define amdgpu_kernel void @trap_kernel_with_handler() {
+; CHECK-LABEL: define amdgpu_kernel void @trap_kernel_with_handler(
+; CHECK-SAME: ) #[[ATTR3:[0-9]+]] {
+; CHECK-NEXT: call void @llvm.trap() #[[ATTR5:[0-9]+]]
+; CHECK-NEXT: ret void
+;
+ call void @llvm.trap() #0
+ ret void
+}
+
+define amdgpu_kernel void @debugtrap_kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @debugtrap_kernel(
+; CHECK-SAME: ) #[[ATTR4:[0-9]+]] {
+; CHECK-NEXT: call void @llvm.debugtrap()
+; CHECK-NEXT: ...
[truncated]
|
| // If we already know this intrinsic is nocallback, there is nothing left to | ||
| // classify. | ||
| if (F.hasFnAttribute(Attribute::NoCallback)) | ||
| return true; |
There was a problem hiding this comment.
This would have been covered by the check in the caller, this function doesn't need to re-check that it's an intrinsic and no callback
| if (!F.hasFnAttribute(Attribute::NoReturn)) | ||
| return false; | ||
|
|
||
| if (F.doesNotAccessMemory()) | ||
| return true; | ||
|
|
||
| return F.onlyAccessesInaccessibleMemory(); |
There was a problem hiding this comment.
Don't think these properties are really relevant, it matters if it invokes a call which could use AGPRs
| SmallPtrSet<const Function *, 4> TrapHandlers; | ||
| for (Instruction &I : instructions(F)) { | ||
| auto *CB = dyn_cast<CallBase>(&I); | ||
| if (!CB || !CB->hasFnAttr("trap-func-name")) |
There was a problem hiding this comment.
I wonder if attributor can / should handle this case in the call graph analysis
| // The current assumed state used to determine a change. | ||
| auto OrigAssumed = getAssumed(); | ||
|
|
||
| SmallPtrSet<const Function *, 4> TrapHandlers; |
There was a problem hiding this comment.
Shouldn't really need to collect a set of these? It's unlikely there will be many
|
|
||
| /// Returns true if F looks like a leaf-style trap intrinsic that cannot | ||
| /// possibly callback into user code. | ||
| static bool isTrapLikeLeafIntrinsic(const Function &F) { |
There was a problem hiding this comment.
Given the specialness of traps calling a single possible function, it probably does make sense to just special case those. The AMDGPU specific logic is that llvm.trap will never call a function, unless it has the special call attribute. The not-AMDGPU default case could call "abort"
There was a problem hiding this comment.
Do you mean replace these changes with something simple like this?
case Intrinsic::trap:
case Intrinsic::debugtrap:
case Intrinsic::ubsantrap:
if (CB.hasFnAttr("trap-func-name"))
return CB.hasFnAttr(Attribute::NoCallback);
return true;
There was a problem hiding this comment.
Yes, something like that (maybe invert the order, The NoCallback check takes precedent over the trap-func-name)
|
The more I think about it, the more this "trap-func-name" is a busted feature. It's not in the LangRef, and the symbol isn't required to be defined in the IR. Nothing is going to ensure this function is kept alive during LTO. What is the point of this? If you wanted a call to that specific function, you could have just emitted a call that function? Is anything really using this? |
|
Handling the "trap-func-name" is mostly to cover all bases. The main idea is to prevent early exit from the min AGPR analysis leading to the removal of "amdgpu-agpr-alloc"="0" attributes from llvm.trap. As you had rightly pointed out before, trap instrinsics allow callback into user code through "trap-func-name" and should be handled the same way as any other function/intrinsic without NoCallback. |
|
Right, we're just handling this broken thing that as already here. But you should still simplify this into a special case over trap intrinsics without nocallback |
| return true; | ||
| } | ||
| } | ||
| // if (const Function *CalledFunc = CB.getCalledFunction()) { |
| case Intrinsic::trap: | ||
| case Intrinsic::debugtrap: | ||
| case Intrinsic::ubsantrap: | ||
| return !CB.hasFnAttr("trap-func-name"); |
There was a problem hiding this comment.
Should check callsite nocallback too
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/21642 Here is the relevant piece of the build log for the reference |
…allback (llvm#175230) This adds support to whitelist trap intrinsics while handling of intrinsics with !nocallback. This fixes the reasons behind the previous revert of llvm#131759. The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to recognize leaf-like trap intrinsics, so they no longer poison the analysis. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This adds support to whitelist trap intrinsics while handling of intrinsics with !nocallback. This fixes the reasons behind the previous revert of #131759.
The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to automatically recognize leaf-like trap intrinsics via their noreturn/inaccessiblemem attributes, so they no longer poison the analysis.