Skip to content

AMDGPU: Add support for llvm.trap to handling of intrinsics with !nocallback#175230

Merged
akadutta merged 13 commits intollvm:mainfrom
akadutta:amdgpu/attributor-no-infer-inputs-missing-nocallback
Jan 30, 2026
Merged

AMDGPU: Add support for llvm.trap to handling of intrinsics with !nocallback#175230
akadutta merged 13 commits intollvm:mainfrom
akadutta:amdgpu/attributor-no-infer-inputs-missing-nocallback

Conversation

@akadutta
Copy link
Contributor

@akadutta akadutta commented Jan 9, 2026

This adds support to whitelist trap intrinsics while handling of intrinsics with !nocallback. This fixes the reasons behind the previous revert of #131759.

The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to automatically recognize leaf-like trap intrinsics via their noreturn/inaccessiblemem attributes, so they no longer poison the analysis.

@akadutta akadutta requested a review from arsenm January 9, 2026 19:58
@github-actions
Copy link

github-actions bot commented Jan 9, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

@arsenm
Copy link
Contributor

arsenm commented Jan 9, 2026

Needs more elaboration on the reason and new test that shows the difference

@akadutta
Copy link
Contributor Author

akadutta commented Jan 9, 2026

@arsenm previous comment updated with more explanation, and new LIT test (llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll) was already added. LMK what you think. Thanks.

Comment on lines 144 to 145
/// Returns true if F looks like a leaf-style trap intrinsic that cannot
/// possibly callback into user code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traps can call back into user code. The default legalize action is to call the abort function, which may be defined in the module. The AMDGPU legalization just happens to not use this.

\With the trap-func-name, there can be a trap function which could be defined in the current module. If you set that, you'll get the call to that function still

Comment on lines 1385 to 1388
case Intrinsic::debugtrap:
// llvm.debugtrap currently lacks the attributes required for
// isTrapLikeLeafIntrinsic, so kept explicitly whitelisted.
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shall not be special cased. The attributes for debug trap should probably be fixed separately

Comment on lines 1394 to 1395
if (isTrapLikeLeafIntrinsic(*CalledFunc))
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be special cased

; nocallback attribute, so the AMDGPU attributor used to conservatively drop
; all implicitly-known inputs and AGPR allocation information. Make sure we
; still infer that no implicit inputs are required and that the AGPR allocation
; stays at zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else has to be going on here, because the report was that a test was "broken". Treating traps overly more conservatively than necessary should not be able to break anything

@github-actions
Copy link

github-actions bot commented Jan 16, 2026

🐧 Linux x64 Test Results

  • 189392 tests passed
  • 5031 tests skipped

✅ The build succeeded and all tests passed.

@github-actions
Copy link

github-actions bot commented Jan 16, 2026

🪟 Windows x64 Test Results

  • 130293 tests passed
  • 2890 tests skipped

✅ The build succeeded and all tests passed.

@@ -35,7 +35,11 @@ define void @use_assume(i1 %arg) {

define void @use_trap() {
; CHECK-LABEL: define void @use_trap(
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict markers

Comment on lines 1387 to 1390
if (isTrapLikeLeafIntrinsic(*CalledFunc)) {
if (CB.hasFnAttr("trap-func-name"))
return CB.hasFnAttr(Attribute::NoCallback);
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a straight reapply and do this new trap stuff separately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reapply already landed (#176081). This just adds handling for the trap intrinsics. Changed the name and description to better reflect that.

@akadutta akadutta changed the title AMDGPU: Do not infer implicit inputs for intrinsics with !nocallback AMDGPU: Add support for llvm.trap to handling of intrinsics with !nocallback Jan 21, 2026
@akadutta akadutta marked this pull request as ready for review January 21, 2026 21:11
@llvmbot
Copy link
Member

llvmbot commented Jan 21, 2026

@llvm/pr-subscribers-backend-amdgpu

Author: Akash Dutta (akadutta)

Changes

This adds support to whitelist trap intrinsics while handling of intrinsics with !nocallback. This fixes the reasons behind the previous revert of #131759.

The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to automatically recognize leaf-like trap intrinsics via their noreturn/inaccessiblemem attributes, so they no longer poison the analysis.


Patch is 21.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175230.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+47-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll (+24-26)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll (+23-9)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll (+53)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 0b2ee6371da06..a257f9db9dfd6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -141,6 +141,26 @@ static bool hasSanitizerAttributes(const Function &F) {
          F.hasFnAttribute(Attribute::SanitizeMemTag);
 }
 
+/// Returns true if F looks like a leaf-style trap intrinsic that cannot
+/// possibly callback into user code.
+static bool isTrapLikeLeafIntrinsic(const Function &F) {
+  if (!F.isIntrinsic())
+    return false;
+
+  // If we already know this intrinsic is nocallback, there is nothing left to
+  // classify.
+  if (F.hasFnAttribute(Attribute::NoCallback))
+    return true;
+
+  if (!F.hasFnAttribute(Attribute::NoReturn))
+    return false;
+
+  if (F.doesNotAccessMemory())
+    return true;
+
+  return F.onlyAccessesInaccessibleMemory();
+}
+
 namespace {
 class AMDGPUInformationCache : public InformationCache {
 public:
@@ -496,6 +516,17 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     // The current assumed state used to determine a change.
     auto OrigAssumed = getAssumed();
 
+    SmallPtrSet<const Function *, 4> TrapHandlers;
+    for (Instruction &I : instructions(F)) {
+      auto *CB = dyn_cast<CallBase>(&I);
+      if (!CB || !CB->hasFnAttr("trap-func-name"))
+        continue;
+
+      if (const Function *CalledFunc = CB->getCalledFunction())
+        if (isTrapLikeLeafIntrinsic(*CalledFunc))
+          TrapHandlers.insert(CalledFunc);
+    }
+
     // Check for Intrinsics and propagate attributes.
     const AACallEdges *AAEdges = A.getAAFor<AACallEdges>(
         *this, this->getIRPosition(), DepClassTy::REQUIRED);
@@ -527,6 +558,12 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
           intrinsicToAttrMask(IID, NonKernelOnly, NeedsImplicit,
                               HasApertureRegs, SupportsGetDoorbellID, COV);
 
+      if (TrapHandlers.contains(Callee)) {
+        if (!Callee->hasFnAttribute(Attribute::NoCallback))
+          return indicatePessimisticFixpoint();
+        continue;
+      }
+
       if (AttrMask == UNKNOWN_INTRINSIC) {
         // Assume not-nocallback intrinsics may invoke a function which accesses
         // implicit arguments.
@@ -536,7 +573,8 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
         // of whether it's internal to the module or not.
         //
         // TODO: Ignoring callsite attributes.
-        if (!Callee->hasFnAttribute(Attribute::NoCallback))
+        if (!isTrapLikeLeafIntrinsic(*Callee) &&
+            !Callee->hasFnAttribute(Attribute::NoCallback))
           return indicatePessimisticFixpoint();
         continue;
       }
@@ -1344,6 +1382,14 @@ struct AAAMDGPUMinAGPRAlloc
         return true;
       }
 
+      if (const Function *CalledFunc = CB.getCalledFunction()) {
+        if (isTrapLikeLeafIntrinsic(*CalledFunc)) {
+          if (CB.hasFnAttr("trap-func-name"))
+            return CB.hasFnAttr(Attribute::NoCallback);
+          return true;
+        }
+      }
+
       switch (CB.getIntrinsicID()) {
       case Intrinsic::not_intrinsic:
         break;
@@ -1364,7 +1410,6 @@ struct AAAMDGPUMinAGPRAlloc
       default:
         // Some intrinsics may use AGPRs, but if we have a choice, we are not
         // required to use AGPRs.
-
         // Assume !nocallback intrinsics may call a function which requires
         // AGPRs.
         return CB.hasFnAttr(Attribute::NoCallback);
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
index 4db668e05cb21..3684f0df77780 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
@@ -181,7 +181,7 @@ define amdgpu_kernel void @kernel_calls_extern() {
 define amdgpu_kernel void @kernel_calls_extern_marked_callsite() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_extern_marked_callsite(
 ; CHECK-SAME: ) #[[ATTR3]] {
-; CHECK-NEXT:    call void @unknown() #[[ATTR29:[0-9]+]]
+; CHECK-NEXT:    call void @unknown() #[[ATTR27:[0-9]+]]
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
 ;
@@ -205,7 +205,7 @@ define amdgpu_kernel void @kernel_calls_indirect(ptr %indirect) {
 define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(ptr %indirect) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(
 ; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT:    call void [[INDIRECT]]() #[[ATTR29]]
+; CHECK-NEXT:    call void [[INDIRECT]]() #[[ATTR27]]
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
 ;
@@ -724,7 +724,7 @@ define amdgpu_kernel void @kernel_uses_write_register_v55() {
 
 define amdgpu_kernel void @kernel_uses_write_register_a55_57() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_write_register_a55_57(
-; CHECK-SAME: ) #[[ATTR18:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    call void @llvm.write_register.i96(metadata [[META2:![0-9]+]], i96 0)
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -736,7 +736,7 @@ define amdgpu_kernel void @kernel_uses_write_register_a55_57() {
 
 define amdgpu_kernel void @kernel_uses_read_register_a55(ptr addrspace(1) %ptr) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_register_a55(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR19:[0-9]+]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[REG:%.*]] = call i32 @llvm.read_register.i32(metadata [[META0]])
 ; CHECK-NEXT:    store i32 [[REG]], ptr addrspace(1) [[PTR]], align 4
 ; CHECK-NEXT:    call void @use_most()
@@ -750,7 +750,7 @@ define amdgpu_kernel void @kernel_uses_read_register_a55(ptr addrspace(1) %ptr)
 
 define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(ptr addrspace(1) %ptr) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR19]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR18:[0-9]+]] {
 ; CHECK-NEXT:    [[REG:%.*]] = call i32 @llvm.read_volatile_register.i32(metadata [[META0]])
 ; CHECK-NEXT:    store i32 [[REG]], ptr addrspace(1) [[PTR]], align 4
 ; CHECK-NEXT:    call void @use_most()
@@ -764,7 +764,7 @@ define amdgpu_kernel void @kernel_uses_read_volatile_register_a55(ptr addrspace(
 
 define amdgpu_kernel void @kernel_uses_read_register_a56_59(ptr addrspace(1) %ptr) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_read_register_a56_59(
-; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR20:[0-9]+]] {
+; CHECK-SAME: ptr addrspace(1) [[PTR:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[REG:%.*]] = call i128 @llvm.read_register.i128(metadata [[META3:![0-9]+]])
 ; CHECK-NEXT:    store i128 [[REG]], ptr addrspace(1) [[PTR]], align 8
 ; CHECK-NEXT:    call void @use_most()
@@ -778,7 +778,7 @@ define amdgpu_kernel void @kernel_uses_read_register_a56_59(ptr addrspace(1) %pt
 
 define amdgpu_kernel void @kernel_uses_write_register_out_of_bounds_a256() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_write_register_out_of_bounds_a256(
-; CHECK-SAME: ) #[[ATTR9]] {
+; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    call void @llvm.write_register.i32(metadata [[META4:![0-9]+]], i32 0)
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -897,7 +897,7 @@ define void @kernel_max_callgraph(i1 %cond) {
 
 define amdgpu_kernel void @kernel_uses_all_virtregs() #1 {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_all_virtregs(
-; CHECK-SAME: ) #[[ATTR21:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR19:[0-9]+]] {
 ; CHECK-NEXT:    call void asm sideeffect "
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -909,7 +909,7 @@ define amdgpu_kernel void @kernel_uses_all_virtregs() #1 {
 
 define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1() #1 {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1(
-; CHECK-SAME: ) #[[ATTR21]] {
+; CHECK-SAME: ) #[[ATTR19]] {
 ; CHECK-NEXT:    call void asm sideeffect "
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -921,7 +921,7 @@ define amdgpu_kernel void @kernel_uses_all_virtregs_plus_1() #1 {
 
 define void @recursive() {
 ; CHECK-LABEL: define void @recursive(
-; CHECK-SAME: ) #[[ATTR22:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR20:[0-9]+]] {
 ; CHECK-NEXT:    call void asm sideeffect "
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    call void @recursive()
@@ -935,7 +935,7 @@ define void @recursive() {
 
 define void @indirect_0() {
 ; CHECK-LABEL: define void @indirect_0(
-; CHECK-SAME: ) #[[ATTR22]] {
+; CHECK-SAME: ) #[[ATTR20]] {
 ; CHECK-NEXT:    call void asm sideeffect "
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -947,7 +947,7 @@ define void @indirect_0() {
 
 define void @indirect_1() {
 ; CHECK-LABEL: define void @indirect_1(
-; CHECK-SAME: ) #[[ATTR23:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR21:[0-9]+]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = call <3 x i32> asm sideeffect "
 ; CHECK-NEXT:    call void @use_most()
 ; CHECK-NEXT:    ret void
@@ -959,7 +959,7 @@ define void @indirect_1() {
 
 define amdgpu_kernel void @knowable_indirect_call(i1 %cond) {
 ; CHECK-LABEL: define amdgpu_kernel void @knowable_indirect_call(
-; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR22]] {
+; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR20]] {
 ; CHECK-NEXT:    [[FPTR:%.*]] = select i1 [[COND]], ptr @indirect_0, ptr @indirect_1
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq ptr [[FPTR]], @indirect_1
 ; CHECK-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP3:%.*]]
@@ -1044,19 +1044,17 @@ attributes #1 = { "amdgpu-waves-per-eu"="1,1" }
 ; CHECK: attributes #[[ATTR14]] = { "amdgpu-agpr-alloc"="33" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR15]] = { "amdgpu-agpr-alloc"="8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR16]] = { "amdgpu-agpr-alloc"="13" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR17]] = { "amdgpu-agpr-alloc"="56" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR18]] = { "amdgpu-agpr-alloc"="58" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR19]] = { "amdgpu-agpr-alloc"="56" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR20]] = { "amdgpu-agpr-alloc"="60" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR21]] = { "amdgpu-agpr-alloc"="256" "amdgpu-waves-per-eu"="1,1" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR22]] = { "amdgpu-agpr-alloc"="7" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR23]] = { "amdgpu-agpr-alloc"="3" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR24:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR25:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR26:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR27:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR28:[0-9]+]] = { nocallback nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR29]] = { "amdgpu-agpr-alloc"="0" }
+; CHECK: attributes #[[ATTR17]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR18]] = { "amdgpu-agpr-alloc"="56" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR19]] = { "amdgpu-agpr-alloc"="256" "amdgpu-waves-per-eu"="1,1" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR20]] = { "amdgpu-agpr-alloc"="7" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR21]] = { "amdgpu-agpr-alloc"="3" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR22:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR23:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR24:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR25:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR26:[0-9]+]] = { nocallback nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR27]] = { "amdgpu-agpr-alloc"="0" }
 ;.
 ; CHECK: [[META0]] = !{!"a55"}
 ; CHECK: [[META1]] = !{!"v55"}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
index 71c509afa8e64..bd723f696558b 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
@@ -35,7 +35,7 @@ define void @use_assume(i1 %arg) {
 
 define void @use_trap() {
 ; CHECK-LABEL: define void @use_trap(
-; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    call void @llvm.trap()
 ; CHECK-NEXT:    ret void
 ;
@@ -43,9 +43,19 @@ define void @use_trap() {
   ret void
 }
 
+define void @use_trap_with_handler() {
+; CHECK-LABEL: define void @use_trap_with_handler(
+; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    call void @llvm.trap() #[[ATTR8:[0-9]+]]
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.trap() #0
+  ret void
+}
+
 define void @use_debugtrap() {
 ; CHECK-LABEL: define void @use_debugtrap(
-; CHECK-SAME: ) #[[ATTR1]] {
+; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
 ; CHECK-NEXT:    call void @llvm.debugtrap()
 ; CHECK-NEXT:    ret void
 ;
@@ -55,7 +65,7 @@ define void @use_debugtrap() {
 
 define void @use_ubsantrap() {
 ; CHECK-LABEL: define void @use_ubsantrap(
-; CHECK-SAME: ) #[[ATTR1]] {
+; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    call void @llvm.ubsantrap(i8 0)
 ; CHECK-NEXT:    ret void
 ;
@@ -63,12 +73,16 @@ define void @use_ubsantrap() {
   ret void
 }
 
+
+attributes #0 = { "trap-func-name"="handler" }
 ;.
 ; CHECK: attributes #[[ATTR0]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR3:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR6:[0-9]+]] = { cold noreturn nounwind memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR1]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-cluster-id-x" "amdgpu-no-cluster-id-y" "amdgpu-no-cluster-id-z" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR4:[0-9]+]] = { nounwind "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR6:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR7:[0-9]+]] = { cold noreturn nounwind memory(inaccessiblemem: write) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR8]] = { "trap-func-name"="handler" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll
new file mode 100644
index 0000000000000..0010b7ba3e28c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll
@@ -0,0 +1,53 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals all --version 6
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -passes=amdgpu-attributor %s | FileCheck %s
+
+; Trap-like intrinsics such as llvm.trap and llvm.debugtrap do not have the
+; nocallback attribute, so the AMDGPU attributor used to conservatively drop
+; all implicitly-known inputs and AGPR allocation information. Make sure we
+; still infer that no implicit inputs are required and that the AGPR allocation
+; stays at zero.
+
+declare void @llvm.trap()
+
+declare void @llvm.debugtrap()
+
+define amdgpu_kernel void @trap_kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @trap_kernel(
+; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT:    call void @llvm.trap()
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.trap()
+  ret void
+}
+
+define amdgpu_kernel void @trap_kernel_with_handler() {
+; CHECK-LABEL: define amdgpu_kernel void @trap_kernel_with_handler(
+; CHECK-SAME: ) #[[ATTR3:[0-9]+]] {
+; CHECK-NEXT:    call void @llvm.trap() #[[ATTR5:[0-9]+]]
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.trap() #0
+  ret void
+}
+
+define amdgpu_kernel void @debugtrap_kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @debugtrap_kernel(
+; CHECK-SAME: ) #[[ATTR4:[0-9]+]] {
+; CHECK-NEXT:    call void @llvm.debugtrap()
+; CHECK-NEXT: ...
[truncated]

@arsenm arsenm requested review from jdoerfert and shiltian January 22, 2026 19:58
Comment on lines 150 to 153
// If we already know this intrinsic is nocallback, there is nothing left to
// classify.
if (F.hasFnAttribute(Attribute::NoCallback))
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have been covered by the check in the caller, this function doesn't need to re-check that it's an intrinsic and no callback

Comment on lines 155 to 161
if (!F.hasFnAttribute(Attribute::NoReturn))
return false;

if (F.doesNotAccessMemory())
return true;

return F.onlyAccessesInaccessibleMemory();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think these properties are really relevant, it matters if it invokes a call which could use AGPRs

SmallPtrSet<const Function *, 4> TrapHandlers;
for (Instruction &I : instructions(F)) {
auto *CB = dyn_cast<CallBase>(&I);
if (!CB || !CB->hasFnAttr("trap-func-name"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if attributor can / should handle this case in the call graph analysis

// The current assumed state used to determine a change.
auto OrigAssumed = getAssumed();

SmallPtrSet<const Function *, 4> TrapHandlers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't really need to collect a set of these? It's unlikely there will be many


/// Returns true if F looks like a leaf-style trap intrinsic that cannot
/// possibly callback into user code.
static bool isTrapLikeLeafIntrinsic(const Function &F) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the specialness of traps calling a single possible function, it probably does make sense to just special case those. The AMDGPU specific logic is that llvm.trap will never call a function, unless it has the special call attribute. The not-AMDGPU default case could call "abort"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean replace these changes with something simple like this?

case Intrinsic::trap:
case Intrinsic::debugtrap:
case Intrinsic::ubsantrap:
if (CB.hasFnAttr("trap-func-name"))
return CB.hasFnAttr(Attribute::NoCallback);
return true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like that (maybe invert the order, The NoCallback check takes precedent over the trap-func-name)

@arsenm
Copy link
Contributor

arsenm commented Jan 24, 2026

The more I think about it, the more this "trap-func-name" is a busted feature.

It's not in the LangRef, and the symbol isn't required to be defined in the IR. Nothing is going to ensure this function is kept alive during LTO. What is the point of this? If you wanted a call to that specific function, you could have just emitted a call that function? Is anything really using this?

@akadutta
Copy link
Contributor Author

Handling the "trap-func-name" is mostly to cover all bases. The main idea is to prevent early exit from the min AGPR analysis leading to the removal of "amdgpu-agpr-alloc"="0" attributes from llvm.trap. As you had rightly pointed out before, trap instrinsics allow callback into user code through "trap-func-name" and should be handled the same way as any other function/intrinsic without NoCallback.

@arsenm
Copy link
Contributor

arsenm commented Jan 26, 2026

Right, we're just handling this broken thing that as already here. But you should still simplify this into a special case over trap intrinsics without nocallback

return true;
}
}
// if (const Function *CalledFunc = CB.getCalledFunction()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code

case Intrinsic::trap:
case Intrinsic::debugtrap:
case Intrinsic::ubsantrap:
return !CB.hasFnAttr("trap-func-name");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check callsite nocallback too

@arsenm arsenm enabled auto-merge (squash) January 29, 2026 22:09
@akadutta akadutta disabled auto-merge January 29, 2026 22:12
@akadutta akadutta merged commit 7803b4a into llvm:main Jan 30, 2026
10 checks passed
@akadutta akadutta deleted the amdgpu/attributor-no-infer-inputs-missing-nocallback branch January 30, 2026 14:27
@llvm-ci
Copy link

llvm-ci commented Jan 30, 2026

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/21642

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libarcher :: races/task-taskwait-nested.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# note: command had no output on stdout or stderr
# RUN: at line 14
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env ARCHER_OPTIONS="ignore_serial=1 report_data_leak=1" env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env 'ARCHER_OPTIONS=ignore_serial=1 report_data_leak=1' env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c:52:11: error: CHECK: expected string not found in input
# | // CHECK: WARNING: ThreadSanitizer: data race
# |           ^
# | <stdin>:1:1: note: scanning from here
# | DONE
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: DONE 
# | check:52     X~~~~ error: no match found
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************


@akadutta akadutta restored the amdgpu/attributor-no-infer-inputs-missing-nocallback branch January 30, 2026 14:40
sshrestha-aa pushed a commit to sshrestha-aa/llvm-project that referenced this pull request Feb 4, 2026
…allback (llvm#175230)

This adds support to whitelist trap intrinsics while handling of
intrinsics with !nocallback. This fixes the reasons behind the previous
revert of llvm#131759.

The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to recognize leaf-like trap intrinsics, so they no longer poison the analysis.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants