Skip to content

[AMDGPU][SIInsertWaitcnts] Fix iota_range assertion when OtherMarks is empty in mergeAsyncMarks()#193499

Merged
arsenm merged 10 commits into
llvm:mainfrom
xiaohuguo2023:fix/si-insert-waitcnts-87717bf9
May 1, 2026
Merged

[AMDGPU][SIInsertWaitcnts] Fix iota_range assertion when OtherMarks is empty in mergeAsyncMarks()#193499
arsenm merged 10 commits into
llvm:mainfrom
xiaohuguo2023:fix/si-insert-waitcnts-87717bf9

Conversation

@xiaohuguo2023
Copy link
Copy Markdown
Contributor

WaitcntBrackets::mergeAsyncMarks() asserts when merging async wait-count
state at a CFG join point where one predecessor has pending async memory
operations and the other does not.

Problem:

  • The existing early-exit only handles the both-empty case
  • When OtherMarks is empty but AsyncMarks is not, MergeCount = min(0, N) = 0
  • seq_inclusive(1, 0) fires: "Assertion Begin <= End failed"

Changes:

  • Add early return when MergeCount == 0 (OtherMarks is empty)
  • When the other predecessor contributed no async marks, our marks are
    unchanged and no stricter waits are needed
  • Add regression lit test: asyncmark-merge-empty-other.mir

@github-actions
Copy link
Copy Markdown

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Apr 22, 2026

@llvm/pr-subscribers-backend-amdgpu

Author: xiaohuguo2023

Changes

WaitcntBrackets::mergeAsyncMarks() asserts when merging async wait-count
state at a CFG join point where one predecessor has pending async memory
operations and the other does not.

Problem:

  • The existing early-exit only handles the both-empty case
  • When OtherMarks is empty but AsyncMarks is not, MergeCount = min(0, N) = 0
  • seq_inclusive<unsigned>(1, 0) fires: "Assertion Begin <= End failed"

Changes:

  • Add early return when MergeCount == 0 (OtherMarks is empty)
  • When the other predecessor contributed no async marks, our marks are
    unchanged and no stricter waits are needed
  • Add regression lit test: asyncmark-merge-empty-other.mir

Full diff: https://github.com/llvm/llvm-project/pull/193499.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir (+75)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index abd03d5f9bf73..120faaa82a315 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -3024,6 +3024,8 @@ bool WaitcntBrackets::mergeAsyncMarks(ArrayRef<MergeInfo> MergeInfos,
   unsigned OtherSize = OtherMarks.size();
   unsigned OurSize = AsyncMarks.size();
   unsigned MergeCount = std::min(OtherSize, OurSize);
+  if (MergeCount == 0)
+    return StrictDom;
   for (auto Idx : seq_inclusive<unsigned>(1, MergeCount)) {
     for (auto T : inst_counter_types(Context->MaxCounter)) {
       StrictDom |= mergeScore(MergeInfos[T], AsyncMarks[OurSize - Idx][T],
diff --git a/llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir b/llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir
new file mode 100644
index 0000000000000..ec9c33f33a1f8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir
@@ -0,0 +1,75 @@
+# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass=si-insert-waitcnts -o - %s | FileCheck %s
+
+# Regression test for mergeAsyncMarks() asserting when OtherMarks is empty.
+#
+# At a CFG join point where one predecessor has an ASYNCMARK (non-empty
+# AsyncMarks) and the other does not (empty OtherMarks), MergeCount becomes
+# min(0, N) = 0. Before the fix, seq_inclusive<unsigned>(1, 0) asserted
+# Begin <= End. After the fix the function returns early: the other predecessor
+# contributed no async marks so our marks are unchanged and no stricter waits
+# are needed.
+#
+# Key check: no S_WAITCNT is inserted at the join block (bb.3). The else
+# branch had no async operations so merging its empty marks into the then
+# branch's non-empty marks requires no additional waits.
+
+--- |
+  define void @asyncmark_merge_empty_other(i32 %cond) {
+  entry:
+    br i1 true, label %then, label %else
+  then:
+    br label %join
+  else:
+    br label %join
+  join:
+    ret void
+  }
+...
+
+---
+name:            asyncmark_merge_empty_other
+tracksRegLiveness: true
+machineFunctionInfo:
+  occupancy:       8
+body:             |
+  ; entry — conditional branch to then (async) or else (no async)
+  bb.0.entry:
+    successors: %bb.1(0x40000000), %bb.2(0x40000000)
+    liveins: $sgpr0, $sgpr1
+
+    ; CHECK-LABEL: name: asyncmark_merge_empty_other
+    ; CHECK:      bb.0.entry:
+    ; CHECK:        S_CMP_LG_U32 $sgpr0, $sgpr1, implicit-def $scc
+    ; CHECK-NEXT:   S_CBRANCH_SCC1 %bb.2, implicit killed $scc
+    S_CMP_LG_U32 $sgpr0, $sgpr1, implicit-def $scc
+    S_CBRANCH_SCC1 %bb.2, implicit killed $scc
+
+  ; then branch — ASYNCMARK records non-empty AsyncMarks
+  bb.1.then:
+    successors: %bb.3(0x80000000)
+
+    ; CHECK:      bb.1.then:
+    ; CHECK:        ASYNCMARK
+    ; CHECK-NEXT:   S_BRANCH %bb.3
+    ASYNCMARK
+    S_BRANCH %bb.3
+
+  ; else branch — no async operations, OtherMarks is empty at join
+  bb.2.else:
+    successors: %bb.3(0x80000000)
+
+    ; CHECK:      bb.2.else:
+    ; CHECK:        S_BRANCH %bb.3
+    S_BRANCH %bb.3
+
+  ; join — mergeAsyncMarks() sees non-empty AsyncMarks (bb.1) and empty
+  ; OtherMarks (bb.2). Before the fix: assertion. After the fix: early
+  ; return, no waitcnt inserted.
+  bb.3.join:
+    ; CHECK:      bb.3.join:
+    ; CHECK-NOT:    S_WAITCNT
+    ; CHECK:        S_ENDPGM 0
+    ; Verify no waitcnt is inserted before S_ENDPGM — the else predecessor
+    ; had no async marks so no stricter waits are needed at the join.
+    S_ENDPGM 0
+...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-NOT checks are fragile, use update_mir_test_checks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -run-pass=si-insert-waitcnts -o - %s | FileCheck %s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it make sense to place this at the top of the function, where there already is the AsyncMarks.empty() && OtherMarks.empty() check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — I've moved the guard to the top of the function and changed && to || to cover the one-side-empty case, which also lets us drop the mid-function MergeCount == 0 check. Thanks for the suggestion!

@xiaohuguo2023 xiaohuguo2023 requested review from arsenm and ro-i April 26, 2026 20:53
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't entirely accurate. No wait is generated because there is no wait.asyncmark instruction. The should have a wait.asyncmark in the join block, and it should emit an S_WAIT wait without crashing. For completeness, the test should also exercise the opposite pattern where the single asyncmark is in the else successor instead of the then successor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good catch, I've rewritten it with two functions (asyncmark_in_then / asyncmark_in_else) that use a real BUFFER_LOAD_DWORD_OFFSET + ASYNCMARK on the async path and a consumer V_ADD_U32_e32 in the join block, so the generated S_WAITCNT can now be verified.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordinary BUFFER_LOAD_DWORD_OFFSET does not use ASYNCMARK at all. On gfx900, you need BUFFER_LOAD_DWORD_LDS_OFFEN with the async bit set in its aux/cpol argument. Or on gfx1250, you need GLOBAL_LOAD_ASYNC_TO_LDS. Both are not strictly necessary, but I would suggest using only gfx1250.

The test is still missing a WAIT_ASYNCMARK in the join block. Right now, the wait is being generated because the buffer load is in fact not async at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssahasra
Many Thanks for the detailed feedback. You're right that BUFFER_LOAD_DWORD_OFFSET is synchronous and the S_WAITCNT in the join was just ordinary vmcnt tracking, not async mark machinery.

  • I've replaced it with BUFFER_LOAD_DWORD_LDS_IDXEN (IsAsync=1) and added WAIT_ASYNCMARK 0 to the join block.

  • For the target: gfx1250 isn't currently feasible ?

    • ASYNCMARK hits assert(ST.getGeneration() < AMDGPUSubtarget::GFX12) (SIInsertWaitcnts.cpp:3273) and
    • WAIT_ASYNCMARK hits reportFatalUsageError("WAIT_ASYNCMARK is not ready for GFX12 yet") (line 2096).
    • I've used gfx950 instead since that's the hardware where the original bug was found.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the target: gfx1250 isn't currently feasible ?

Is it possible that you are using a stale copy? Support for GFX1250 was added in #185813.

I've used gfx950 instead since that's the hardware where the original bug was found.

GFX9 uses VM_CNT to track async buffer loads. GFX1250 supports new intrinsics like load.async.to.lds, and these are tracked using ASYNC_CNT instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssahasra You were right — my copy was stale. After rebasing onto upstream main I updated the test to gfx1250 with GLOBAL_LOAD_ASYNC_TO_LDS_B32 and WAIT_ASYNCMARK 0 in the join block. On the else-pattern the pass correctly expands WAIT_ASYNCMARK 0 to S_WAIT_ASYNCCNT 0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need IR section

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

In mergeAsyncMarks(), when one branch of a conditional (e.g. EVEN_K=False
K-loop boundary) has no async memory ops, OtherMarks can be empty while
AsyncMarks is not. The existing early-exit only handles both-empty, so
MergeCount = min(0, N) = 0 reaches seq_inclusive<unsigned>(1, 0) which
asserts Begin <= End.

Fix: return early when MergeCount == 0 — nothing to merge.

Reproducer: compile a StreamK GEMM kernel on gfx950 (MI350X) with
  BLOCK_SIZE_K=256, K=2880 (not divisible → EVEN_K=False),
  num_stages=2, waves_per_eu=4, matrix_instr_nonkdim=32.
The assert fires deterministically even when compiled alone (not a race).

Fixes: iota_range<unsigned>(1, 0) → Assertion Begin <= End failed
  - Regenerate CHECK lines using update_mir_test_checks.py
  - Remove -verify-machineinstrs from RUN line
  Change `&&` to `||` in the early-exit check so the function returns
  immediately when either AsyncMarks or OtherMarks is empty, rather than
  falling through to the resize/pad logic and hitting a seq_inclusive
  assert (Begin > End) when MergeCount == 0. Removes the now-unreachable
  mid-function MergeCount == 0 guard.
  Test two CFG patterns where one predecessor carries an ASYNCMARK from a
  BUFFER_LOAD and the other is a sync path (empty OtherMarks):

    asyncmark_in_then: ASYNCMARK in then-successor, else is sync
    asyncmark_in_else: ASYNCMARK in else-successor, then is sync

  Before the fix, mergeAsyncMarks() asserted Begin <= End via
  seq_inclusive<unsigned>(1, 0). After the fix the function returns
  early when either side is empty and preserves the pending async marks,
  so S_WAITCNT is correctly emitted before the consumer in the join block
  - Replaced BUFFER_LOAD_DWORD_OFFSET with BUFFER_LOAD_DWORD_LDS_IDXEN + IsAsync=1 — a real async LDS DMA that isAsync() recognises, recording AsyncScore[LOAD_CNT] before ASYNCMARK pushes it onto AsyncMarks
  - Added $m0 = S_MOV_B32 0 in bb.0 (required for LDS DMA)
  - Added WAIT_ASYNCMARK 0 to the join block so the async mark machinery is actually consumed
  - Changed join block consumer to DS_READ_B32_gfx9 (reads from LDS where the async DMA wrote)
  - Regenerated CHECK lines
  Update MIR test to use gfx1250 async LDS DMA

  Switch from gfx950 + BUFFER_LOAD_DWORD_LDS_IDXEN to gfx1250 +
  GLOBAL_LOAD_ASYNC_TO_LDS_B32 as suggested in review.

  On GFX1250 the async load is tracked via ASYNC_CNT rather than VM_CNT,
  which properly exercises the ASYNCMARK/WAIT_ASYNCMARK machinery. The
  join block carries WAIT_ASYNCMARK 0; in asyncmark_in_else the pass
  expands it to S_WAIT_ASYNCCNT 0 to drain the pending async count.

  Rebased onto upstream main to pick up GFX1250 ASYNCMARK support
  added in llvm#185813.

  Regenerate CHECK lines with update_mir_test_checks.py.
@xiaohuguo2023 xiaohuguo2023 force-pushed the fix/si-insert-waitcnts-87717bf9 branch from 4957222 to 5d9b8e4 Compare April 30, 2026 21:05
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
successors: %bb.3(0x80000000)
liveins: $vgpr0_vgpr1, $vgpr2

GLOBAL_LOAD_ASYNC_TO_LDS_B32 $vgpr2, $vgpr0_vgpr1, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load (s32) from `ptr addrspace(1) poison`), (store (s32) into `ptr addrspace(3) poison`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GLOBAL_LOAD_ASYNC_TO_LDS_B32 $vgpr2, $vgpr0_vgpr1, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load (s32) from `ptr addrspace(1) poison`), (store (s32) into `ptr addrspace(3) poison`)
GLOBAL_LOAD_ASYNC_TO_LDS_B32 $vgpr2, $vgpr0_vgpr1, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load (s32), addrspace 1), (store (s32), addrspace 3)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, missed it. done now

Comment thread llvm/test/CodeGen/AMDGPU/asyncmark-merge-empty-other.mir Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

🪟 Windows x64 Test Results

  • 134132 tests passed
  • 3190 tests skipped

✅ The build succeeded and all tests passed.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

🐧 Linux x64 Test Results

  • 194639 tests passed
  • 5166 tests skipped

✅ The build succeeded and all tests passed.

  To consolidate the one-side-empty guard at the top of the function,
  the early-exit was changed from && to ||. However, after rebasing onto
  upstream/main, the || condition broke asyncmark-max-pregfx12.ll: the
  padding logic now sits between the early-exit and the seq_inclusive loop,
  so returning early when AsyncMarks.empty() && OtherMarks non-empty skips
  the padding and silently drops OtherMarks.

  The correct fix keeps && at the top (handles the both-empty case, leaves
  padding intact for the AsyncMarks-empty case) and adds:

    if (MergeCount == 0)
      return StrictDom;

  after the padding to handle the OtherMarks-empty case without touching
  the padding logic or triggering seq_inclusive<unsigned>(1, 0).

  Update asyncmark-merge-empty-other.mir: with OtherMarks no longer
  dropped, asyncmark_in_else correctly emits S_WAIT_ASYNCCNT 0 at the
  join block.
@xiaohuguo2023
Copy link
Copy Markdown
Contributor Author

🐧 Linux x64 Test Results

  • 173532 tests passed
  • 3278 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

LLVM

LLVM.CodeGen/AMDGPU/asyncmark-max-pregfx12.ll

Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900  < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll:153:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: s_waitcnt vmcnt(15)
# |               ^
# | <stdin>:299:22: note: scanning from here
# |  ; wait_asyncmark(17)
# |                      ^
# | <stdin>:303:2: note: possible intended match here
# |  s_waitcnt vmcnt(0)
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           294:  ; asyncmark 
# |           295:  global_load_dword v[0:1], off lds 
# |           296:  ; asyncmark 
# |           297: .LBB2_4: ; %endif 
# |           298:  s_or_b64 exec, exec, s[4:5] 
# |           299:  ; wait_asyncmark(17) 
# | next:153'0                          X error: no match found
# |           300:  ds_read_b32 v0, v2 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~
# |           301:  s_waitcnt lgkmcnt(0) 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~~~
# |           302:  global_store_dword v[4:5], v0, off 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           303:  s_waitcnt vmcnt(0) 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~
# | next:153'1      ?                   possible intended match
# |           304:  s_setpc_b64 s[30:31] 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~~~
# |           305: .Lfunc_end2: 
# | next:153'0     ~~~~~~~~~~~~~
# |           306:  .size max_when_merged, .Lfunc_end2-max_when_merged 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           307:  ; -- End function 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~
# |           308:  .set .Lmax_when_merged.num_vgpr, 6 
# | next:153'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

this should be fixed now

@@ -3067,6 +3067,11 @@ bool WaitcntBrackets::mergeAsyncMarks(ArrayRef<MergeInfo> MergeInfos,
unsigned OtherSize = OtherMarks.size();
unsigned OurSize = AsyncMarks.size();
unsigned MergeCount = std::min(OtherSize, OurSize);
// OtherMarks is empty → OtherSize == 0 → MergeCount == 0.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not ASICII. I'm not sure what our policy is, but I'd use -> instead.

@@ -0,0 +1,139 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-insert-waitcnts -o - %s | FileCheck %s
Copy link
Copy Markdown
Contributor

@shiltian shiltian May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-insert-waitcnts -o - %s | FileCheck %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -run-pass=si-insert-waitcnts < %s | FileCheck %s

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not use < %s, that will think the input is IR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< %s fails for MIR files (llc can't detect format from stdin), so kept -o - %s but applied the triple fix amdgcn → amdgcn-amd-amdhsa

@xiaohuguo2023 xiaohuguo2023 requested review from arsenm and shiltian May 1, 2026 17:57
@arsenm arsenm enabled auto-merge (squash) May 1, 2026 18:15
@arsenm arsenm merged commit 81d618b into llvm:main May 1, 2026
9 of 10 checks passed
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

@xiaohuguo2023 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

antiagainst pushed a commit to triton-lang/triton that referenced this pull request May 2, 2026
enferex pushed a commit to enferex/llvm-project that referenced this pull request May 5, 2026
…s empty in mergeAsyncMarks() (llvm#193499)

WaitcntBrackets::mergeAsyncMarks() asserts when merging async wait-count
state at a CFG join point where one predecessor has pending async memory
  operations and the other does not.

  Problem:
  - The existing early-exit only handles the both-empty case
- When OtherMarks is empty but AsyncMarks is not, MergeCount = min(0, N)
= 0
  - seq_inclusive<unsigned>(1, 0) fires: "Assertion Begin <= End failed"

  Changes:
  - Add early return when MergeCount == 0 (OtherMarks is empty)
  - When the other predecessor contributed no async marks, our marks are
    unchanged and no stricter waits are needed
  - Add regression lit test: asyncmark-merge-empty-other.mir
moar55 pushed a commit to moar55/llvm-project that referenced this pull request May 12, 2026
…s empty in mergeAsyncMarks() (llvm#193499)

WaitcntBrackets::mergeAsyncMarks() asserts when merging async wait-count
state at a CFG join point where one predecessor has pending async memory
  operations and the other does not.

  Problem:
  - The existing early-exit only handles the both-empty case
- When OtherMarks is empty but AsyncMarks is not, MergeCount = min(0, N)
= 0
  - seq_inclusive<unsigned>(1, 0) fires: "Assertion Begin <= End failed"

  Changes:
  - Add early return when MergeCount == 0 (OtherMarks is empty)
  - When the other predecessor contributed no async marks, our marks are
    unchanged and no stricter waits are needed
  - Add regression lit test: asyncmark-merge-empty-other.mir
antiagainst pushed a commit to triton-lang/triton that referenced this pull request May 28, 2026
This reverts commit 36394d4. We saw
some compile failure cases with disabled vector combine. But should be
resolved and should re-land this commit once with
llvm/llvm-project#193499 or commit
`81d618b6bc1e71cda79fe7bf9cbab63933dd5975` gets picked up in the next
LLVM bump.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants