[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0) by ckoparkar · Pull Request #161651 · llvm/llvm-project

ckoparkar · 2025-10-02T11:33:01Z

Fixes #161036.

ckoparkar · 2025-10-02T14:02:56Z

/cc @RKSimon

llvmbot · 2025-10-02T14:03:03Z

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Chaitanya Koparkar (ckoparkar)

Changes

Fixes #161036.

Full diff: https://github.com/llvm/llvm-project/pull/161651.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+22)
(added) llvm/test/CodeGen/X86/underflow-compare-fold.ll (+16)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 558c5a0390228..99d7000c3b62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -6199,6 +6199,28 @@ SDValue DAGCombiner::visitIMINMAX(SDNode *N) {
                                         SDLoc(N), VT, N0, N1))
     return SD;
 
+  // (umin (sub a, b) a) -> (usubo a, b); (select usubo.1, a, usubo.0)
+  //
+  // IR:
+  //   %sub  = sub %a, %b
+  //   %cond = umin %sub, %a
+  //   ->
+  //   %usubo    = usubo %a, %b
+  //   %overflow = extractvalue %usubo, 1
+  //   %sub      = extractvalue %usubo, 0
+  //   %cond     = select %overflow, %a, %sub
+  if (N0.getOpcode() == ISD::SUB) {
+    SDValue A, B, C;
+    if (sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Value(C)))) {
+      EVT AVT = A.getValueType();
+      if (A == C && TLI.isOperationLegalOrCustom(ISD::USUBO, AVT)) {
+        SDVTList VTs = DAG.getVTList(AVT, MVT::i1);
+        SDValue USO = DAG.getNode(ISD::USUBO, DL, VTs, A, B);
+        return DAG.getSelect(DL, VT, USO.getValue(1), A, USO.getValue(0));
+      }
+    }
+  }
+
   // Simplify the operands using demanded-bits information.
   if (SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);
diff --git a/llvm/test/CodeGen/X86/underflow-compare-fold.ll b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
new file mode 100644
index 0000000000000..2416bcb909485
--- /dev/null
+++ b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s
+
+; GitHub issue #161036
+
+define i64 @subIfNoUnderflow_umin(i64 %a, i64 %b) {
+; CHECK-LABEL: subIfNoUnderflow_umin
+; CHECK-LABEL: %bb.0
+; CHECK-NEXT: movq    %rdi, %rax
+; CHECK-NEXT: subq    %rsi, %rax
+; CHECK-NEXT: cmovbq  %rdi, %rax
+; CHECK-NEXT: retq
+entry:
+  %sub = sub i64 %a, %b
+  %cond = tail call i64 @llvm.umin.i64(i64 %sub, i64 %a)
+  ret i64 %cond
+}

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/X86/underflow-compare-fold.ll

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

RKSimon

please add aarch64 test coverage

ckoparkar · 2025-10-08T09:56:28Z

@RKSimon this is ready for another review.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/AArch64/underflow-compare-fold.ll

ckoparkar · 2025-10-16T11:51:29Z

Apologies, I was traveling for a few days. I'll update this today.

RKSimon

Please can you update against trunk latest and investigate the CI failure

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

ckoparkar · 2025-10-31T12:47:59Z

Test failure

Details

Previously the code was:

if (N0.getOpcode() == ISD::SUB) {
  sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Deferred(A)))
  ...

Once I removed the if (N0.getOpcode() == ISD::SUB), this pattern match grabbed this:

  SelectionDAG has 17 nodes:
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
                t8: i32 = sub t2, Constant:i32<4>
              t9: i32 = umin t2, t8


N: t9: i32 = umin t2, t8
A: t2: i32,ch = CopyFromReg t0, Register:i32 %0
B: t7: i32 = Constant<4>

t8: i32 = sub.. is the second argument of umin so I'm kind of confused as to why this pattern match succeeded. I'm now pattern matching on individual arguments of umin.

More tests

Details

I've added some tests with vector types, incorrect patterns, etc. @arsenm also mentioned a test for multiple use.

I didn't guard against multiple uses here since I thought the results of sub and umin are still available after the fold. Am I misunderstanding something here and should I be guarding the pattern with m_OneUse?

* main: (3124 commits) [X86] narrowBitOpRMW - add handling for single bit insertion patterns (llvm#165742) [gn build] Port 5322fb6 [libc++] Simplify the implementation of destroy_at a bit (llvm#165392) [MLIR][NVVM] Update mbarrier.init/inval Ops to use AnyTypeOf[] (llvm#165558) [X86] Remove AMX-TRANSPOSE (llvm#165556) [CIR] Fix multiple returns in switch statements (llvm#164468) [lld][test] Fix file cleanup in aarch64-build-attributes.s (llvm#164396) [X86] combineTruncate - trunc(srl(load(p),amt)) -> load(p+amt/8) - ensure there isn't an interdependency between the load and amt (llvm#165850) [llvm][docs] Remove guidance on adding release:reviewed label (llvm#164395) [libc++] Update our documentation on the supported compilers (llvm#165684) [AMDGPU][GlobalISel] Add register bank legalization for G_FADD (llvm#163407) [LLVM][ConstantFolding] Extend constantFoldVectorReduce to include scalable vectors. (llvm#165437) [SDAG] Set InBounds when when computing offsets into memory objects (llvm#165425) [llvm][DebugInfo][ObjC] Make sure we link backing ivars to their DW_TAG_APPLE_property (llvm#165409) [NFCI] Address post-merge review of llvm#162503 (llvm#165582) [clang][tools][scan-view] Remove Python2 compatibility code in ScanView.py (llvm#163747) [lldb][TypeSystem] Remove count parameter from TypeSystem::IsFloatingPointType (llvm#165707) [llvm][tools][opt-viewer] Put back missing function [llvm][tools][opt-viewer] Remove Python2 compatability code in optrecord.py (llvm#163744) [clang][utils] Make CmpDriver Python3 compatible (llvm#163740) ...

* main: [SPIRV] Fix vector bitcast check in LegalizePointerCast (llvm#164997) [lldb][docs] Add troubleshooting section to scripting introduction [Sema] Fix parameter index checks on explicit object member functions (llvm#165586) To fix polymorphic pointer assignment in FORALL when LHS is unlimited polymorphic and RHS is intrinsic type target (llvm#164999) [CostModel][AArch64] Model cost of extract.last.active intrinsic (clastb) (llvm#165739) [MemProf] Select largest of matching contexts from profile (llvm#165338) [lldb][TypeSystem] Better support for _BitInt types (llvm#165689) [NVPTX] Move TMA G2S lowering to Tablegen (llvm#165710) [MLIR][NVVM] Extend NVVM mma ops to support fp64 (llvm#165380) [UTC] Support to test annotated IR (llvm#165419)

* main: (1028 commits) [clang][DebugInfo] Attach `DISubprogram` to additional call variants (llvm#166202) [C2y] Claim nonconformance to WG14 N3348 (llvm#166966) [X86] 2012-01-10-UndefExceptionEdge.ll - regenerate test checks (llvm#167307) Remove unused standard headers: <string>, <optional>, <numeric>, <tuple> (llvm#167232) [DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (llvm#166855) [VPlan] Don't apply predication discount to non-originally-predicated blocks (llvm#160449) [libc++] Avoid overloaded `operator,` for (`T`, `Iter`) cases (llvm#161049) [tools][llc] Make save-stats.ll test target independent (llvm#167238) [AArch64] Fallback to PRFUM for PRFM with negative or unaligned offset (llvm#166756) [X86] ldexp-avx512.ll - add v8f16/v16f16/v32f16 test coverage for llvm#165694 (llvm#167294) [DropAssumes] Drop dereferenceable assumptions after vectorization. (llvm#166947) [VPlan] Simplify branch-cond with getVectorTripCount (llvm#155604) Remove unused <algorithm> inclusion (llvm#166942) [AArch64] Combine subtract with borrow to SBC. (llvm#165271) [AArch64][SVE] Avoid redundant extend of unsigned i8/i16 extracts. (llvm#165863) [SPIRV] Fix failing assertion in SPIRVAsmPrinter (llvm#166909) [libc++] Merge insert/emplace(const_iterator, Args...) implementations (llvm#166470) [libc++] Replace __libcpp_is_final with a variable template (llvm#167137) [gn build] Port 152bda7 [libc++] Replace the last uses of __tuple_types with __type_list (llvm#167214) ...

llvm/test/CodeGen/AMDGPU/umin-sub-to-usubo-select-combine.ll

ckoparkar · 2025-11-10T15:04:36Z

To fix the test failure I need to update the patterns in CodeGen/AMDGPU/llvm.set.rounding.ll since the fold also applies to it. The DAG for set_rounding:

Legalized selection DAG: %bb.0 's_set_rounding:'
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
                t8: i32 = sub t2, Constant:i32<4>
              t9: i32 = umin t2, t8
            t11: i32 = shl t9, Constant:i32<2>
          t12: i64 = srl Constant:i64<-5242644231586798321>, t11
        t13: i32 = truncate t12
      t15: i32 = llvm.amdgcn.readfirstlane TargetConstant:i32<3375>, t13
    t18: ch = llvm.amdgcn.s.setreg t0, TargetConstant:i32<3416>, TargetConstant:i32<6145>, t15
  t5: ch = RET_GLUE t18

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

* main: (63 commits) [libc++] Inline vector::__append into resize (llvm#162086) [Flang][OpenMP] Move char box bounds generation for Maps to DirectiveCommons.h (llvm#165918) RuntimeLibcalls: Add entries for vector sincospi functions (llvm#166981) [X86] _mm_addsub_pd is not valid for constexpr (llvm#167363) [CIR] Re-land: Recognize constant aggregate initialization of auto vars (llvm#167033) [gn build] Port d2521f1 [gn build] Port caed089 [gn build] Port 315d705 [gn build] Port 2345b7d [PowerPC] convert memmove to milicode call .___memmove64[PR] in 64-bit mode (llvm#167334) [HLSL] Add internal linkage attribute to resources (llvm#166844) AMDGPU: Update test after e95f6fa [bazel] Port llvm#166980: TLI/VectorLibrary refactor (llvm#167354) [libc++] Split macros related to hardening into their own header (llvm#167069) [libc++][NFC] Remove unused imports from generate_feature_test_macro_components.py (llvm#159591) [CIR][NFC] Add test for Complex imag with GUN extension (llvm#167215) [BOLT][AArch64] Add more heuristics on epilogue determination (llvm#167077) RegisterCoalescer: Enable terminal rule by default for AMDGPU (llvm#161621) Revert "[clang] Refactor option-related code from clangDriver into new clangOptions library" (llvm#167348) [libc++][docs] Update to refer to P3355R2 (llvm#167267) ...

RKSimon

LGTM

….0) (llvm#161651) Fixes llvm#161036.

mstorsjo · 2025-11-13T09:36:09Z

This change broke a significant number of cases in ffmpeg, across multiple architectures.

One way of reproducing is this:

$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang
$ make -j$(nproc)
$ make fate-vsynth1-flv

This produces a binary that crashes for this testcase.

On x86_64, the object file where the breakage is, is in libavcodec/x86/videodsp_init.o. On aarch64, the fault appears in libavcodec/videodsp.o.

Compilation of that particular object file (the one on x86) can be done standalone with https://martin.st/temp/videodsp_init-preproc.c - with clang -target x86_64-linux-gnu -c -O2 videodsp_init-preproc.c.

If comparing the difference in generated code caused by this patch, I get this:

--- out-good.s  2025-11-13 11:30:40.037589941 +0200
+++ out-bad.s   2025-11-13 11:30:33.665583286 +0200
@@ -132,15 +132,13 @@
        addq    %r10, %r11
        movq    %rax, %rbx
 .LBB1_12:                               # %if.end29.i
+       xorl    %eax, %eax
        subq    %r8, %r9
        negq    %r8
-       xorl    %eax, %eax
-       testq   %r8, %r8
-       cmovleq %rax, %r8
+       cmovbq  %rax, %r8
        movq    %rbx, %r13
        negq    %r13
-       testq   %r13, %r13
-       cmovleq %rax, %r13
+       cmovbq  %rax, %r13
        cmpq    %r9, %r14
        cmovlq  %r14, %r9
        subq    %rbx, %rbp
@@ -374,15 +372,13 @@
        addq    %r10, %r11
        movq    %rax, %rbx
 .LBB2_12:                               # %if.end29.i
+       xorl    %eax, %eax
        subq    %r8, %r9
        negq    %r8
-       xorl    %eax, %eax
-       testq   %r8, %r8
-       cmovleq %rax, %r8
+       cmovbq  %rax, %r8
        movq    %rbx, %r13
        negq    %r13
-       testq   %r13, %r13
-       cmovleq %rax, %r13
+       cmovbq  %rax, %r13
        cmpq    %r9, %r14
        cmovlq  %r14, %r9
        subq    %rbx, %rbp

I would suggest reverting this commit for now.

… a usubo.0)" (#167854) Reverts #161651 due to downstream bad codegen reports

…ect usubo.1 a usubo.0)" (#167854) Reverts llvm/llvm-project#161651 due to downstream bad codegen reports

Baseline tests from llvm#161651 that were reverted in llvm#167854 Still missing test coverage for the ffmpeg regression failures

Baseline tests from #161651 that were reverted in #167854 Still missing test coverage for the ffmpeg regression failures

Baseline tests from llvm#161651 that were reverted in llvm#167854 Still missing test coverage for the ffmpeg regression failures

ckoparkar added 4 commits October 2, 2025 07:29

[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)

3a24f85

Remove smin pattern, it might not be correct

0db4bf0

Merge branch 'main' into ckoparkar/161036

1848dd5

Fix check for retq

c0ba3fd

ckoparkar marked this pull request as ready for review October 2, 2025 14:02

llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Oct 2, 2025

Merge branch 'main' into ckoparkar/161036

5aa0f85

RKSimon requested changes Oct 2, 2025

View reviewed changes

Address review comments

57e2ea0

llvmbot added the backend:AArch64 label Oct 3, 2025

ckoparkar requested a review from RKSimon October 3, 2025 12:11

arsenm reviewed Oct 10, 2025

View reviewed changes

Don't use MVT::i1

7e08f23

ckoparkar marked this pull request as draft October 22, 2025 12:39

RKSimon requested changes Oct 30, 2025

View reviewed changes

ckoparkar added 2 commits October 31, 2025 07:50

Fix pattern match

7660fae

Add some more tests

d795383

RKSimon reviewed Oct 31, 2025

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Outdated Show resolved Hide resolved

ckoparkar added 3 commits October 31, 2025 08:55

Simplify pattern match

a27dcf5

ckoparkar marked this pull request as ready for review October 31, 2025 14:28

ckoparkar requested a review from RKSimon October 31, 2025 14:28

ckoparkar removed request for MaheshRavishankar, bcardosolopes, lanza, nicolasvasilache, nikic and xlauko November 6, 2025 13:38

ckoparkar added 3 commits November 6, 2025 08:41

Edits

e1ba30b

Adjust patterns for AMDGPU

6df10cc

ckoparkar commented Nov 10, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/umin-sub-to-usubo-select-combine.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/AMDGPU/umin-sub-to-usubo-select-combine.ll Outdated Show resolved Hide resolved

ckoparkar requested a review from arsenm November 10, 2025 13:27

Adjust patterns in AMDGPU/llvm.set.rounding.ll

0a54b44

arsenm reviewed Nov 10, 2025

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Outdated Show resolved Hide resolved

Apply arsenm's suggestion

b3579ab

arsenm reviewed Nov 10, 2025

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Show resolved Hide resolved

arsenm approved these changes Nov 10, 2025

View reviewed changes

RKSimon approved these changes Nov 12, 2025

View reviewed changes

RKSimon merged commit 1f58cbe into llvm:main Nov 12, 2025
10 checks passed

arsenm mentioned this pull request Nov 12, 2025

AMDGPU should report 64-bit add/sub with overflow as legal #167747

Open

git-crd pushed a commit to git-crd/crd-llvm-project that referenced this pull request Nov 13, 2025

[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo…

aaa3365

….0) (llvm#161651) Fixes llvm#161036.

RKSimon mentioned this pull request Nov 13, 2025

Revert "[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)" #167854

Merged

RKSimon added a commit that referenced this pull request Nov 13, 2025

Revert "[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1…

a5342d5

… a usubo.0)" (#167854) Reverts #161651 due to downstream bad codegen reports

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 13, 2025

Automerge: Revert "[DAG] Fold (umin (sub a b) a) -> (usubo a b); (sel…

18e2a07

…ect usubo.1 a usubo.0)" (#167854) Reverts llvm/llvm-project#161651 due to downstream bad codegen reports

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Nov 16, 2025

[DAG] Add baseline test coverage for llvm#161036

db89135

Baseline tests from llvm#161651 that were reverted in llvm#167854 Still missing test coverage for the ffmpeg regression failures

RKSimon mentioned this pull request Nov 16, 2025

[DAG] Add baseline test coverage for #161036 #168278

Merged

RKSimon added a commit that referenced this pull request Nov 16, 2025

[DAG] Add baseline test coverage for #161036 (#168278)

8a055f8

Baseline tests from #161651 that were reverted in #167854 Still missing test coverage for the ffmpeg regression failures

Comments

Conversation

ckoparkar commented Oct 2, 2025

Uh oh!

ckoparkar commented Oct 2, 2025

Uh oh!

llvmbot commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

ckoparkar commented Oct 8, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ckoparkar commented Oct 16, 2025

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ckoparkar commented Oct 31, 2025

Uh oh!

Uh oh!

Uh oh!

ckoparkar commented Nov 10, 2025

Uh oh!

Uh oh!

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mstorsjo commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

llvmbot commented Oct 2, 2025 •

edited

Loading