[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)#161651
[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)#161651
Conversation
|
/cc @RKSimon |
|
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-backend-x86 Author: Chaitanya Koparkar (ckoparkar) ChangesFixes #161036. Full diff: https://github.com/llvm/llvm-project/pull/161651.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 558c5a0390228..99d7000c3b62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -6199,6 +6199,28 @@ SDValue DAGCombiner::visitIMINMAX(SDNode *N) {
SDLoc(N), VT, N0, N1))
return SD;
+ // (umin (sub a, b) a) -> (usubo a, b); (select usubo.1, a, usubo.0)
+ //
+ // IR:
+ // %sub = sub %a, %b
+ // %cond = umin %sub, %a
+ // ->
+ // %usubo = usubo %a, %b
+ // %overflow = extractvalue %usubo, 1
+ // %sub = extractvalue %usubo, 0
+ // %cond = select %overflow, %a, %sub
+ if (N0.getOpcode() == ISD::SUB) {
+ SDValue A, B, C;
+ if (sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Value(C)))) {
+ EVT AVT = A.getValueType();
+ if (A == C && TLI.isOperationLegalOrCustom(ISD::USUBO, AVT)) {
+ SDVTList VTs = DAG.getVTList(AVT, MVT::i1);
+ SDValue USO = DAG.getNode(ISD::USUBO, DL, VTs, A, B);
+ return DAG.getSelect(DL, VT, USO.getValue(1), A, USO.getValue(0));
+ }
+ }
+ }
+
// Simplify the operands using demanded-bits information.
if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);
diff --git a/llvm/test/CodeGen/X86/underflow-compare-fold.ll b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
new file mode 100644
index 0000000000000..2416bcb909485
--- /dev/null
+++ b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s
+
+; GitHub issue #161036
+
+define i64 @subIfNoUnderflow_umin(i64 %a, i64 %b) {
+; CHECK-LABEL: subIfNoUnderflow_umin
+; CHECK-LABEL: %bb.0
+; CHECK-NEXT: movq %rdi, %rax
+; CHECK-NEXT: subq %rsi, %rax
+; CHECK-NEXT: cmovbq %rdi, %rax
+; CHECK-NEXT: retq
+entry:
+ %sub = sub i64 %a, %b
+ %cond = tail call i64 @llvm.umin.i64(i64 %sub, i64 %a)
+ ret i64 %cond
+}
|
RKSimon
left a comment
There was a problem hiding this comment.
please add aarch64 test coverage
|
@RKSimon this is ready for another review. |
|
Apologies, I was traveling for a few days. I'll update this today. |
RKSimon
left a comment
There was a problem hiding this comment.
Please can you update against trunk latest and investigate the CI failure
DetailsPreviously the code was: Once I removed the
DetailsI've added some tests with vector types, incorrect patterns, etc. @arsenm also mentioned a test for multiple use. I didn't guard against multiple uses here since I thought the results of |
* main: (3124 commits) [X86] narrowBitOpRMW - add handling for single bit insertion patterns (llvm#165742) [gn build] Port 5322fb6 [libc++] Simplify the implementation of destroy_at a bit (llvm#165392) [MLIR][NVVM] Update mbarrier.init/inval Ops to use AnyTypeOf[] (llvm#165558) [X86] Remove AMX-TRANSPOSE (llvm#165556) [CIR] Fix multiple returns in switch statements (llvm#164468) [lld][test] Fix file cleanup in aarch64-build-attributes.s (llvm#164396) [X86] combineTruncate - trunc(srl(load(p),amt)) -> load(p+amt/8) - ensure there isn't an interdependency between the load and amt (llvm#165850) [llvm][docs] Remove guidance on adding release:reviewed label (llvm#164395) [libc++] Update our documentation on the supported compilers (llvm#165684) [AMDGPU][GlobalISel] Add register bank legalization for G_FADD (llvm#163407) [LLVM][ConstantFolding] Extend constantFoldVectorReduce to include scalable vectors. (llvm#165437) [SDAG] Set InBounds when when computing offsets into memory objects (llvm#165425) [llvm][DebugInfo][ObjC] Make sure we link backing ivars to their DW_TAG_APPLE_property (llvm#165409) [NFCI] Address post-merge review of llvm#162503 (llvm#165582) [clang][tools][scan-view] Remove Python2 compatibility code in ScanView.py (llvm#163747) [lldb][TypeSystem] Remove count parameter from TypeSystem::IsFloatingPointType (llvm#165707) [llvm][tools][opt-viewer] Put back missing function [llvm][tools][opt-viewer] Remove Python2 compatability code in optrecord.py (llvm#163744) [clang][utils] Make CmpDriver Python3 compatible (llvm#163740) ...
* main: [SPIRV] Fix vector bitcast check in LegalizePointerCast (llvm#164997) [lldb][docs] Add troubleshooting section to scripting introduction [Sema] Fix parameter index checks on explicit object member functions (llvm#165586) To fix polymorphic pointer assignment in FORALL when LHS is unlimited polymorphic and RHS is intrinsic type target (llvm#164999) [CostModel][AArch64] Model cost of extract.last.active intrinsic (clastb) (llvm#165739) [MemProf] Select largest of matching contexts from profile (llvm#165338) [lldb][TypeSystem] Better support for _BitInt types (llvm#165689) [NVPTX] Move TMA G2S lowering to Tablegen (llvm#165710) [MLIR][NVVM] Extend NVVM mma ops to support fp64 (llvm#165380) [UTC] Support to test annotated IR (llvm#165419)
* main: (1028 commits) [clang][DebugInfo] Attach `DISubprogram` to additional call variants (llvm#166202) [C2y] Claim nonconformance to WG14 N3348 (llvm#166966) [X86] 2012-01-10-UndefExceptionEdge.ll - regenerate test checks (llvm#167307) Remove unused standard headers: <string>, <optional>, <numeric>, <tuple> (llvm#167232) [DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (llvm#166855) [VPlan] Don't apply predication discount to non-originally-predicated blocks (llvm#160449) [libc++] Avoid overloaded `operator,` for (`T`, `Iter`) cases (llvm#161049) [tools][llc] Make save-stats.ll test target independent (llvm#167238) [AArch64] Fallback to PRFUM for PRFM with negative or unaligned offset (llvm#166756) [X86] ldexp-avx512.ll - add v8f16/v16f16/v32f16 test coverage for llvm#165694 (llvm#167294) [DropAssumes] Drop dereferenceable assumptions after vectorization. (llvm#166947) [VPlan] Simplify branch-cond with getVectorTripCount (llvm#155604) Remove unused <algorithm> inclusion (llvm#166942) [AArch64] Combine subtract with borrow to SBC. (llvm#165271) [AArch64][SVE] Avoid redundant extend of unsigned i8/i16 extracts. (llvm#165863) [SPIRV] Fix failing assertion in SPIRVAsmPrinter (llvm#166909) [libc++] Merge insert/emplace(const_iterator, Args...) implementations (llvm#166470) [libc++] Replace __libcpp_is_final with a variable template (llvm#167137) [gn build] Port 152bda7 [libc++] Replace the last uses of __tuple_types with __type_list (llvm#167214) ...
|
To fix the test failure I need to update the patterns in |
* main: (63 commits) [libc++] Inline vector::__append into resize (llvm#162086) [Flang][OpenMP] Move char box bounds generation for Maps to DirectiveCommons.h (llvm#165918) RuntimeLibcalls: Add entries for vector sincospi functions (llvm#166981) [X86] _mm_addsub_pd is not valid for constexpr (llvm#167363) [CIR] Re-land: Recognize constant aggregate initialization of auto vars (llvm#167033) [gn build] Port d2521f1 [gn build] Port caed089 [gn build] Port 315d705 [gn build] Port 2345b7d [PowerPC] convert memmove to milicode call .___memmove64[PR] in 64-bit mode (llvm#167334) [HLSL] Add internal linkage attribute to resources (llvm#166844) AMDGPU: Update test after e95f6fa [bazel] Port llvm#166980: TLI/VectorLibrary refactor (llvm#167354) [libc++] Split macros related to hardening into their own header (llvm#167069) [libc++][NFC] Remove unused imports from generate_feature_test_macro_components.py (llvm#159591) [CIR][NFC] Add test for Complex imag with GUN extension (llvm#167215) [BOLT][AArch64] Add more heuristics on epilogue determination (llvm#167077) RegisterCoalescer: Enable terminal rule by default for AMDGPU (llvm#161621) Revert "[clang] Refactor option-related code from clangDriver into new clangOptions library" (llvm#167348) [libc++][docs] Update to refer to P3355R2 (llvm#167267) ...
|
This change broke a significant number of cases in ffmpeg, across multiple architectures. One way of reproducing is this: $ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang
$ make -j$(nproc)
$ make fate-vsynth1-flvThis produces a binary that crashes for this testcase. On x86_64, the object file where the breakage is, is in Compilation of that particular object file (the one on x86) can be done standalone with https://martin.st/temp/videodsp_init-preproc.c - with If comparing the difference in generated code caused by this patch, I get this: --- out-good.s 2025-11-13 11:30:40.037589941 +0200
+++ out-bad.s 2025-11-13 11:30:33.665583286 +0200
@@ -132,15 +132,13 @@
addq %r10, %r11
movq %rax, %rbx
.LBB1_12: # %if.end29.i
+ xorl %eax, %eax
subq %r8, %r9
negq %r8
- xorl %eax, %eax
- testq %r8, %r8
- cmovleq %rax, %r8
+ cmovbq %rax, %r8
movq %rbx, %r13
negq %r13
- testq %r13, %r13
- cmovleq %rax, %r13
+ cmovbq %rax, %r13
cmpq %r9, %r14
cmovlq %r14, %r9
subq %rbx, %rbp
@@ -374,15 +372,13 @@
addq %r10, %r11
movq %rax, %rbx
.LBB2_12: # %if.end29.i
+ xorl %eax, %eax
subq %r8, %r9
negq %r8
- xorl %eax, %eax
- testq %r8, %r8
- cmovleq %rax, %r8
+ cmovbq %rax, %r8
movq %rbx, %r13
negq %r13
- testq %r13, %r13
- cmovleq %rax, %r13
+ cmovbq %rax, %r13
cmpq %r9, %r14
cmovlq %r14, %r9
subq %rbx, %rbpI would suggest reverting this commit for now. |
…ect usubo.1 a usubo.0)" (#167854) Reverts llvm/llvm-project#161651 due to downstream bad codegen reports
Baseline tests from llvm#161651 that were reverted in llvm#167854 Still missing test coverage for the ffmpeg regression failures
Baseline tests from llvm#161651 that were reverted in llvm#167854 Still missing test coverage for the ffmpeg regression failures
Fixes #161036.