
[AArch64] Combine and and lsl into ubfiz #118974

Merged: 6 commits from combine-and-lsl-into-ubfiz into llvm:main on Jan 9, 2025

Conversation

@c-rhodes (Collaborator) commented Dec 6, 2024

Fixes #118132.
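For context, here is the shape of the transformation (an illustrative sketch based on the lsl_zext_i8_i64 test added in this patch; the "before" sequence is approximate):

#include <cstdint>

// Shifting a zero-extended narrow value previously lowered to a
// mask-and-shift pair, roughly:
//   and  x8, x0, #0xff
//   lsl  x0, x8, #1
// With the patterns added here it selects a single bitfield move:
//   ubfiz x0, x0, #1, #8
uint64_t lsl_zext_i8(uint8_t x) { return (uint64_t)x << 1; }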

@llvmbot (Member) commented Dec 6, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Cullen Rhodes (c-rhodes)

Changes

Fixes #118132.


Full diff: https://github.com/llvm/llvm-project/pull/118974.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+9)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll (+4-6)
  • (modified) llvm/test/CodeGen/AArch64/xbfiz.ll (+16)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 7614f6215b803c..9f980615caff5a 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -8989,6 +8989,15 @@ def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
                    (i64 (i64shift_a        imm0_63:$imm)),
                    (i64 (i64shift_sext_i32 imm0_63:$imm)))>;
 
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
+          (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a        imm0_63:$imm)),
+                   (i64 (i64shift_sext_i8 imm0_63:$imm)))>;
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xffff)), (i64 imm0_63:$imm)),
+          (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a        imm0_63:$imm)),
+                   (i64 (i64shift_sext_i16 imm0_63:$imm)))>;
+
 // sra patterns have an AddedComplexity of 10, so make sure we have a higher
 // AddedComplexity for the following patterns since we want to match sext + sra
 // patterns before we attempt to match a single sra node.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
index 63dcafed2320a0..abc5c0876e80b7 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
@@ -13,11 +13,10 @@ define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {
 ; CHECK0-SDAG-LABEL: halfword:
 ; CHECK0-SDAG:       // %bb.0:
 ; CHECK0-SDAG-NEXT:    stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
-; CHECK0-SDAG-NEXT:    // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT:    ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT:    lsr w8, w1, #9
 ; CHECK0-SDAG-NEXT:    stp x20, x19, [sp, #16] // 16-byte Folded Spill
 ; CHECK0-SDAG-NEXT:    mov x19, x0
-; CHECK0-SDAG-NEXT:    lsl x21, x8, #1
+; CHECK0-SDAG-NEXT:    ubfiz x21, x8, #1, #8
 ; CHECK0-SDAG-NEXT:    ldrh w20, [x0, x21]
 ; CHECK0-SDAG-NEXT:    bl foo
 ; CHECK0-SDAG-NEXT:    mov w0, w20
@@ -231,10 +230,9 @@ define i16 @multi_use_half_word(ptr %ctx, i32 %xor72) {
 ; CHECK0-SDAG-NEXT:    .cfi_offset w21, -24
 ; CHECK0-SDAG-NEXT:    .cfi_offset w22, -32
 ; CHECK0-SDAG-NEXT:    .cfi_offset w30, -48
-; CHECK0-SDAG-NEXT:    // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT:    ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT:    lsr w8, w1, #9
 ; CHECK0-SDAG-NEXT:    mov x19, x0
-; CHECK0-SDAG-NEXT:    lsl x21, x8, #1
+; CHECK0-SDAG-NEXT:    ubfiz x21, x8, #1, #8
 ; CHECK0-SDAG-NEXT:    ldrh w20, [x0, x21]
 ; CHECK0-SDAG-NEXT:    add w22, w20, #1
 ; CHECK0-SDAG-NEXT:    bl foo
diff --git a/llvm/test/CodeGen/AArch64/xbfiz.ll b/llvm/test/CodeGen/AArch64/xbfiz.ll
index b777ddcb7efcc4..05567e34258402 100644
--- a/llvm/test/CodeGen/AArch64/xbfiz.ll
+++ b/llvm/test/CodeGen/AArch64/xbfiz.ll
@@ -69,3 +69,19 @@ define i64 @lsl32_not_ubfiz64(i64 %v) {
   %and = and i64 %shl, 4294967295
   ret i64 %and
 }
+
+define i64 @lsl_zext_i8_i64(i8 %b) {
+; CHECK-LABEL: lsl_zext_i8_i64:
+; CHECK:    ubfiz x0, x0, #1, #8
+  %1 = zext i8 %b to i64
+  %2 = shl i64 %1, 1
+  ret i64 %2
+}
+
+define i64 @lsl_zext_i16_i64(i16 %b) {
+; CHECK-LABEL: lsl_zext_i16_i64:
+; CHECK:    ubfiz x0, x0, #1, #16
+  %1 = zext i16 %b to i64
+  %2 = shl i64 %1, 1
+  ret i64 %2
+}
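As an aside on how the immediates in these patterns work: ubfiz Rd, Rn, #lsb, #width is an alias of UBFM with immr = (regsize - lsb) % regsize and imms = width - 1, and the i64shift_a / i64shift_sext_* transforms compute those values from the shift amount. A rough sketch of the arithmetic (my own restatement, assuming the standard alias rules, not code from the patch):

#include <algorithm>
#include <cstdint>

struct UbfmImms { uint64_t immr, imms; };

// Immediates for selecting shl(zext_iN(x), shift) as a 64-bit UBFM.
// srcBits is the width of the zero-extended source (8 or 16 above).
UbfmImms ubfizFromShlZext(uint64_t shift, uint64_t srcBits) {
  uint64_t immr = (64 - shift) & 63;                  // i64shift_a
  uint64_t imms = std::min(srcBits - 1, 63 - shift);  // i64shift_sext_iN
  return {immr, imms};
}

// Example: shift = 1, srcBits = 8 gives immr = 63, imms = 7, which
// disassembles as "ubfiz x0, x0, #1, #8" (matching the tests above).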

@@ -8989,6 +8989,15 @@ def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
(i64 (i64shift_a imm0_63:$imm)),
(i64 (i64shift_sext_i32 imm0_63:$imm)))>;

def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
Contributor commented:
I'm actually quite surprised that the anyext survived all the way through to isel! I was expecting a DAG combine to simply fold and (i64 (anyext GPR32:$Rn)), 0xff into something like and GPR64:$Rn, 0xff.

Contributor commented:

It's a shame we haven't done that because it would make this pattern simpler. Is this worth looking into?

Collaborator commented:

I think it needs to get the types correct.

@davemgreen (Collaborator) commented:

Most of these probably get canonicalized to and(shift) https://godbolt.org/z/PP6b8PY8x, which we already have lowering for: https://godbolt.org/z/c5bPGE74d. It would be nice if this handled other mask sizes too. Most of the time UBFM goes via DAG2DAG, using functions like AArch64DAGToDAGISel::tryBitfieldInsertInZeroOp.

@c-rhodes (Collaborator, Author) commented Dec 9, 2024

> Most of these probably get canonicalized to and(shift) https://godbolt.org/z/PP6b8PY8x, which we already have lowering for: https://godbolt.org/z/c5bPGE74d. It would be nice if this handled other mask sizes too.

Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98

but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?

> Most of the time UBFM goes via DAG2DAG, using functions like AArch64DAGToDAGISel::tryBitfieldInsertInZeroOp.

It's not clear to me if you're suggesting to fix this in the canonicalizer or DAG2DAG, or a mix of both?

@davemgreen (Collaborator) left a comment:

> Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98
>
> but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?

Yeah, sorry. My point is that this adds patterns for shl(and(x, 0xff)) and shl(and(x, 0xffff)), but that does not handle all the other masks that could be used and would still be valid to transform to UBFM. Maybe they don't come up in practice because of the canonicalization that the mid-end does? If DAG isn't canonicalizing shl(and) to and(shl) then I can imagine they could, so maybe we should either try to canonicalize it in DAG (to capture this case, though that might cause other problems), or, if possible, generalize the pattern or use DAG2DAG to handle the shl(and) patterns with any and-mask. If not, this seems OK.


@c-rhodes (Collaborator, Author) commented:

>> Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98
>>
>> but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?
>
> Yeah, sorry. My point is that this adds patterns for shl(and(x, 0xff)) and shl(and(x, 0xffff)), but that does not handle all the other masks that could be used and would still be valid to transform to UBFM. Maybe they don't come up in practice because of the canonicalization that the mid-end does? If DAG isn't canonicalizing shl(and) to and(shl) then I can imagine they could, so maybe we should either try to canonicalize it in DAG (to capture this case, though that might cause other problems), or, if possible, generalize the pattern or use DAG2DAG to handle the shl(and) patterns with any and-mask. If not, this seems OK.

Thanks, I see your point now. OK, so yes, as you've mentioned, we do have patterns for and(shl); for example, this IR is equivalent to the original C example on the issue and is lowered to ubfiz: https://godbolt.org/z/qc7xosPxW

; uint64_t u8(uint8_t x) { return x << 1; }
define i64 @f(i64 %x) {
  %and = shl i64 %x, 1
  %shl = and i64 %and, u0x1fe
  ret i64 %shl
}

and the mid-end can canonicalize the inverse shl(and) -> and(shl): https://godbolt.org/z/7fM53Ydov

define i64 @f(i64 %x) {
  %and = and i64 %x, u0xff
  %shl = shl i64 %and, 1
  ret i64 %shl
}

; opt -passes=instcombine

define i64 @f(i64 %x) {
  %and = shl i64 %x, 1
  %shl = and i64 %and, 510 ; u0x1fe
  ret i64 %shl
}

however the IR pre-isel for the original example is:

define i64 @u8(i8 %0) {
  %2 = zext i8 %0 to i64
  %3 = shl nuw nsw i64 %2, 1
  ret i64 %3
}

The and is introduced during isel, so the mid-end canonicalizer is of course no help there.

Initial selection DAG: %bb.0 'lsl_zext_i8_i64:'
SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t3: i8 = truncate t2
      t4: i64 = zero_extend t3
    t6: i64 = shl nuw nsw t4, Constant:i64<1>
  t8: ch,glue = CopyToReg t0, Register:i64 $x0, t6
  t9: ch = AArch64ISD::RET_GLUE t8, Register:i64 $x0, t8:1

...

Combining: t4: i64 = zero_extend t3
Creating new node: t10: i64 = any_extend t2
Creating constant: t11: i64 = Constant<255>
Creating new node: t12: i64 = and t10, Constant:i64<255>
 ... into: t12: i64 = and t10, Constant:i64<255>

...

Optimized lowered selection DAG: %bb.0 'lsl_zext_i8_i64:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t10: i64 = any_extend t2
      t12: i64 = and t10, Constant:i64<255>
    t6: i64 = shl nuw nsw t12, Constant:i64<1>
  t8: ch,glue = CopyToReg t0, Register:i64 $x0, t6
  t9: ch = AArch64ISD::RET_GLUE t8, Register:i64 $x0, t8:1

So adding a shl(and) -> and(shl) combine in isel seems like a sensible approach. I sat down with @david-arm on Tuesday as well and he had the same insight, so I'll give this a try. It seems preferable to adding more patterns in DAG2DAG.
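For concreteness, a minimal sketch of what such a combine might look like (an illustration in the style of the AArch64 DAG combines; the helper name and guards are assumptions, not the code from this PR):

// Illustrative only: rewrite shl(and(x, C1), C2) into
// and(shl(x, C2), C1 << C2) so the existing and(shl) lowering can
// form UBFIZ-style bitfield ops.
static SDValue performShlOfAndCombine(SDNode *N, SelectionDAG &DAG) {
  SDValue And = N->getOperand(0);
  if (And.getOpcode() != ISD::AND || !And.hasOneUse())
    return SDValue();

  auto *MaskC = dyn_cast<ConstantSDNode>(And.getOperand(1));
  auto *ShiftC = dyn_cast<ConstantSDNode>(N->getOperand(1));
  EVT VT = N->getValueType(0);
  if (!MaskC || !ShiftC || ShiftC->getAPIntValue().uge(VT.getSizeInBits()))
    return SDValue();

  SDLoc DL(N);
  // shl x, C2
  SDValue NewShl =
      DAG.getNode(ISD::SHL, DL, VT, And.getOperand(0), N->getOperand(1));
  // C1 << C2
  APInt NewMask = MaskC->getAPIntValue().shl(ShiftC->getAPIntValue());
  return DAG.getNode(ISD::AND, DL, VT, NewShl,
                     DAG.getConstant(NewMask, DL, VT));
}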

@c-rhodes (Collaborator, Author) commented:

I've added a combine for shl(and) -> and(shl) which does the trick, but there are a few unexpected test changes. I think it needs tightening up a bit; I've tried restricting the combine so it only kicks in if the constants are in a given range, but I haven't quite figured it out yet. Just wanted to share an update, as I'm away for Christmas after today until January and I'm not sure if I'll have figured it out by then.

@david-arm (Contributor) left a comment:

The codegen changes seem like an overall win to me. Nice! I had a few comments.

Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!

@c-rhodes (Collaborator, Author) commented:

> The codegen changes seem like an overall win to me. Nice! I had a few comments.
>
> Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!

I didn't, no, but I can take a look at that when I revisit this after the holidays :)

@davemgreen (Collaborator) commented:

Yeah, it is looking promising. Thanks.

@c-rhodes (Collaborator, Author) commented Jan 7, 2025

>> The codegen changes seem like an overall win to me. Nice! I had a few comments.
>>
>> Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!
>
> I didn't, no, but I can take a look at that when I revisit this after the holidays :)

Moving this to DAGCombiner causes ~180 failures across various backends, as well as a regression in llvm/test/CodeGen/AArch64/swap-compare-operands.ll, since the AArch64 ISD nodes in one of the pattern if-statements have to be removed.

@c-rhodes force-pushed the combine-and-lsl-into-ubfiz branch from b092477 to f0cbfdd on January 8, 2025.
@david-arm (Contributor) left a comment:

LGTM! Thanks for making all the changes. I left a minor comment that you can address before landing the patch!

@c-rhodes merged commit f1d5efe into llvm:main on Jan 9, 2025 (6 of 8 checks passed).
@alexfh (Contributor) commented Jan 17, 2025

We see Clang timeouts (meaning compilation time exceeds 900 seconds; this particular code usually compiles in ~20s) when compiling some code for aarch64 after this commit. I'm working on an isolated test case.

@alexfh (Contributor) commented Jan 17, 2025

Here's a reduced test case: https://gcc.godbolt.org/z/K9csK5znc

$ cat r.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-unknown-linux-gnu"

define void @_f(ptr %0, ptr %1, i64 %2) {
  store i64 -2401053089408754003, ptr %1, align 8
  %4 = and i64 %2, -2401053089408754003
  %5 = shl i64 %4, 1
  store i64 %5, ptr %0, align 1
  %6 = lshr i64 %4, 54
  %7 = shl i64 %2, 10
  %8 = and i64 %7, 131072
  %9 = or i64 %8, %6
  store i64 %9, ptr %1, align 1
  ret void
}
$ ./clang-slow -O1 --target=aarch64-unknown-linux-gnu  -mcpu=neoverse-n1 -mtune=generic -c r.ll
<takes forever>

Please fix or revert soon.

@c-rhodes (Collaborator, Author) commented:

@alexfh thanks for reporting. I can confirm llc r.ll hangs, and reverting this patch fixes it.

Had a quick look at the debug output and can see it's stuck in a loop of:

  Combining: t62138: i64 = shl t62137, Constant:i64<1>
  Creating new node: t62139: i64 = shl OpaqueConstant:i64<-2401053089408754003>, Constant:i64<1>
  Creating new node: t62140: i64 = shl t6, Constant:i64<1>
  Creating new node: t62141: i64 = and t62140, t62139
   ... into: t62141: i64 = and t62140, t62139
  
  Legalizing: t62141: i64 = and t62140, t62139
  Legal node: nothing to do
  
  Combining: t62141: i64 = and t62140, t62139
  Creating new node: t62142: i64 = and t6, OpaqueConstant:i64<-2401053089408754003>
  Creating new node: t62143: i64 = shl t62142, Constant:i64<1>
   ... into: t62143: i64 = shl t62142, Constant:i64<1>

The first combine is the one I added here and the last is DAGCombiner::visitAND, specifically https://github.com/llvm/llvm-project/blob/63b0ab84253f29f1f9b9136a02d589552b29c645/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L7446-L7449, and they keep undoing each other.

Will revert for now whilst I figure out a fix.
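One possible way to break the cycle (speculation on my part, not necessarily the fix that landed): bail out of the combine sketched earlier unless the mask is a contiguous low-bits mask, which is the only shape the new UBFIZ patterns consume, so visitAND's reassociation never sees a node it wants to undo:

// Hypothetical extra guard for the performShlOfAndCombine sketch above:
// only rewrite when C1 is a low-bits mask like 0xff or 0xffff. The
// opaque mask -2401053089408754003 from the reproducer is rejected,
// so the two combines can no longer ping-pong.
if (!MaskC->getAPIntValue().isMask())
  return SDValue();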

c-rhodes added a commit that referenced this pull request Jan 17, 2025