
[AArch64] Combine and and lsl into ubfiz #118974

Merged: 6 commits from combine-and-lsl-into-ubfiz into llvm:main on Jan 9, 2025

Conversation

@c-rhodes (Collaborator) commented Dec 6, 2024

Fixes #118132.
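For context, here is the shape of the transformation (an illustrative sketch based on the lsl_zext_i8_i64 test added in this patch; the "before" sequence is approximate):

#include <cstdint>

// Shifting a zero-extended narrow value previously lowered to a
// mask-and-shift pair, roughly:
//   and  x8, x0, #0xff
//   lsl  x0, x8, #1
// With the patterns added here it selects a single bitfield move:
//   ubfiz x0, x0, #1, #8
uint64_t lsl_zext_i8(uint8_t x) { return (uint64_t)x << 1; }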

@llvmbot (Member) commented Dec 6, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Cullen Rhodes (c-rhodes)

Changes

Fixes #118132.


Full diff: https://github.com/llvm/llvm-project/pull/118974.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+9)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll (+4-6)
  • (modified) llvm/test/CodeGen/AArch64/xbfiz.ll (+16)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 7614f6215b803c..9f980615caff5a 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -8989,6 +8989,15 @@ def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
                    (i64 (i64shift_a        imm0_63:$imm)),
                    (i64 (i64shift_sext_i32 imm0_63:$imm)))>;
 
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
+          (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a        imm0_63:$imm)),
+                   (i64 (i64shift_sext_i8 imm0_63:$imm)))>;
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xffff)), (i64 imm0_63:$imm)),
+          (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a        imm0_63:$imm)),
+                   (i64 (i64shift_sext_i16 imm0_63:$imm)))>;
+
 // sra patterns have an AddedComplexity of 10, so make sure we have a higher
 // AddedComplexity for the following patterns since we want to match sext + sra
 // patterns before we attempt to match a single sra node.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
index 63dcafed2320a0..abc5c0876e80b7 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
@@ -13,11 +13,10 @@ define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {
 ; CHECK0-SDAG-LABEL: halfword:
 ; CHECK0-SDAG:       // %bb.0:
 ; CHECK0-SDAG-NEXT:    stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
-; CHECK0-SDAG-NEXT:    // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT:    ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT:    lsr w8, w1, #9
 ; CHECK0-SDAG-NEXT:    stp x20, x19, [sp, #16] // 16-byte Folded Spill
 ; CHECK0-SDAG-NEXT:    mov x19, x0
-; CHECK0-SDAG-NEXT:    lsl x21, x8, #1
+; CHECK0-SDAG-NEXT:    ubfiz x21, x8, #1, #8
 ; CHECK0-SDAG-NEXT:    ldrh w20, [x0, x21]
 ; CHECK0-SDAG-NEXT:    bl foo
 ; CHECK0-SDAG-NEXT:    mov w0, w20
@@ -231,10 +230,9 @@ define i16 @multi_use_half_word(ptr %ctx, i32 %xor72) {
 ; CHECK0-SDAG-NEXT:    .cfi_offset w21, -24
 ; CHECK0-SDAG-NEXT:    .cfi_offset w22, -32
 ; CHECK0-SDAG-NEXT:    .cfi_offset w30, -48
-; CHECK0-SDAG-NEXT:    // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT:    ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT:    lsr w8, w1, #9
 ; CHECK0-SDAG-NEXT:    mov x19, x0
-; CHECK0-SDAG-NEXT:    lsl x21, x8, #1
+; CHECK0-SDAG-NEXT:    ubfiz x21, x8, #1, #8
 ; CHECK0-SDAG-NEXT:    ldrh w20, [x0, x21]
 ; CHECK0-SDAG-NEXT:    add w22, w20, #1
 ; CHECK0-SDAG-NEXT:    bl foo
diff --git a/llvm/test/CodeGen/AArch64/xbfiz.ll b/llvm/test/CodeGen/AArch64/xbfiz.ll
index b777ddcb7efcc4..05567e34258402 100644
--- a/llvm/test/CodeGen/AArch64/xbfiz.ll
+++ b/llvm/test/CodeGen/AArch64/xbfiz.ll
@@ -69,3 +69,19 @@ define i64 @lsl32_not_ubfiz64(i64 %v) {
   %and = and i64 %shl, 4294967295
   ret i64 %and
 }
+
+define i64 @lsl_zext_i8_i64(i8 %b) {
+; CHECK-LABEL: lsl_zext_i8_i64:
+; CHECK:    ubfiz x0, x0, #1, #8
+  %1 = zext i8 %b to i64
+  %2 = shl i64 %1, 1
+  ret i64 %2
+}
+
+define i64 @lsl_zext_i16_i64(i16 %b) {
+; CHECK-LABEL: lsl_zext_i16_i64:
+; CHECK:    ubfiz x0, x0, #1, #16
+  %1 = zext i16 %b to i64
+  %2 = shl i64 %1, 1
+  ret i64 %2
+}
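As an aside on how the immediates in these patterns work: ubfiz Rd, Rn, #lsb, #width is an alias of UBFM with immr = (regsize - lsb) % regsize and imms = width - 1, and the i64shift_a / i64shift_sext_* transforms compute those values from the shift amount. A rough sketch of the arithmetic (my own restatement, assuming the standard alias rules, not code from the patch):

#include <algorithm>
#include <cstdint>

struct UbfmImms { uint64_t immr, imms; };

// Immediates for selecting shl(zext_iN(x), shift) as a 64-bit UBFM.
// srcBits is the width of the zero-extended source (8 or 16 above).
UbfmImms ubfizFromShlZext(uint64_t shift, uint64_t srcBits) {
  uint64_t immr = (64 - shift) & 63;                  // i64shift_a
  uint64_t imms = std::min(srcBits - 1, 63 - shift);  // i64shift_sext_iN
  return {immr, imms};
}

// Example: shift = 1, srcBits = 8 gives immr = 63, imms = 7, which
// disassembles as "ubfiz x0, x0, #1, #8" (matching the tests above).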

@@ -8989,6 +8989,15 @@ def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
(i64 (i64shift_a imm0_63:$imm)),
(i64 (i64shift_sext_i32 imm0_63:$imm)))>;

def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
Contributor commented:
I'm actually quite surprised that the anyext survived all the way through to isel! I was expecting a DAG combine to simply fold and (i64 (anyext GPR32:$Rn)), 0xff into something like and GPR64:$Rn, 0xff.

Contributor commented:

It's a shame we haven't done that because it would make this pattern simpler. Is this worth looking into?

Collaborator commented:

I think it needs to get the types correct.

@davemgreen (Collaborator) commented:

Most of these probably get canonicalized to and(shift) https://godbolt.org/z/PP6b8PY8x, which we already have lowering for: https://godbolt.org/z/c5bPGE74d. It would be nice if this handled other mask sizes too. Most of the time UBFM goes via DAG2DAG, using functions like AArch64DAGToDAGISel::tryBitfieldInsertInZeroOp.

@c-rhodes (Collaborator, Author) commented Dec 9, 2024

> Most of these probably get canonicalized to and(shift) https://godbolt.org/z/PP6b8PY8x, which we already have lowering for: https://godbolt.org/z/c5bPGE74d. It would be nice if this handled other mask sizes too.

Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98

but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?

> Most of the time UBFM goes via DAG2DAG, using functions like AArch64DAGToDAGISel::tryBitfieldInsertInZeroOp.

It's not clear to me if you're suggesting to fix this in the canonicalizer or DAG2DAG, or a mix of both?

@davemgreen (Collaborator) left a comment:

> Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98
>
> but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?

Yeah, sorry. My point is that this adds patterns for shl(and(x, 0xff)) and shl(and(x, 0xffff)), but that does not handle all the other masks that could be used and would still be valid to transform to UBFM. Maybe they don't come up in practice because of the canonicalization that the mid-end does? If DAG isn't canonicalizing shl(and) to and(shl) then I can imagine they could, so maybe we should either try to canonicalize it in DAG (to capture this case, though that might cause other problems), or, if possible, generalize the pattern or use DAG2DAG to handle the shl(and) patterns with any and-mask. If not, this seems OK.


@c-rhodes (Collaborator, Author) commented:

>> Not sure I follow: what should handle other mask sizes? In the second link you sent, instcombine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98
>>
>> but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no ZERO_EXTEND_INREG node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?
>
> Yeah, sorry. My point is that this adds patterns for shl(and(x, 0xff)) and shl(and(x, 0xffff)), but that does not handle all the other masks that could be used and would still be valid to transform to UBFM. Maybe they don't come up in practice because of the canonicalization that the mid-end does? If DAG isn't canonicalizing shl(and) to and(shl) then I can imagine they could, so maybe we should either try to canonicalize it in DAG (to capture this case, though that might cause other problems), or, if possible, generalize the pattern or use DAG2DAG to handle the shl(and) patterns with any and-mask. If not, this seems OK.

Thanks, I see your point now. OK, so yes, as you've mentioned, we do have patterns for and(shl); for example, this IR is equivalent to the original C example on the issue and is lowered to ubfiz: https://godbolt.org/z/qc7xosPxW

; uint64_t u8(uint8_t x) { return x << 1; }
define i64 @f(i64 %x) {
  %and = shl i64 %x, 1
  %shl = and i64 %and, u0x1fe
  ret i64 %shl
}

and the mid-end can canonicalize the inverse shl(and) -> and(shl): https://godbolt.org/z/7fM53Ydov

define i64 @f(i64 %x) {
  %and = and i64 %x, u0xff
  %shl = shl i64 %and, 1
  ret i64 %shl
}

; opt -passes=instcombine

define i64 @f(i64 %x) {
  %and = shl i64 %x, 1
  %shl = and i64 %and, 510 ; u0x1fe
  ret i64 %shl
}

however the IR pre-isel for the original example is:

define i64 @u8(i8 %0) {
  %2 = zext i8 %0 to i64
  %3 = shl nuw nsw i64 %2, 1
  ret i64 %3
}

The and is introduced during isel, so the mid-end canonicalizer is of course no help there.

Initial selection DAG: %bb.0 'lsl_zext_i8_i64:'
SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t3: i8 = truncate t2
      t4: i64 = zero_extend t3
    t6: i64 = shl nuw nsw t4, Constant:i64<1>
  t8: ch,glue = CopyToReg t0, Register:i64 $x0, t6
  t9: ch = AArch64ISD::RET_GLUE t8, Register:i64 $x0, t8:1

...

Combining: t4: i64 = zero_extend t3
Creating new node: t10: i64 = any_extend t2
Creating constant: t11: i64 = Constant<255>
Creating new node: t12: i64 = and t10, Constant:i64<255>
 ... into: t12: i64 = and t10, Constant:i64<255>

...

Optimized lowered selection DAG: %bb.0 'lsl_zext_i8_i64:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t10: i64 = any_extend t2
      t12: i64 = and t10, Constant:i64<255>
    t6: i64 = shl nuw nsw t12, Constant:i64<1>
  t8: ch,glue = CopyToReg t0, Register:i64 $x0, t6
  t9: ch = AArch64ISD::RET_GLUE t8, Register:i64 $x0, t8:1

So adding a shl(and) -> and(shl) combine in isel seems like a sensible approach. I sat down with @david-arm on Tuesday as well and he had the same insight, so I'll give this a try. It seems preferable to adding more patterns in DAG2DAG.
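For concreteness, a minimal sketch of what such a combine might look like (an illustration in the style of the AArch64 DAG combines; the helper name and guards are assumptions, not the code from this PR):

// Illustrative only: rewrite shl(and(x, C1), C2) into
// and(shl(x, C2), C1 << C2) so the existing and(shl) lowering can
// form UBFIZ-style bitfield ops.
static SDValue performShlOfAndCombine(SDNode *N, SelectionDAG &DAG) {
  SDValue And = N->getOperand(0);
  if (And.getOpcode() != ISD::AND || !And.hasOneUse())
    return SDValue();

  auto *MaskC = dyn_cast<ConstantSDNode>(And.getOperand(1));
  auto *ShiftC = dyn_cast<ConstantSDNode>(N->getOperand(1));
  EVT VT = N->getValueType(0);
  if (!MaskC || !ShiftC || ShiftC->getAPIntValue().uge(VT.getSizeInBits()))
    return SDValue();

  SDLoc DL(N);
  // shl x, C2
  SDValue NewShl =
      DAG.getNode(ISD::SHL, DL, VT, And.getOperand(0), N->getOperand(1));
  // C1 << C2
  APInt NewMask = MaskC->getAPIntValue().shl(ShiftC->getAPIntValue());
  return DAG.getNode(ISD::AND, DL, VT, NewShl,
                     DAG.getConstant(NewMask, DL, VT));
}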

@c-rhodes (Collaborator, Author) commented:

I've added a combine for shl(and) -> and(shl) which does the trick, but there are a few unexpected test changes. I think it needs tightening up a bit; I've tried restricting the combine so it only kicks in if the constants are in a given range, but I haven't quite figured it out yet. Just wanted to share an update, as I'm away for Christmas after today until January and I'm not sure if I'll have figured it out by then.

@david-arm (Contributor) left a comment:

The codegen changes seem like an overall win to me. Nice! I had a few comments.

Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!

@c-rhodes (Collaborator, Author) commented:

> The codegen changes seem like an overall win to me. Nice! I had a few comments.
>
> Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!

I didn't, no, but I can take a look at that when I revisit this after the holidays :)

@davemgreen (Collaborator) commented:

Yeah, it is looking promising. Thanks.

@c-rhodes (Collaborator, Author) commented Jan 7, 2025

>> The codegen changes seem like an overall win to me. Nice! I had a few comments.
>>
>> Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!
>
> I didn't, no, but I can take a look at that when I revisit this after the holidays :)

Moving this to DAGCombiner causes ~180 failures across various backends, as well as a regression in llvm/test/CodeGen/AArch64/swap-compare-operands.ll, since the AArch64 ISD nodes in one of the pattern if-statements have to be removed.

@c-rhodes force-pushed the combine-and-lsl-into-ubfiz branch from b092477 to f0cbfdd on January 8, 2025.
@david-arm (Contributor) left a comment:

LGTM! Thanks for making all the changes. I left a minor comment that you can address before landing the patch!

@c-rhodes merged commit f1d5efe into llvm:main on Jan 9, 2025 (6 of 8 checks passed).
@alexfh (Contributor) commented Jan 17, 2025

We see Clang timeouts (meaning compilation time exceeds 900 seconds; this particular code usually compiles in ~20s) when compiling some code for aarch64 after this commit. I'm working on an isolated test case.

@alexfh (Contributor) commented Jan 17, 2025

Here's a reduced test case: https://gcc.godbolt.org/z/K9csK5znc

$ cat r.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-unknown-linux-gnu"

define void @_f(ptr %0, ptr %1, i64 %2) {
  store i64 -2401053089408754003, ptr %1, align 8
  %4 = and i64 %2, -2401053089408754003
  %5 = shl i64 %4, 1
  store i64 %5, ptr %0, align 1
  %6 = lshr i64 %4, 54
  %7 = shl i64 %2, 10
  %8 = and i64 %7, 131072
  %9 = or i64 %8, %6
  store i64 %9, ptr %1, align 1
  ret void
}
$ ./clang-slow -O1 --target=aarch64-unknown-linux-gnu  -mcpu=neoverse-n1 -mtune=generic -c r.ll
<takes forever>

Please fix or revert soon.

@c-rhodes (Collaborator, Author) commented:

@alexfh thanks for reporting. I can confirm llc r.ll hangs, and reverting this patch fixes it.

Had a quick look at the debug output and can see it's stuck in a loop of:

  Combining: t62138: i64 = shl t62137, Constant:i64<1>
  Creating new node: t62139: i64 = shl OpaqueConstant:i64<-2401053089408754003>, Constant:i64<1>
  Creating new node: t62140: i64 = shl t6, Constant:i64<1>
  Creating new node: t62141: i64 = and t62140, t62139
   ... into: t62141: i64 = and t62140, t62139
  
  Legalizing: t62141: i64 = and t62140, t62139
  Legal node: nothing to do
  
  Combining: t62141: i64 = and t62140, t62139
  Creating new node: t62142: i64 = and t6, OpaqueConstant:i64<-2401053089408754003>
  Creating new node: t62143: i64 = shl t62142, Constant:i64<1>
   ... into: t62143: i64 = shl t62142, Constant:i64<1>

The first combine is the one I added here and the last is DAGCombiner::visitAND, specifically https://github.com/llvm/llvm-project/blob/63b0ab84253f29f1f9b9136a02d589552b29c645/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L7446-L7449, and they keep undoing each other.

Will revert for now whilst I figure out a fix.
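One possible way to break the cycle (speculation on my part, not necessarily the fix that landed): bail out of the combine sketched earlier unless the mask is a contiguous low-bits mask, which is the only shape the new UBFIZ patterns consume, so visitAND's reassociation never sees a node it wants to undo:

// Hypothetical extra guard for the performShlOfAndCombine sketch above:
// only rewrite when C1 is a low-bits mask like 0xff or 0xffff. The
// opaque mask -2401053089408754003 from the reproducer is rejected,
// so the two combines can no longer ping-pong.
if (!MaskC->getAPIntValue().isMask())
  return SDValue();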

c-rhodes added a commit that referenced this pull request Jan 17, 2025