[AArch64] Combine and and lsl into ubfiz #118974
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Cullen Rhodes (c-rhodes)

Changes: Fixes #118132.

Full diff: https://github.com/llvm/llvm-project/pull/118974.diff

3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 7614f6215b803c..9f980615caff5a 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -8989,6 +8989,15 @@ def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
(i64 (i64shift_a imm0_63:$imm)),
(i64 (i64shift_sext_i32 imm0_63:$imm)))>;
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
+ (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+ (i64 (i64shift_a imm0_63:$imm)),
+ (i64 (i64shift_sext_i8 imm0_63:$imm)))>;
+def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xffff)), (i64 imm0_63:$imm)),
+ (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+ (i64 (i64shift_a imm0_63:$imm)),
+ (i64 (i64shift_sext_i16 imm0_63:$imm)))>;
+
// sra patterns have an AddedComplexity of 10, so make sure we have a higher
// AddedComplexity for the following patterns since we want to match sext + sra
// patterns before we attempt to match a single sra node.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
index 63dcafed2320a0..abc5c0876e80b7 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
@@ -13,11 +13,10 @@ define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {
; CHECK0-SDAG-LABEL: halfword:
; CHECK0-SDAG: // %bb.0:
; CHECK0-SDAG-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
-; CHECK0-SDAG-NEXT: // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT: ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT: lsr w8, w1, #9
; CHECK0-SDAG-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK0-SDAG-NEXT: mov x19, x0
-; CHECK0-SDAG-NEXT: lsl x21, x8, #1
+; CHECK0-SDAG-NEXT: ubfiz x21, x8, #1, #8
; CHECK0-SDAG-NEXT: ldrh w20, [x0, x21]
; CHECK0-SDAG-NEXT: bl foo
; CHECK0-SDAG-NEXT: mov w0, w20
@@ -231,10 +230,9 @@ define i16 @multi_use_half_word(ptr %ctx, i32 %xor72) {
; CHECK0-SDAG-NEXT: .cfi_offset w21, -24
; CHECK0-SDAG-NEXT: .cfi_offset w22, -32
; CHECK0-SDAG-NEXT: .cfi_offset w30, -48
-; CHECK0-SDAG-NEXT: // kill: def $w1 killed $w1 def $x1
-; CHECK0-SDAG-NEXT: ubfx x8, x1, #9, #8
+; CHECK0-SDAG-NEXT: lsr w8, w1, #9
; CHECK0-SDAG-NEXT: mov x19, x0
-; CHECK0-SDAG-NEXT: lsl x21, x8, #1
+; CHECK0-SDAG-NEXT: ubfiz x21, x8, #1, #8
; CHECK0-SDAG-NEXT: ldrh w20, [x0, x21]
; CHECK0-SDAG-NEXT: add w22, w20, #1
; CHECK0-SDAG-NEXT: bl foo
diff --git a/llvm/test/CodeGen/AArch64/xbfiz.ll b/llvm/test/CodeGen/AArch64/xbfiz.ll
index b777ddcb7efcc4..05567e34258402 100644
--- a/llvm/test/CodeGen/AArch64/xbfiz.ll
+++ b/llvm/test/CodeGen/AArch64/xbfiz.ll
@@ -69,3 +69,19 @@ define i64 @lsl32_not_ubfiz64(i64 %v) {
%and = and i64 %shl, 4294967295
ret i64 %and
}
+
+define i64 @lsl_zext_i8_i64(i8 %b) {
+; CHECK-LABEL: lsl_zext_i8_i64:
+; CHECK: ubfiz x0, x0, #1, #8
+ %1 = zext i8 %b to i64
+ %2 = shl i64 %1, 1
+ ret i64 %2
+}
+
+define i64 @lsl_zext_i16_i64(i16 %b) {
+; CHECK-LABEL: lsl_zext_i16_i64:
+; CHECK: ubfiz x0, x0, #1, #16
+ %1 = zext i16 %b to i64
+ %2 = shl i64 %1, 1
+ ret i64 %2
+}
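For context on the instruction being targeted (an editorial sketch, not part of the patch): `ubfiz Xd, Xn, #lsb, #width` zero-extends the low `width` bits of `Xn` and shifts them left by `lsb`, zeroing every other bit of `Xd`. A mask of `2^width - 1` followed by a left shift therefore collapses into one instruction. A minimal IR illustration (function name is illustrative):

```llvm
; (%x & 0xff) << 3 keeps bits [0,8) of %x and places them at [3,11);
; on AArch64 this can be selected as a single "ubfiz x0, x0, #3, #8".
define i64 @mask_then_shift(i64 %x) {
  %masked = and i64 %x, 255
  %shifted = shl i64 %masked, 3
  ret i64 %shifted
}
```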
> def : Pat<(shl (i64 (and (i64 (anyext GPR32:$Rn)), 0xff)), (i64 imm0_63:$imm)),
I'm actually quite surprised that the anyext survived all the way through to isel! I was expecting a DAG combine to simply fold `and (i64 (anyext GPR32:$Rn)), 0xff` into something like `and GPR64:$Rn, 0xff`.
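A sketch of why the suggested fold would be sound (illustrative IR, not the SDAG combine itself): the mask discards every bit in which the extension kinds differ, so the extension can be widened or replaced freely underneath it.

```llvm
; Both functions compute %x & 0xff. The bits produced by the extension
; above bit 7 are masked away, which is why (and (anyext x), 0xff) could
; be treated as a plain and on the wider 64-bit register.
define i64 @mask_zext(i32 %x) {
  %ext = zext i32 %x to i64
  %masked = and i64 %ext, 255
  ret i64 %masked
}

define i64 @mask_sext(i32 %x) {
  %ext = sext i32 %x to i64
  %masked = and i64 %ext, 255
  ret i64 %masked
}
```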
It's a shame we haven't done that because it would make this pattern simpler. Is this worth looking into?
I think it needs to get the types correct.

Most of these probably get canonicalized to and(shift) (https://godbolt.org/z/PP6b8PY8x), which we already have lowering for (https://godbolt.org/z/c5bPGE74d). It would be nice if this handled other mask sizes too. Most of the time UBFM goes via DAG2DAG, using functions like `AArch64DAGToDAGISel::tryBitfieldInsertInZeroOp`.
Not sure I follow, what should handle other mask sizes? In the second link you sent, inst-combine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98, but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no `ZERO_EXTEND_INREG` node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?

It's not clear to me if you're suggesting to fix this in the canonicalizer or DAG2DAG, or a mix of both?
> Not sure I follow, what should handle other mask sizes? In the second link you sent, inst-combine does canonicalize to the same IR: https://godbolt.org/z/qqqosjP98, but this mustn't be happening in the compilation pipeline? Also, your example is and/shl, whereas the original example is shl(zext). I noticed in SDAG there's no `ZERO_EXTEND_INREG` node, as an AND with a constant is used instead; are you using and/shl because that's the canonical form or something?
Yeah, sorry. My point is that this adds patterns for `shl(and(x, 0xff))` and `shl(and(x, 0xffff))`, but that does not handle all the other masks that could be used but would still be valid to transform to UBFM. Maybe they don't come up in practice because of the canonicalization that the mid-end does? If DAG isn't canonicalizing `shl(and)` to `and(shl)` then I can imagine they could, so maybe we should either try to canonicalize it in DAG (to capture this case, though it might cause other problems), or, if possible, generalize the pattern or use DAG2DAG to handle the `shl(and)` patterns with any mask. If not, this seems OK.
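To make the two shapes being discussed concrete (illustrative IR; the names are not from the patch), the pre-shift mask in the `shl(and)` form corresponds to a shifted mask in the canonical `and(shl)` form:

```llvm
; The shape this PR matches: mask first, then shift.
define i64 @shl_of_and(i64 %x) {
  %masked = and i64 %x, 255          ; 0xff
  %shifted = shl i64 %masked, 4
  ret i64 %shifted
}

; The canonical and(shl) shape the existing lowering already handles;
; equivalent because (x & 0xff) << 4 == (x << 4) & 0xff0.
define i64 @and_of_shl(i64 %x) {
  %shifted = shl i64 %x, 4
  %masked = and i64 %shifted, 4080   ; 0xff0 = 0xff << 4
  ret i64 %masked
}
```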
> I think it needs to get the types correct.

Thanks, I see your point now. OK, so yes, as you've mentioned we do have patterns for the `and(shl)` form, and the mid-end can canonicalize the inverse `shl(and)` into it. However, the IR pre-isel for the original example is a plain `shl(zext)`, and the `zext` is legalized in SDAG to an `and` of an `anyext`. So adding a pattern for the resulting `shl(and(anyext))` shape handles the original case.
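For reference, the original example's pre-isel IR as I understand the thread (the same shape as the `lsl_zext_i8_i64` test added above; the comments describe the expected SDAG legalization):

```llvm
define i64 @orig_example(i8 %b) {
  %z = zext i8 %b to i64   ; legalized in SDAG to (and (anyext %b), 0xff)
  %s = shl i64 %z, 1       ; together these select to "ubfiz x0, x0, #1, #8"
  ret i64 %s
}
```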
I've added a combine for `shl`.
The codegen changes seem like an overall win to me. Nice! I had a few comments.
Just a general question - did you try seeing if the SHL combine could be done in the general DAGCombiner to get the same effect? It seems like a sensible canonicalisation to do, mirroring what InstCombine does at the IR level. I just wondered if you saw regressions in other targets when doing that, or if something else got in the way? Thanks!
I didn't, no, but I can take a look at that when I revisit this after the holidays :)
Yeah, it is looking promising. Thanks.
Moving this to DAGCombiner, there are ~180 failures across various backends, as well as a regression in …
Force-pushed from b092477 to f0cbfdd.
LGTM! Thanks for making all the changes. I left a minor comment that you can address before landing the patch!
We see Clang timeouts (compilation time exceeds 900 seconds in this particular case; usually this code compiles in ~20s) when compiling some code for aarch64 after this commit. I'm working on an isolated test case.
Here's a reduced test case: https://gcc.godbolt.org/z/K9csK5znc

Please fix or revert soon.
@alexfh thanks for reporting. Can confirm. Had a quick look at the debug output and I can see it's stuck in a loop of combines: the first combine is the one I added here, and the last one undoes it.

Will revert for now whilst I figure out a fix.