-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] Enable expansion mul expansion to shNadd x, (slli x, c) #87105
Conversation
This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take. The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.
@llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames) ChangesThis expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take. The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future. Full diff: https://github.com/llvm/llvm-project/pull/87105.diff 5 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 564fda674317f4..22bd4b8c7230a9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20393,8 +20393,8 @@ bool RISCVTargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT,
(1 - Imm).isPowerOf2() || (-1 - Imm).isPowerOf2())
return true;
- // Optimize the MUL to (SH*ADD x, (SLLI x, bits)) if Imm is not simm12.
- if (Subtarget.hasStdExtZba() && !Imm.isSignedIntN(12) &&
+ // Optimize the MUL to (SH*ADD x, (SLLI x, bits)).
+ if (Subtarget.hasStdExtZba() &&
((Imm - 2).isPowerOf2() || (Imm - 4).isPowerOf2() ||
(Imm - 8).isPowerOf2()))
return true;
diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
index 48fa69e1045656..931edacfbfff24 100644
--- a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
+++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
@@ -551,8 +551,9 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
; RV32IMB-NEXT: sh3add a1, a1, a2
; RV32IMB-NEXT: sh1add a0, a0, a0
; RV32IMB-NEXT: slli a2, a0, 3
-; RV32IMB-NEXT: addi a0, a2, 2047
-; RV32IMB-NEXT: addi a0, a0, 1
+; RV32IMB-NEXT: li a3, 1
+; RV32IMB-NEXT: slli a3, a3, 11
+; RV32IMB-NEXT: sh3add a0, a0, a3
; RV32IMB-NEXT: sltu a2, a0, a2
; RV32IMB-NEXT: add a1, a1, a2
; RV32IMB-NEXT: ret
@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: addi a0, a0, 86
; RV64IMB-NEXT: sh1add a0, a0, a0
-; RV64IMB-NEXT: li a1, -16
-; RV64IMB-NEXT: sh3add a0, a0, a1
+; RV64IMB-NEXT: slli a0, a0, 3
+; RV64IMB-NEXT: addi a0, a0, -16
; RV64IMB-NEXT: ret
%tmp0 = mul i64 %x, 24
%tmp1 = add i64 %tmp0, 2048
@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) {
define i1 @pr53831(i32 %x) {
; RV32IMB-LABEL: pr53831:
; RV32IMB: # %bb.0:
-; RV32IMB-NEXT: li a0, 0
+; RV32IMB-NEXT: addi a1, a0, 1
+; RV32IMB-NEXT: sh1add a1, a1, a1
+; RV32IMB-NEXT: slli a1, a1, 3
+; RV32IMB-NEXT: addi a1, a1, 1
+; RV32IMB-NEXT: sh1add a0, a0, a0
+; RV32IMB-NEXT: li a2, 1
+; RV32IMB-NEXT: slli a2, a2, 11
+; RV32IMB-NEXT: sh3add a0, a0, a2
+; RV32IMB-NEXT: xor a0, a0, a1
+; RV32IMB-NEXT: seqz a0, a0
; RV32IMB-NEXT: ret
;
; RV64IMB-LABEL: pr53831:
; RV64IMB: # %bb.0:
-; RV64IMB-NEXT: li a0, 0
+; RV64IMB-NEXT: addi a1, a0, 1
+; RV64IMB-NEXT: sh1add a1, a1, a1
+; RV64IMB-NEXT: slliw a1, a1, 3
+; RV64IMB-NEXT: addi a1, a1, 1
+; RV64IMB-NEXT: sh1add a0, a0, a0
+; RV64IMB-NEXT: li a2, 1
+; RV64IMB-NEXT: slli a2, a2, 11
+; RV64IMB-NEXT: sh3add a0, a0, a2
+; RV64IMB-NEXT: sext.w a0, a0
+; RV64IMB-NEXT: xor a0, a0, a1
+; RV64IMB-NEXT: seqz a0, a0
; RV64IMB-NEXT: ret
%tmp0 = add i32 %x, 1
%tmp1 = mul i32 %tmp0, 24
diff --git a/llvm/test/CodeGen/RISCV/rv32zba.ll b/llvm/test/CodeGen/RISCV/rv32zba.ll
index 0908a393338c50..cc632a09c8054b 100644
--- a/llvm/test/CodeGen/RISCV/rv32zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv32zba.ll
@@ -271,31 +271,49 @@ define i32 @mul288(i32 %a) {
}
define i32 @mul258(i32 %a) {
-; CHECK-LABEL: mul258:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 258
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV32I-LABEL: mul258:
+; RV32I: # %bb.0:
+; RV32I-NEXT: li a1, 258
+; RV32I-NEXT: mul a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBA-LABEL: mul258:
+; RV32ZBA: # %bb.0:
+; RV32ZBA-NEXT: slli a1, a0, 8
+; RV32ZBA-NEXT: sh1add a0, a0, a1
+; RV32ZBA-NEXT: ret
%c = mul i32 %a, 258
ret i32 %c
}
define i32 @mul260(i32 %a) {
-; CHECK-LABEL: mul260:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 260
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV32I-LABEL: mul260:
+; RV32I: # %bb.0:
+; RV32I-NEXT: li a1, 260
+; RV32I-NEXT: mul a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBA-LABEL: mul260:
+; RV32ZBA: # %bb.0:
+; RV32ZBA-NEXT: slli a1, a0, 8
+; RV32ZBA-NEXT: sh2add a0, a0, a1
+; RV32ZBA-NEXT: ret
%c = mul i32 %a, 260
ret i32 %c
}
define i32 @mul264(i32 %a) {
-; CHECK-LABEL: mul264:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 264
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV32I-LABEL: mul264:
+; RV32I: # %bb.0:
+; RV32I-NEXT: li a1, 264
+; RV32I-NEXT: mul a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBA-LABEL: mul264:
+; RV32ZBA: # %bb.0:
+; RV32ZBA-NEXT: slli a1, a0, 8
+; RV32ZBA-NEXT: sh3add a0, a0, a1
+; RV32ZBA-NEXT: ret
%c = mul i32 %a, 264
ret i32 %c
}
diff --git a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
index 90cfb1fdcb779f..ee9b73ca82f213 100644
--- a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
@@ -811,31 +811,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
}
define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 258
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul258:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 258
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh1add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 258
ret i64 %c
}
define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 260
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul260:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 260
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh2add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 260
ret i64 %c
}
define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 264
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul264:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 264
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh3add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 264
ret i64 %c
}
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index d9d83633a8537f..e25ad50ac4c1b4 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -762,31 +762,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
}
define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 258
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul258:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 258
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh1add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 258
ret i64 %c
}
define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 260
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul260:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 260
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh2add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 260
ret i64 %c
}
define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a1, 264
-; CHECK-NEXT: mul a0, a0, a1
-; CHECK-NEXT: ret
+; RV64I-LABEL: mul264:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a1, 264
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: slli a1, a0, 8
+; RV64ZBA-NEXT: sh3add a0, a0, a1
+; RV64ZBA-NEXT: ret
%c = mul i64 %a, 264
ret i64 %c
}
|
For the record, I can also implement this as the dag combine expansion if anyone wants. I'd actually implemented that originally before figuring out the generic code could handle this case at all. |
; RV32IMB-NEXT: addi a0, a2, 2047 | ||
; RV32IMB-NEXT: addi a0, a0, 1 | ||
; RV32IMB-NEXT: li a3, 1 | ||
; RV32IMB-NEXT: slli a3, a3, 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we bypass the early shift here, so this sh3add match is actually an improvement. Note that with zbs, the immediate materialization becomes a single bseti.
@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) { | |||
; RV64IMB: # %bb.0: | |||
; RV64IMB-NEXT: addi a0, a0, 86 | |||
; RV64IMB-NEXT: sh1add a0, a0, a0 | |||
; RV64IMB-NEXT: li a1, -16 | |||
; RV64IMB-NEXT: sh3add a0, a0, a1 | |||
; RV64IMB-NEXT: slli a0, a0, 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a mild regression, but I think only indirectly related to this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we end up with this instead of (x*3+256)*8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be the result of transformAddImmMulImm. I agree that looks like a questionable transform.
@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) { | |||
define i1 @pr53831(i32 %x) { | |||
; RV32IMB-LABEL: pr53831: | |||
; RV32IMB: # %bb.0: | |||
; RV32IMB-NEXT: li a0, 0 | |||
; RV32IMB-NEXT: addi a1, a0, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is checking for an infinite combine loop, but this result is the missed optimization I mentioned in the review summary.
This expansion is directly inspired by the analogous code in the x86 backend for LEA. shXadd and (this sub-case of) LEA are largely equivalent. This is an alternative to llvm#87105. This expansion is also supported via the decomposeMulByConstant callback, but restricted because of interactions with other combines since that code runs before legalization. As discussed in the other review, my original plan had been to support post legalization expansion through the same interface, but that ended up being more complicated than seems justified. Instead, lets go ahead and do the general expansion post-legalize. Other targets use the combine approach, and matching that structure makes it easier for us to adapt ideas from other targets to RISCV.
I dug into whether it was easy to extend this interface to run post-legalize, and ran into more complexity than I expected. It turns out that we're relying on this running pre-legalize to catch some cases on illegal types. There are possible mul expansions which could handle these after legalization, but they aren't currently implemented. I decided the lift required wasn't really worth it, and have an alternate patch for the same change as a DAGCombine here: #88524 |
This expansion is directly inspired by the analogous code in the x86 backend for LEA. shXadd and (this sub-case of) LEA are largely equivalent. This is an alternative to #87105. This expansion is also supported via the decomposeMulByConstant callback, but restricted because of interactions with other combines since that code runs before legalization. As discussed in the other review, my original plan had been to support post legalization expansion through the same interface, but that ended up being more complicated than seems justified. Instead, lets go ahead and do the general expansion post-legalize. Other targets use the combine approach, and matching that structure makes it easier for us to adapt ideas from other targets to RISCV.
Alternative patch landed. |
This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take.
The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.