[RISCV] Enable expansion mul expansion to shNadd x, (slli x, c) #87105

preames · 2024-03-29T19:01:15Z

This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take.

The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.
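For readers less familiar with Zba, the arithmetic behind the expansion can be sketched in plain C++. This is only an illustrative model under stated assumptions, not the LLVM implementation: `isPowerOf2`, `splitForShNAdd`, `shNadd`, and `mulViaShNAdd` are hypothetical helper names, and `uint64_t` stands in for the `APInt` used in `decomposeMulByConstant`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): true if v is a nonzero power of two.
bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Describes Imm == (1 << slliAmt) + (1 << shN) with shN in {1, 2, 3}.
// shN == 0 means the constant does not match the pattern.
struct ShNAddSplit {
  unsigned shN;     // selects sh1add, sh2add, or sh3add
  unsigned slliAmt; // shift amount for the slli operand
};

// Models the condition in the patch: (Imm - 2), (Imm - 4), or (Imm - 8)
// is a power of two.
ShNAddSplit splitForShNAdd(uint64_t imm) {
  for (unsigned n = 1; n <= 3; ++n) {
    uint64_t small = uint64_t{1} << n;
    if (imm > small && isPowerOf2(imm - small)) {
      uint64_t rest = imm - small;
      unsigned amt = 0;
      while ((rest >> amt) != 1)
        ++amt;
      return ShNAddSplit{n, amt};
    }
  }
  return ShNAddSplit{0, 0};
}

// shNadd rd, rs1, rs2 computes (rs1 << N) + rs2.
uint64_t shNadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// x * imm computed as shNadd x, (slli x, c).
uint64_t mulViaShNAdd(uint64_t x, uint64_t imm) {
  ShNAddSplit s = splitForShNAdd(imm);
  assert(s.shN != 0 && "imm does not match the shNadd + slli pattern");
  uint64_t shifted = x << s.slliAmt; // slli a1, a0, c
  return shNadd(s.shN, x, shifted);  // shNadd a0, a0, a1
}
```

In this model, 258 splits into `sh1add` plus `slli` by 8, which lines up with the new `mul258` check lines. The constant 24 also matches (16 + 8), which is why the tests multiplying by 24 in addimm-mulimm.ll are affected.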

llvmbot · 2024-03-29T19:01:45Z

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take.

The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.

Full diff: https://github.com/llvm/llvm-project/pull/87105.diff

5 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2-2)
(modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+26-6)
(modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+33-15)
(modified) llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll (+33-15)
(modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+33-15)

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 564fda674317f4..22bd4b8c7230a9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20393,8 +20393,8 @@ bool RISCVTargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT,
         (1 - Imm).isPowerOf2() || (-1 - Imm).isPowerOf2())
       return true;
 
-    // Optimize the MUL to (SH*ADD x, (SLLI x, bits)) if Imm is not simm12.
-    if (Subtarget.hasStdExtZba() && !Imm.isSignedIntN(12) &&
+    // Optimize the MUL to (SH*ADD x, (SLLI x, bits)).
+    if (Subtarget.hasStdExtZba() &&
         ((Imm - 2).isPowerOf2() || (Imm - 4).isPowerOf2() ||
          (Imm - 8).isPowerOf2()))
       return true;
diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
index 48fa69e1045656..931edacfbfff24 100644
--- a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
+++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
@@ -551,8 +551,9 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV32IMB-NEXT:    sh3add a1, a1, a2
 ; RV32IMB-NEXT:    sh1add a0, a0, a0
 ; RV32IMB-NEXT:    slli a2, a0, 3
-; RV32IMB-NEXT:    addi a0, a2, 2047
-; RV32IMB-NEXT:    addi a0, a0, 1
+; RV32IMB-NEXT:    li a3, 1
+; RV32IMB-NEXT:    slli a3, a3, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a3
 ; RV32IMB-NEXT:    sltu a2, a0, a2
 ; RV32IMB-NEXT:    add a1, a1, a2
 ; RV32IMB-NEXT:    ret
@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV64IMB:       # %bb.0:
 ; RV64IMB-NEXT:    addi a0, a0, 86
 ; RV64IMB-NEXT:    sh1add a0, a0, a0
-; RV64IMB-NEXT:    li a1, -16
-; RV64IMB-NEXT:    sh3add a0, a0, a1
+; RV64IMB-NEXT:    slli a0, a0, 3
+; RV64IMB-NEXT:    addi a0, a0, -16
 ; RV64IMB-NEXT:    ret
   %tmp0 = mul i64 %x, 24
   %tmp1 = add i64 %tmp0, 2048
@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) {
 define i1 @pr53831(i32 %x) {
 ; RV32IMB-LABEL: pr53831:
 ; RV32IMB:       # %bb.0:
-; RV32IMB-NEXT:    li a0, 0
+; RV32IMB-NEXT:    addi a1, a0, 1
+; RV32IMB-NEXT:    sh1add a1, a1, a1
+; RV32IMB-NEXT:    slli a1, a1, 3
+; RV32IMB-NEXT:    addi a1, a1, 1
+; RV32IMB-NEXT:    sh1add a0, a0, a0
+; RV32IMB-NEXT:    li a2, 1
+; RV32IMB-NEXT:    slli a2, a2, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a2
+; RV32IMB-NEXT:    xor a0, a0, a1
+; RV32IMB-NEXT:    seqz a0, a0
 ; RV32IMB-NEXT:    ret
 ;
 ; RV64IMB-LABEL: pr53831:
 ; RV64IMB:       # %bb.0:
-; RV64IMB-NEXT:    li a0, 0
+; RV64IMB-NEXT:    addi a1, a0, 1
+; RV64IMB-NEXT:    sh1add a1, a1, a1
+; RV64IMB-NEXT:    slliw a1, a1, 3
+; RV64IMB-NEXT:    addi a1, a1, 1
+; RV64IMB-NEXT:    sh1add a0, a0, a0
+; RV64IMB-NEXT:    li a2, 1
+; RV64IMB-NEXT:    slli a2, a2, 11
+; RV64IMB-NEXT:    sh3add a0, a0, a2
+; RV64IMB-NEXT:    sext.w a0, a0
+; RV64IMB-NEXT:    xor a0, a0, a1
+; RV64IMB-NEXT:    seqz a0, a0
 ; RV64IMB-NEXT:    ret
   %tmp0 = add i32 %x, 1
   %tmp1 = mul i32 %tmp0, 24
diff --git a/llvm/test/CodeGen/RISCV/rv32zba.ll b/llvm/test/CodeGen/RISCV/rv32zba.ll
index 0908a393338c50..cc632a09c8054b 100644
--- a/llvm/test/CodeGen/RISCV/rv32zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv32zba.ll
@@ -271,31 +271,49 @@ define i32 @mul288(i32 %a) {
 }
 
 define i32 @mul258(i32 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul258:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 258
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul258:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh1add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 258
   ret i32 %c
 }
 
 define i32 @mul260(i32 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul260:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 260
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul260:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh2add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 260
   ret i32 %c
 }
 
 define i32 @mul264(i32 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul264:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 264
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul264:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh3add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 264
   ret i32 %c
 }
diff --git a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
index 90cfb1fdcb779f..ee9b73ca82f213 100644
--- a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
@@ -811,31 +811,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
 }
 
 define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul258:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 258
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh1add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 258
   ret i64 %c
 }
 
 define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul260:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 260
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh2add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 260
   ret i64 %c
 }
 
 define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul264:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 264
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh3add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 264
   ret i64 %c
 }
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index d9d83633a8537f..e25ad50ac4c1b4 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -762,31 +762,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
 }
 
 define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul258:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 258
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh1add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 258
   ret i64 %c
 }
 
 define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul260:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 260
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh2add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 260
   ret i64 %c
 }
 
 define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul264:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 264
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh3add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 264
   ret i64 %c
 }

preames · 2024-03-29T19:01:57Z

For the record, I can also implement this as the dag combine expansion if anyone wants. I'd actually implemented that originally before figuring out the generic code could handle this case at all.

preames · 2024-03-29T19:02:53Z

llvm/test/CodeGen/RISCV/addimm-mulimm.ll

-; RV32IMB-NEXT:    addi a0, a2, 2047
-; RV32IMB-NEXT:    addi a0, a0, 1
+; RV32IMB-NEXT:    li a3, 1
+; RV32IMB-NEXT:    slli a3, a3, 11


Note that we bypass the early shift here, so this sh3add match is actually an improvement. With zbs, the immediate materialization also becomes a single bseti.
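A small C++ model of the two RV32 sequences may help here. It is a sketch only: `isSImm12`, `addViaTwoAddi`, and `addViaSh3add` are hypothetical names, and each statement mirrors one instruction from the hunk above. The constant 2048 is just outside the signed 12-bit range that `addi` accepts, which is why neither sequence can add it with a single `addi`.

```cpp
#include <cstdint>

// Hypothetical helper: does v fit the 12-bit signed immediate of addi?
bool isSImm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Old sequence: add 2048 to (t << 3) using two addi instructions.
uint32_t addViaTwoAddi(uint32_t t) {
  uint32_t shifted = t << 3;   // slli a2, a0, 3
  uint32_t r = shifted + 2047; // addi a0, a2, 2047
  return r + 1;                // addi a0, a0, 1
}

// New sequence: materialize 2048, then fold the shift into sh3add.
uint32_t addViaSh3add(uint32_t t) {
  uint32_t c = 1;      // li a3, 1
  c <<= 11;            // slli a3, a3, 11 (1 << 11 == 2048)
  return (t << 3) + c; // sh3add a0, a0, a3
}
```

Both functions return the same value for every input. Since 2048 is the single bit `1 << 11`, a one-instruction bit-set can stand in for the `li` plus `slli` pair.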

preames · 2024-03-29T19:03:36Z

llvm/test/CodeGen/RISCV/addimm-mulimm.ll

@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV64IMB:       # %bb.0:
 ; RV64IMB-NEXT:    addi a0, a0, 86
 ; RV64IMB-NEXT:    sh1add a0, a0, a0
-; RV64IMB-NEXT:    li a1, -16
-; RV64IMB-NEXT:    sh3add a0, a0, a1
+; RV64IMB-NEXT:    slli a0, a0, 3


This seems like a mild regression, but I think only indirectly related to this patch.

efriedma-quic · 2024-03-29

Why do we end up with this instead of (x*3+256)*8?

preames · 2024-04-11

This looks to be the result of transformAddImmMulImm. I agree that looks like a questionable transform.
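To make the comparison concrete, here is a minimal C++ check that the emitted RV64 sequence, the suggested form, and the source expression `x * 24 + 2048` all agree. The function names are made up for illustration, and the instruction comments are copied from the hunk above.

```cpp
#include <cstdint>

// The new RV64IMB sequence from the diff: 3 * (x + 86) * 8 - 16.
uint64_t emittedSequence(uint64_t x) {
  uint64_t a = x + 86; // addi a0, a0, 86
  a = (a << 1) + a;    // sh1add a0, a0, a0
  a <<= 3;             // slli a0, a0, 3
  return a - 16;       // addi a0, a0, -16
}

// The form asked about in the review thread.
uint64_t suggestedForm(uint64_t x) { return (x * 3 + 256) * 8; }

// The source expression from the test: mul by 24, then add 2048.
uint64_t sourceExpr(uint64_t x) { return x * 24 + 2048; }
```

Expanding the emitted sequence gives 24x + 2064 - 16 = 24x + 2048, so it is correct. The question in the thread is only about instruction count.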

preames · 2024-03-29T19:04:15Z

llvm/test/CodeGen/RISCV/addimm-mulimm.ll

@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) {
 define i1 @pr53831(i32 %x) {
 ; RV32IMB-LABEL: pr53831:
 ; RV32IMB:       # %bb.0:
-; RV32IMB-NEXT:    li a0, 0
+; RV32IMB-NEXT:    addi a1, a0, 1


This test is checking for an infinite combine loop, but this result is the missed optimization I mentioned in the review summary.
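The reason the old output was a constant can be sketched as follows. This model is inferred from the new CHECK lines rather than from the full IR of the test, which is not shown in the hunk, so treat the function names and the exact shape of the comparison as assumptions.

```cpp
#include <cstdint>

// Left side as computed by the new RV32IMB sequence: 24 * (x + 1) + 1.
uint32_t lhsValue(uint32_t x) {
  uint32_t t = x + 1; // addi a1, a0, 1
  t = (t << 1) + t;   // sh1add a1, a1, a1
  t <<= 3;            // slli a1, a1, 3
  return t + 1;       // addi a1, a1, 1
}

// Right side: 24 * x + 2048.
uint32_t rhsValue(uint32_t x) {
  uint32_t t = (x << 1) + x;  // sh1add a0, a0, a0
  uint32_t c = uint32_t{1} << 11; // li a2, 1 ; slli a2, a2, 11
  return (t << 3) + c;        // sh3add a0, a0, a2
}

// xor + seqz: true only if both sides are equal.
bool pr53831Model(uint32_t x) { return lhsValue(x) == rhsValue(x); }
```

The two sides differ by 25 - 2048 modulo 2^32 for every x, so the comparison is always false. That is what the previous `li a0, 0` encoded, and what the early expansion now hides from the combiner.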

preames · 2024-04-12T15:20:35Z

I dug into whether it was easy to extend this interface to run post-legalize, and ran into more complexity than I expected. It turns out that we're relying on this running pre-legalize to catch some cases on illegal types. There are possible mul expansions which could handle these after legalization, but they aren't currently implemented.

I decided the lift required wasn't really worth it, and have an alternate patch for the same change as a DAGCombine here: #88524

This expansion is directly inspired by the analogous code in the x86 backend for LEA. shXadd and (this sub-case of) LEA are largely equivalent. This is an alternative to #87105. This expansion is also supported via the decomposeMulByConstant callback, but restricted because of interactions with other combines, since that code runs before legalization. As discussed in the other review, my original plan had been to support post-legalization expansion through the same interface, but that ended up being more complicated than seems justified. Instead, let's go ahead and do the general expansion post-legalize. Other targets use the combine approach, and matching that structure makes it easier for us to adapt ideas from other targets to RISCV.

preames · 2024-04-16T00:39:13Z

Alternative patch landed.

preames requested review from asb, topperc and pcwang-thead March 29, 2024 19:01

llvmbot added the backend:RISC-V label Mar 29, 2024

wangpc-pp requested review from wangpc-pp and removed request for pcwang-thead April 1, 2024 03:07

preames mentioned this pull request Apr 12, 2024

[RISCV] Expand mul to shNadd x, (slli x, c) in DAGCombine #88524

Merged

preames closed this Apr 16, 2024

preames mentioned this pull request Apr 17, 2024

[RISCV] Strength reduce mul by 2^N - 2^M #88983

Merged
