[ARM] Fix VBICimm and VORRimm generation under Big endian. #107813

davemgreen · 2024-09-09T06:41:49Z

This is a smaller follow on to #105519 that fixes VBICimm and VORRimm too. The logic behind lowering vector immediates under big endian Neon/MVE is to treat them in natural lane ordering (same as little endian), and VECTOR_REG_CAST them to the correct type (as opposed to creating the constants in big endian form and bitcasting them). This makes sure that is done when creating VORRIMM and VBICIMM.

This is a smaller follow on to llvm#105519 that fixes VBICimm and VORRimm too. The logic behind lowering vector immediates under big endian Neon/MVE is to treat them in natural lane ordering (same as little endian), and VECTOR_REG_CAST them to the correct type (as opposed to creating the constants in big endian form and bitcasting them). This makes sure that is done when creating VORRIMM and VBICIMM.

llvmbot · 2024-09-09T06:42:18Z

@llvm/pr-subscribers-backend-arm

Author: David Green (davemgreen)

Changes

This is a smaller follow on to #105519 that fixes VBICimm and VORRimm too. The logic behind lowering vector immediates under big endian Neon/MVE is to treat them in natural lane ordering (same as little endian), and VECTOR_REG_CAST them to the correct type (as opposed to creating the constants in big endian form and bitcasting them). This makes sure that is done when creating VORRIMM and VBICIMM.

Full diff: https://github.com/llvm/llvm-project/pull/107813.diff

3 Files Affected:

(modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+4-4)
(modified) llvm/test/CodeGen/ARM/big-endian-vmov.ll (+4-6)
(modified) llvm/test/CodeGen/Thumb2/mve-vmovimm.ll (+4-6)

diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 8c82161bd15c65..67653a17512d2c 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -14443,9 +14443,9 @@ static SDValue PerformANDCombine(SDNode *N,
                                       DAG, dl, VbicVT, VT, OtherModImm);
       if (Val.getNode()) {
         SDValue Input =
-          DAG.getNode(ISD::BITCAST, dl, VbicVT, N->getOperand(0));
+            DAG.getNode(ARMISD::VECTOR_REG_CAST, dl, VbicVT, N->getOperand(0));
         SDValue Vbic = DAG.getNode(ARMISD::VBICIMM, dl, VbicVT, Input, Val);
-        return DAG.getNode(ISD::BITCAST, dl, VT, Vbic);
+        return DAG.getNode(ARMISD::VECTOR_REG_CAST, dl, VT, Vbic);
       }
     }
   }
@@ -14739,9 +14739,9 @@ static SDValue PerformORCombine(SDNode *N,
                             SplatBitSize, DAG, dl, VorrVT, VT, OtherModImm);
       if (Val.getNode()) {
         SDValue Input =
-          DAG.getNode(ISD::BITCAST, dl, VorrVT, N->getOperand(0));
+            DAG.getNode(ARMISD::VECTOR_REG_CAST, dl, VorrVT, N->getOperand(0));
         SDValue Vorr = DAG.getNode(ARMISD::VORRIMM, dl, VorrVT, Input, Val);
-        return DAG.getNode(ISD::BITCAST, dl, VT, Vorr);
+        return DAG.getNode(ARMISD::VECTOR_REG_CAST, dl, VT, Vorr);
       }
     }
   }
diff --git a/llvm/test/CodeGen/ARM/big-endian-vmov.ll b/llvm/test/CodeGen/ARM/big-endian-vmov.ll
index 327d4334ad83a1..e3647cf6234c82 100644
--- a/llvm/test/CodeGen/ARM/big-endian-vmov.ll
+++ b/llvm/test/CodeGen/ARM/big-endian-vmov.ll
@@ -219,7 +219,6 @@ define arm_aapcs_vfpcc <8 x i16> @vmvn_v16i8_m1() {
   ret <8 x i16> <i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534>
 }
 
-; FIXME: This is incorrect for BE
 define arm_aapcs_vfpcc <8 x i16> @and_v8i16_m1(<8 x i16> %a) {
 ; CHECK-LE-LABEL: and_v8i16_m1:
 ; CHECK-LE:       @ %bb.0:
@@ -228,15 +227,14 @@ define arm_aapcs_vfpcc <8 x i16> @and_v8i16_m1(<8 x i16> %a) {
 ;
 ; CHECK-BE-LABEL: and_v8i16_m1:
 ; CHECK-BE:       @ %bb.0:
-; CHECK-BE-NEXT:    vrev64.32 q8, q0
+; CHECK-BE-NEXT:    vrev64.16 q8, q0
 ; CHECK-BE-NEXT:    vbic.i32 q8, #0x10000
-; CHECK-BE-NEXT:    vrev64.32 q0, q8
+; CHECK-BE-NEXT:    vrev64.16 q0, q8
 ; CHECK-BE-NEXT:    bx lr
   %b = and <8 x i16> %a, <i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534>
   ret <8 x i16> %b
 }
 
-; FIXME: This is incorrect for BE
 define arm_aapcs_vfpcc <8 x i16> @or_v8i16_1(<8 x i16> %a) {
 ; CHECK-LE-LABEL: or_v8i16_1:
 ; CHECK-LE:       @ %bb.0:
@@ -245,9 +243,9 @@ define arm_aapcs_vfpcc <8 x i16> @or_v8i16_1(<8 x i16> %a) {
 ;
 ; CHECK-BE-LABEL: or_v8i16_1:
 ; CHECK-BE:       @ %bb.0:
-; CHECK-BE-NEXT:    vrev64.32 q8, q0
+; CHECK-BE-NEXT:    vrev64.16 q8, q0
 ; CHECK-BE-NEXT:    vorr.i32 q8, #0x10000
-; CHECK-BE-NEXT:    vrev64.32 q0, q8
+; CHECK-BE-NEXT:    vrev64.16 q0, q8
 ; CHECK-BE-NEXT:    bx lr
   %b = or <8 x i16> %a, <i16 0, i16 1, i16 0, i16 1, i16 0, i16 1, i16 0, i16 1>
   ret <8 x i16> %b
diff --git a/llvm/test/CodeGen/Thumb2/mve-vmovimm.ll b/llvm/test/CodeGen/Thumb2/mve-vmovimm.ll
index 9cf92663e3b052..1970ff35f183fa 100644
--- a/llvm/test/CodeGen/Thumb2/mve-vmovimm.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-vmovimm.ll
@@ -1331,7 +1331,6 @@ entry:
   ret <2 x i64> %s
 }
 
-; FIXME: This is incorrect for BE
 define arm_aapcs_vfpcc <8 x i16> @and_v8i16_m1(<8 x i16> %a) {
 ; CHECKLE-LABEL: and_v8i16_m1:
 ; CHECKLE:       @ %bb.0:
@@ -1340,15 +1339,14 @@ define arm_aapcs_vfpcc <8 x i16> @and_v8i16_m1(<8 x i16> %a) {
 ;
 ; CHECKBE-LABEL: and_v8i16_m1:
 ; CHECKBE:       @ %bb.0:
-; CHECKBE-NEXT:    vrev64.32 q1, q0
+; CHECKBE-NEXT:    vrev64.16 q1, q0
 ; CHECKBE-NEXT:    vbic.i32 q1, #0x10000
-; CHECKBE-NEXT:    vrev64.32 q0, q1
+; CHECKBE-NEXT:    vrev64.16 q0, q1
 ; CHECKBE-NEXT:    bx lr
   %b = and <8 x i16> %a, <i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534, i16 65535, i16 65534>
   ret <8 x i16> %b
 }
 
-; FIXME: This is incorrect for BE
 define arm_aapcs_vfpcc <8 x i16> @or_v8i16_1(<8 x i16> %a) {
 ; CHECKLE-LABEL: or_v8i16_1:
 ; CHECKLE:       @ %bb.0:
@@ -1357,9 +1355,9 @@ define arm_aapcs_vfpcc <8 x i16> @or_v8i16_1(<8 x i16> %a) {
 ;
 ; CHECKBE-LABEL: or_v8i16_1:
 ; CHECKBE:       @ %bb.0:
-; CHECKBE-NEXT:    vrev64.32 q1, q0
+; CHECKBE-NEXT:    vrev64.16 q1, q0
 ; CHECKBE-NEXT:    vorr.i32 q1, #0x10000
-; CHECKBE-NEXT:    vrev64.32 q0, q1
+; CHECKBE-NEXT:    vrev64.16 q0, q1
 ; CHECKBE-NEXT:    bx lr
   %b = or <8 x i16> %a, <i16 0, i16 1, i16 0, i16 1, i16 0, i16 1, i16 0, i16 1>
   ret <8 x i16> %b

davemgreen · 2024-09-09T06:42:51Z

@Zhenhang1213 FYI. I couldn't add you as a reviewer, but let me know if you have any comments.

Zhenhang1213 · 2024-09-09T07:56:26Z

@Zhenhang1213 FYI. I couldn't add you as a reviewer, but let me know if you have any comments.

this case I think right, and I have a question，Is there other similarities that needs to be modified? like:

llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp

Lines 7218 to 7227 in d5f6f30

    
           SDValue NewVal = isVMOVModifiedImm(iVal & 0xffffffffU, 0, 32, DAG, SDLoc(Op), 
        
                                              VMovVT, VT, VMOVModImm); 
        
           if (NewVal != SDValue()) { 
        
             SDLoc DL(Op); 
        
             SDValue VecConstant = DAG.getNode(ARMISD::VMOVIMM, DL, VMovVT, 
        
                                               NewVal); 
        
             if (IsDouble) 
        
               return DAG.getNode(ISD::BITCAST, DL, MVT::f64, VecConstant); 
        
             // It's a float: cast and extract a vector element.

davemgreen · 2024-09-09T16:12:11Z

You might well be right if it is casting i32->f64. I hadn't looked at that bit of code, just the code that calls isConstantSplat. Are you able to create a double value that produces incorrect results?

Zhenhang1213 · 2024-09-11T01:19:26Z

You might well be right if it is casting i32->f64. I hadn't looked at that bit of code, just the code that calls isConstantSplat. Are you able to create a double value that produces incorrect results?

OK， I will try to create it then，if I make any progress，I'll tell you. This case is right

davemgreen · 2024-09-12T15:14:34Z

Thanks. Just to check - do you mean that this patch now looks correct, or the original code was correct?

Zhenhang1213 · 2024-09-13T01:49:07Z

Thanks. Just to check - do you mean that this patch now looks correct, or the original code was correct?

OK, I think this patch correct.

davemgreen · 2024-09-13T09:59:54Z

Thanks. I'll count that as a review as the code changes here are relatively minor.

davemgreen requested review from ostannard, john-brawn-arm, CoTinker and hstk30-hw September 9, 2024 06:41

llvmbot added the backend:ARM label Sep 9, 2024

davemgreen merged commit 637aa61 into llvm:main Sep 13, 2024
10 checks passed

davemgreen deleted the gh-be-vbicvorr branch September 13, 2024 10:00

CoTinker approved these changes Sep 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ARM] Fix VBICimm and VORRimm generation under Big endian. #107813

[ARM] Fix VBICimm and VORRimm generation under Big endian. #107813

davemgreen commented Sep 9, 2024

llvmbot commented Sep 9, 2024

davemgreen commented Sep 9, 2024

Zhenhang1213 commented Sep 9, 2024

davemgreen commented Sep 9, 2024

Zhenhang1213 commented Sep 11, 2024

davemgreen commented Sep 12, 2024

Zhenhang1213 commented Sep 13, 2024

davemgreen commented Sep 13, 2024

[ARM] Fix VBICimm and VORRimm generation under Big endian. #107813

[ARM] Fix VBICimm and VORRimm generation under Big endian. #107813

Conversation

davemgreen commented Sep 9, 2024

llvmbot commented Sep 9, 2024

davemgreen commented Sep 9, 2024

Zhenhang1213 commented Sep 9, 2024

davemgreen commented Sep 9, 2024

Zhenhang1213 commented Sep 11, 2024

davemgreen commented Sep 12, 2024

Zhenhang1213 commented Sep 13, 2024

davemgreen commented Sep 13, 2024