[llvm] Wrong vector data when using options -Os, -mfloat-abi=softfp, --target=armebv7-linux #102418
@llvm/issue-subscribers-backend-arm Author: Austin (Zhenhang1213)
I find that the results on armebv7-linux are different from armv7.

Code:

```c
#include <stdio.h>
volatile int one = 1;
int main (int argc, char *argv[]) {
```

When __i is 4, the result is different. I think the instruction `vrev32.16 d17, d18` is wrong; when I change it to `vmov.16 d17, d18`, the result is right. However, I don't know how to fix this bug.
#97782 recently fixed a similar-sounding issue, but it doesn't look like the code has changed on trunk: https://godbolt.org/z/7WYn38xTs.
Yes, I compiled clang with this patch, and the generated code is the same.
Same problem, but a different situation. Replacing the instruction in the same way is OK.
@davemgreen could you give me some ideas? I've been puzzled by this for a long time.
Hi, it honestly looks like vector constants are generated incorrectly if they can be specified with a larger movi size (i.e. using a vmov.i32 for an i16 constant splat). That may be related to #68212; we perhaps should be passing IsBigEndian to isConstantSplat in LowerBUILD_VECTOR. The bitcast of the VMOVIMM also looks suspicious, and there might be a problem in PerformBITCASTCombine where it is ignoring casts that it should not. There might be cases where we get it wrong in two places that usually cancel each other out.
However, I find that LLVM 15 is also wrong without this patch.
llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp Lines 7966 to 7969 in 6f6422f
Is this code for endianness adjustment? In this scenario, should I adjust the lower part first and fix the byte order after the CONCAT_VECTORS?
I don't believe that #68212 broke or fixed anything on its own, as we don't set the new parameter. We might need to add the IsBigEndian parameter with the correct value in the call to isConstantSplat in ARMISelLowering, then change the BITCAST you point to above into a VECTOR_REG_CAST, and then see what else changes and whether there are any other follow-on fixes that are needed. From what I remember from looking into it, the PerformBITCASTCombine function might need to be tightened up when there is a BITCAST(VECTOR_REG_CAST). The important thing is to test it to make sure we are now getting the right results. Does that sound doable? Big-endian is unfortunately much less used than the rest, so it doesn't get as much testing.
Yes, I tried to modify it and some tests failed; I think there are some problems with the modified behavior, such as this case,
and the result is:
The logic of the two parts is not the same. When I debug the PerformVECTOR_REG_CASTCombine function, Op->getOpcode() is not ARMISD::VECTOR_REG_CAST, so it returns SDValue().
After adding the IsBigEndian parameter with the correct value in the call to isConstantSplat in ARMISelLowering, I cancelled the big-endian adaptation for 64-bit VMOV in isVMOVModifiedImm,
and I think the generated instruction is incorrect.
Could you give me more advice? Modifying BITCAST to VECTOR_REG_CAST brings more failures. Thanks.
I was taking another look, and using the IsBigEndian parameter to isConstantSplat might have been a mistake to suggest. I think it would make more sense to always have the constants produced in the "vector-lane-order" (same as little-endian, lowest lane at the bottom), which means not passing IsBigEndian through to isConstantSplat. The constants we create then don't need to be bitcast to change the lane order; they just need to be vector_reg_cast. Apparently i64 constants are for some reason special, and the vector elements get reversed in
Could you put a patch together with those changes? We can go through the test differences to make sure they all seem correct. I'm pretty sure that something will need to change in PerformBITCASTCombine in the optimization of bitcast(vector_reg_cast(vmovimm)).
I have tried these modifications. The test big-endian-neon-fp16-bitconv.ll failed; I think PerformVECTOR_REG_CASTCombine should be changed next, right?
After this patch, all the vrevs after a vmov are cancelled, so those tests fail, like this,
and the result is:
Is that incorrect now, or was that vrev64.16 unnecessary? The PerformBITCASTCombine part probably needs to be more restrictive, not less. If there is:
We need to keep the v16i8->v2i64 bitcast to keep the lane shuffling, even if the original type elt size is <= the final type elt size. |
Compared to the original case, the vrev64.16 is unnecessary; in this case, the result is the same.
Fix #102418: resolve the issue of generating an incorrect vrev during vectorization in big-endian scenarios.
https://godbolt.org/z/EqYebY73z