[llvm] Wrong vector data when using options -Os, -mfloat-abi=softfp, --target=armebv7-linux #102418
@llvm/issue-subscribers-backend-arm Author: Austin (Zhenhang1213)
I find that the results on armebv7-linux are different from armv7.

Code:

```c
#include <stdio.h>
volatile int one = 1;
int main (int argc, char *argv[]) {
```

When __i is 4, the result is different. I think the instruction `vrev32.16 d17, d18` is wrong; when I change it to `vmov.16 d17, d18`, the result is right. However, I don't know how to fix this bug.
#97782 recently fixed a similar-sounding issue, but it doesn't look like the code has changed on trunk: https://godbolt.org/z/7WYn38xTs.
Yes, I compiled clang with this patch, and the generated code is the same.
Same problem, but a different situation. Replacing the instruction in the same way is OK.
@davemgreen could you give me some ideas? I've been puzzled by this for a long time.
Hi, it honestly looks like vector constants are generated incorrectly if they can be specified with a larger movi size (i.e. using a vmov.i32 for an i16 constant splat). That may be related to #68212; we perhaps should be passing IsBigEndian to isConstantSplat in LowerBUILD_VECTOR. The bitcast of the VMOVIMM also looks suspicious, and there might be a problem in PerformBITCASTCombine where it is ignoring casts that it should not. There might be cases where we get it wrong in two places that usually cancel each other out.
However, I find that LLVM 15 is also wrong without this patch.
llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp Lines 7966 to 7969 in 6f6422f
Is this code for endianness adjustment? In this scenario, should I adjust the lower part first and fix the byte order after the CONCAT_VECTORS?
I don't believe that #68212 broke or fixed anything on its own, as we don't set the new parameter. We might need to add the IsBigEndian parameter with the correct value in the call to isConstantSplat in ARMISelLowering, then change the BITCAST you point to above into a VECTOR_REG_CAST, and then see what else changes and whether there are any other follow-on fixes that are needed. From what I remember from looking into it, the PerformBITCASTCombine function might need to be tightened up when there is a BITCAST(VECTOR_REG_CAST). The important thing is to test it to make sure we are now getting the right results. Does that sound doable? Big-endian is unfortunately much less used than the rest, so it doesn't get as much testing.
Yes, I tried to modify it and some tests failed; I think there are some problems with the modified behavior, such as this case,
and the result is:
The logic of the two parts is not the same. When I debug the PerformVECTOR_REG_CASTCombine function, Op->getOpcode() is not ARMISD::VECTOR_REG_CAST, so it returns SDValue().
After adding the IsBigEndian parameter with the correct value in the call to isConstantSplat in ARMISelLowering, I cancelled the big-endian adaptation for 64-bit VMOV in isVMOVModifiedImm,
and I think the generated instruction is incorrect.
Could you give me more advice? Modifying BITCAST to VECTOR_REG_CAST brings more failures. Thanks.
I was taking another look, and using the IsBigEndian parameter to isConstantSplat might have been a mistake to suggest. I think it would make more sense to always have the constants produced in the "vector-lane-order" (same as little-endian, lowest lane at the bottom), which means not passing IsBigEndian through to isConstantSplat. The constants we create then don't need to be bitcast to change the lane order; they just need to be vector_reg_cast. Apparently i64 constants are for some reason special, and the vector elements get reversed in
Could you put a patch together with those changes? We can go through the test differences to make sure they all seem correct. I'm pretty sure that something will need to change in PerformBITCASTCombine in the optimization of bitcast(vector_reg_cast(vmovimm)).
I have tried these modifications. The test big-endian-neon-fp16-bitconv.ll failed; I think PerformVECTOR_REG_CASTCombine should be changed next, right?
After this patch, all the vrevs after a vmov are cancelled, so those tests fail, like this,
and the result is:
Is that incorrect now, or was that vrev64.16 unnecessary? The PerformBITCASTCombine part probably needs to be more restrictive, not less. If there is:
We need to keep the v16i8->v2i64 bitcast to keep the lane shuffling, even if the original type elt size is <= the final type elt size. |
Compared to the original case, the vrev64.16 is unnecessary; in this case, the result is the same.
Fix #102418: resolve the issue of generating an incorrect vrev during vectorization in big-endian scenarios.
https://godbolt.org/z/EqYebY73z