Suboptimal ASM emitted for Vector256<T>.Zero and Vector128<T>.Zero #76067
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
This is a known issue that we might eventually address with constant materialization during LSRA.
Thanks for the info @EgorBo
Discussed this with @tannergooding for a bit. One immediate thing we could do is a peephole optimization to eliminate unnecessary
Another manifestation of this problem:
void Foo(ref byte b)
{
    Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref b, 0)) = default;
    Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref b, 16)) = default;
    Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref b, 32)) = default;
    Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref b, 48)) = default;
}
Emits:
movi v16.4s, #0
str q16, [x1]
movi v16.4s, #0
str q16, [x1, #0x10]
movi v16.4s, #0
str q16, [x1, #0x20]
movi v16.4s, #0
str q16, [x1, #0x30]
Expected:
movi v16.4s, #0
stp q16, q16, [x1]
stp q16, q16, [x1, #0x20]
TL;DR: that didn't work due to ABI issues, although it wouldn't be a real fix anyway because it wouldn't help with loop hoisting.
When using Vector256.Zero, I would expect it to be kept in a fixed register and reused. Instead, what I see is a vxorps operation emitted every time. AVX2 has 16 YMM registers.
Below is one example. I can get the desired behavior by forcing a zero vector variable instead of using Vector256.Zero.
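A minimal sketch of the kind of workaround meant here, not the original repro: the names `ZeroVectorSketch` and `CountZeroBytes` are hypothetical, and the byte-counting loop is only an assumed stand-in for a hot loop that compares against zero. Whether the JIT actually keeps `zero` in a single YMM register depends on the runtime version, so the generated code should be checked in a disassembly.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ZeroVectorSketch
{
    // Hypothetical example: counts the zero bytes in a span.
    public static int CountZeroBytes(ReadOnlySpan<byte> data)
    {
        int count = 0;
        int i = 0;

        if (Avx2.IsSupported)
        {
            // Workaround as described in the issue: force a zero vector
            // variable with an explicit xor instead of using
            // Vector256<byte>.Zero directly inside the loop.
            Vector256<byte> zero = Vector256<byte>.Zero;
            zero = Avx2.Xor(zero, zero);

            for (; i + Vector256<byte>.Count <= data.Length; i += Vector256<byte>.Count)
            {
                // Read 32 bytes starting at offset i.
                Vector256<byte> v = MemoryMarshal.Read<Vector256<byte>>(data.Slice(i));

                // Lanes equal to zero become 0xFF; MoveMask packs one bit per lane.
                Vector256<byte> eq = Avx2.CompareEqual(v, zero);
                count += BitOperations.PopCount((uint)Avx2.MoveMask(eq));
            }
        }

        // Scalar tail (and full fallback when AVX2 is not available).
        for (; i < data.Length; i++)
        {
            if (data[i] == 0) count++;
        }

        return count;
    }
}
```

Replacing `zero` in the loop body with `Vector256<byte>.Zero` gives the variant that, per this issue, re-emits vxorps on each use.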
Assigning Vector256<byte>.Zero to a variable alone does not do the trick. Only the extra xor operation ensures it stays in a fixed register.
category:cq
theme:cse
skill-level:intermediate
cost:medium
impact:small