-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Description
When using NET 9-RC2, Vector<T>
constructor that broadcasts a scalar to all elements of a vector is not optimized to a broadcasting instruction on x86/x64. .NET 8 compiler makes this optimization.
Reproduction Steps
The regression can be reproduced by compiling the following function:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ScalarToVector(int scalar) => new(scalar);
Expected behavior
I would expect the compiler to use only few instructions for broadcasting the scalar to all elements. This what .NET 8 compiler produces:
vzeroupper
vpbroadcastd ymm0, esi
vmovups ymmword ptr [rdi], ymm0
mov rax, rdi
vzeroupper
ret
So a single vpbroadcastd does the job when AVX2 is enabled.
Actual behavior
Using .NET 9-rc2, the following machine code is generated:
push rbp
sub rsp, 48
lea rbp, [rsp+0x30]
vxorps ymm0, ymm0, ymm0
vmovups ymmword ptr [rbp-0x30], ymm0
mov dword ptr [rbp-0x30], esi
mov dword ptr [rbp-0x2C], esi
mov dword ptr [rbp-0x28], esi
mov dword ptr [rbp-0x24], esi
mov dword ptr [rbp-0x20], esi
mov dword ptr [rbp-0x1C], esi
mov dword ptr [rbp-0x18], esi
mov dword ptr [rbp-0x14], esi
vmovups ymm0, ymmword ptr [rbp-0x30]
vmovups ymmword ptr [rdi], ymm0
mov rax, rdi
vzeroupper
add rsp, 48
pop rbp
ret
As you can see, the compiler fills elements individually to an array on stack, which is much slower.
Regression?
No response
Known Workarounds
Use .NET 8 or select Vector128/256/512.Create method based on Vector<T>
length:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ScalarToVector(int scalar)
{
if (Vector<int>.Count == 16)
{
return Vector512.Create(scalar).AsVector();
}
else if (Vector<int>.Count == 8)
{
return Vector256.Create(scalar).AsVector();
}
else if (Vector<int>.Count == 4)
{
return Vector128.Create(scalar).AsVector();
}
else
{
return new(scalar);
}
}
This workaround results in the following machine code with .NET 9-RC2:
vpbroadcastd ymm0, esi
vmovups ymmword ptr [rdi], ymm0
mov rax, rdi
vzeroupper
ret
Configuration
No response
Other information
No response