-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Math.Round, Math.ILogB, and do some minor cleanup of Half, Single, and Double #98040
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics Issue DetailsAs per the title this improves the software fallback used for It additionally does some cleanup of functions in Half, Single, and Double to ensure they remain performant and consistent with eachother where possible.
|
I still see references to Math.Round being an intrinsic. Is that only in support of cases where the argument is known constant, or are there other cases where this managed implementation isn't used? |
ARM64 implements it using a HW instruction. |
And X86 does too for all but AwayFromZero. |
Most of our platforms use an intrinsic where possible. The exact mapping of that is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L6840 For Arm64 we can always accelerate while for x86/x64, we can only trivially accelerate on The software fallback is still used for indirect invocation, for certain cases involving tail calls that prevent us from optimizing it as an intrinsic, and on other platforms outside what RyuJIT supports. |
Diff results for #98040Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Throughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.01% to +0.00%)
Details here |
Diff results for #98040Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64MinOpts (-0.01% to +0.00%)
Throughput diffs for windows/arm64 ran on linux/x64MinOpts (-0.01% to +0.00%)
Details here |
// This is probably not worth inlining, it has branches and should be rarely called | ||
public static unsafe bool IsSubnormal(double d) | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
internal static bool IsZero(double d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this helper if all it's doing is == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It allows the code to be consistent with Half
(where == 0
is less efficient today), with Generic Math (where a user really needs to use the exposed IsZero
API), and is something we should probably expose publicly anyways since it's part of the IEEE 754 Required Operations (under 5.7.2 General Operations).
bits += 1; | ||
} | ||
else | ||
{ | ||
bits -= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Curious for the reason for prefering this over ++ and -- ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my preferred style. I personally don't like x++
or x--
as a standalone expression (in loops its fine), as I find it makes the code much harder to read/reason about. The x += 1
and x -= 1
instead makes it very clear what's going on and that an assignment/mutation is happening here.
int bits = BitConverter.SingleToInt32Bits(f); | ||
return (bits & 0x7FFFFFFF) < 0x7F800000; | ||
uint bits = BitConverter.SingleToUInt32Bits(f); | ||
return (~bits & PositiveInfinityBits) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this more efficient, or it's just clearer what it's doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit more efficient (both perf and space wise) for 64-bit
, while allowing smaller code in hoistable loops for 32-bit. It also allows branching code to be a bit more efficient (instead of and; cmp; jcc
, we just do andn; jcc
).
Basically the old pattern was:
; 32-bit
vmovd eax, xmm0 ; 4-bytes, 3-cycles
and eax, 0x7FFF_FFFF ; 5-bytes, 1-cycles
cmp eax, 0x7F80_0000 ; 5-bytes, 1-cycles
setl al ; 3-bytes, 1-cycles
movzx eax, al ; 3-bytes, 1-cycles
; 20-bytes, 7-cycles
; 64-bit
vmovd rax, xmm0 ; 5-bytes, 3-cycles
mov rcx, 0x7FFF_FFFF_FFFF_FFFF ; 10-bytes, 1-cycles
and rax, rcx ; 3-bytes, 1-cycles
mov rcx, 0x7FF0_0000_0000_0000 ; 10-bytes, 1-cycles
cmp rax, rcx ; 3-bytes, 1-cycles
setl al ; 3-bytes, 1-cycles
movzx eax, al ; 3-bytes, 1-cycles
; 37-bytes, 9-cycles
While the new pattern can be much smaller:
; 32-bit
vmovd eax, xmm0 ; 4-bytes, 3-cycles
mov ecx, 0x7F80_0000 ; 5-bytes, 1-cycles
andn eax, eax, ecx ; 5-bytes, 1-cycles
setne al ; 3-bytes, 1-cycles
movzx rax, al ; 3-bytes, 1-cycles
; 20-bytes, 7-cycles
; 64-bit
vmovd rax, xmm0 ; 5-bytes, 3-cycles
mov rcx, 0x7FF0_0000_0000_0000 ; 10-bytes, 1-cycles
andn rax, rax, rcx ; 5-bytes, 1-cycles
setne al ; 3-bytes, 1-cycles
movzx rax, al ; 3-bytes, 1-cycles
; 26-bytes, 7-cycles
@dotnet-policy-service rerun |
@dotnet-policy-bot rerun |
As per the title this improves the software fallback used for
Math.Round
andMath.ILogB
to be significantly simpler/faster.It additionally does some cleanup of functions in Half, Single, and Double to ensure they remain performant and consistent with eachother where possible.