
Improve Math.Round, Math.ILogB, and do some minor cleanup of Half, Single, and Double #98040

Merged
merged 7 commits into from
Feb 7, 2024

Conversation

tannergooding
Member

As per the title this improves the software fallback used for Math.Round and Math.ILogB to be significantly simpler/faster.

It additionally does some cleanup of functions in Half, Single, and Double to ensure they remain performant and consistent with each other where possible.
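For context, ILogB returns the unbiased base-2 exponent of a finite, nonzero value; for a normal IEEE 754 double it can be read directly out of the exponent field, which is why a bit-based fallback can be so much simpler than a loop-based one. A minimal sketch of the idea in Python (illustrative only, not the runtime's actual implementation; infinity/NaN handling is omitted):

```python
import math
import struct

def ilogb(d: float) -> int:
    # Sketch for finite, nonzero inputs: reinterpret the double's bytes
    # as a 64-bit integer and extract the raw 11-bit exponent field.
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    biased = (bits >> 52) & 0x7FF
    if biased == 0:
        # Subnormal: the exponent field is zero, so normalize via frexp
        # instead of counting leading mantissa zeros by hand.
        return math.frexp(d)[1] - 1
    return biased - 1023  # remove the IEEE 754 double exponent bias

assert ilogb(1.0) == 0
assert ilogb(8.0) == 3
assert ilogb(0.5) == -1
assert ilogb(5e-324) == -1074  # smallest subnormal double
```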

@ghost

ghost commented Feb 6, 2024

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

As per the title this improves the software fallback used for Math.Round and Math.ILogB to be significantly simpler/faster.

It additionally does some cleanup of functions in Half, Single, and Double to ensure they remain performant and consistent with each other where possible.

Author: tannergooding
Assignees: tannergooding
Labels:

area-System.Numerics

Milestone: -

@stephentoub
Member

improves the software fallback

I still see references to Math.Round being an intrinsic. Is that only in support of cases where the argument is known constant, or are there other cases where this managed implementation isn't used?

@filipnavara
Member

Is that only in support of cases where the argument is known constant, or are there other cases where this managed implementation isn't used?

ARM64 implements it using a HW instruction.

@MichalPetryka
Contributor

Is that only in support of cases where the argument is known constant, or are there other cases where this managed implementation isn't used?

ARM64 implements it using a HW instruction.

And X86 does too for all but AwayFromZero.

@tannergooding
Member Author

Most of our platforms use an intrinsic where possible. The exact mapping of that is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L6840

For Arm64 we can always accelerate, while for x86/x64 we can only trivially accelerate on SSE4.1-capable hardware (which is pretty much everything); we could also accelerate on SSE2 hardware via cvtss2si followed by cvtsi2ss, but it's not worth the complexity.

The software fallback is still used for indirect invocation, for certain cases involving tail calls that prevent us from optimizing it as an intrinsic, and on other platforms outside what RyuJIT supports.
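For readers unfamiliar with the midpoint modes involved: the default for Math.Round is round-half-to-even ("banker's rounding"), which is what the hardware paths and the managed fallback must agree on, while MidpointRounding.AwayFromZero is the mode x86 cannot handle with a single rounding instruction. A quick illustration of the difference (shown in Python, whose built-in round() also uses half-to-even; the away-from-zero helper is a naive sketch and can misround near-midpoint values like 0.49999999999999994):

```python
import math

# Round-half-to-even: ties go to the nearest even integer.
assert round(0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2  # not 3

def round_away_from_zero(x: float) -> float:
    # Naive sketch of MidpointRounding.AwayFromZero: ties move
    # away from zero regardless of parity.
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

assert round_away_from_zero(2.5) == 3
assert round_away_from_zero(-2.5) == -3
```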

@ryujit-bot

Diff results for #98040

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


@ryujit-bot

Diff results for #98040

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on linux/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


// This is probably not worth inlining, it has branches and should be rarely called
public static unsafe bool IsSubnormal(double d)
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsZero(double d)
Member

Do we need this helper if all it's doing is == 0?

Member Author

It allows the code to be consistent with Half (where == 0 is less efficient today), with Generic Math (where a user really needs to use the exposed IsZero API), and is something we should probably expose publicly anyways since it's part of the IEEE 754 Required Operations (under 5.7.2 General Operations).
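A detail worth spelling out: IEEE 754 has two zero encodings, +0.0 and -0.0, which compare equal as floating-point values but have different bit patterns, so a bit-level IsZero must mask off the sign bit rather than compare raw bits against zero. A sketch of that check in Python (illustrative, not the runtime's code):

```python
import struct

def to_bits(d: float) -> int:
    # Reinterpret a double's bytes as a 64-bit unsigned integer.
    return struct.unpack("<Q", struct.pack("<d", d))[0]

def is_zero(d: float) -> bool:
    # Clear the sign bit, then test for an all-zero encoding.
    return (to_bits(d) & 0x7FFF_FFFF_FFFF_FFFF) == 0

assert to_bits(0.0) != to_bits(-0.0)    # the bit patterns differ...
assert is_zero(0.0) and is_zero(-0.0)   # ...but both are zero
assert not is_zero(5e-324)              # smallest subnormal is not zero
```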

Comment on lines +1493 to +1497
bits += 1;
}
else
{
bits -= 1;
Member

Nit: Curious about the reason for preferring this over ++ and -- ?

Member Author

Just my preferred style. I personally don't like x++ or x-- as a standalone expression (in loops it's fine), as I find it makes the code much harder to read and reason about. x += 1 and x -= 1 instead make it very clear what's going on and that an assignment/mutation is happening here.

int bits = BitConverter.SingleToInt32Bits(f);
return (bits & 0x7FFFFFFF) < 0x7F800000;
uint bits = BitConverter.SingleToUInt32Bits(f);
return (~bits & PositiveInfinityBits) != 0;
Member

Is this more efficient, or it's just clearer what it's doing?

Member Author
@tannergooding tannergooding Feb 7, 2024

It's a bit more efficient (both perf- and space-wise) for 64-bit, while allowing smaller code in hoistable loops for 32-bit. It also allows branching code to be a bit more efficient: instead of and; cmp; jcc, we just do andn; jcc.

Basically the old pattern was:

; 32-bit
vmovd eax, xmm0                     ;  4-bytes, 3-cycles
and   eax, 0x7FFF_FFFF              ;  5-bytes, 1-cycles
cmp   eax, 0x7F80_0000              ;  5-bytes, 1-cycles
setl  al                            ;  3-bytes, 1-cycles
movzx eax, al                       ;  3-bytes, 1-cycles
                                    ; 20-bytes, 7-cycles

; 64-bit
vmovd rax, xmm0                     ;  5-bytes, 3-cycles
mov   rcx, 0x7FFF_FFFF_FFFF_FFFF    ; 10-bytes, 1-cycles
and   rax, rcx                      ;  3-bytes, 1-cycles
mov   rcx, 0x7FF0_0000_0000_0000    ; 10-bytes, 1-cycles
cmp   rax, rcx                      ;  3-bytes, 1-cycles
setl  al                            ;  3-bytes, 1-cycles
movzx eax, al                       ;  3-bytes, 1-cycles
                                    ; 37-bytes, 9-cycles

While the new pattern can be much smaller:

; 32-bit
vmovd eax, xmm0                     ;  4-bytes, 3-cycles
mov   ecx, 0x7F80_0000              ;  5-bytes, 1-cycles
andn  eax, eax, ecx                 ;  5-bytes, 1-cycles
setne al                            ;  3-bytes, 1-cycles
movzx rax, al                       ;  3-bytes, 1-cycles
                                    ; 20-bytes, 7-cycles

; 64-bit
vmovd rax, xmm0                     ;  5-bytes, 3-cycles
mov   rcx, 0x7FF0_0000_0000_0000    ; 10-bytes, 1-cycles
andn  rax, rax, rcx                 ;  5-bytes, 1-cycles
setne al                            ;  3-bytes, 1-cycles
movzx rax, al                       ;  3-bytes, 1-cycles
                                    ; 26-bytes, 7-cycles
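To see why the two formulations agree: a float is finite iff its 8-bit exponent field is not all ones, and since the exponent bits are contiguous in the infinity mask, "(bits & 0x7FFFFFFF) < 0x7F800000" and "(~bits & 0x7F800000) != 0" both reduce to "some exponent bit is clear". A quick check over representative bit patterns, sketched in Python:

```python
# Old IsFinite check: strip the sign, compare against the infinity encoding.
old = lambda bits: (bits & 0x7FFF_FFFF) < 0x7F80_0000
# New IsFinite check: some bit of the exponent field must be clear.
new = lambda bits: (~bits & 0x7F80_0000) != 0

# Compare the two predicates across every exponent field, both signs,
# and boundary mantissas (covers zeros, subnormals, normals, inf, NaN).
for exp in range(0x100):
    for mantissa in (0, 1, 0x7F_FFFF):
        for sign in (0, 1):
            bits = (sign << 31) | (exp << 23) | mantissa
            assert old(bits) == new(bits), hex(bits)
```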

@tannergooding
Member Author

@dotnet-policy-service rerun

@tannergooding
Member Author

@dotnet-policy-bot rerun

@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2024