Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some System.Decimal performance improvements #99212

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

Daniel-Svensson
Copy link
Contributor

@Daniel-Svensson Daniel-Svensson commented Mar 3, 2024

Speed up decimal Multiplication and Division (mostly on x64, but some minor improvements on x86).

It does mostly 3 tricks:

  • Use new X86 unstrincts for DivRem to reduce number of divisions, an to avoid doing expensive 64bit divisions (via helper call) in 32bit mode
  • Add a helper for 64 * 32 bit multiplication (which on 64bit platforms becomes a single call to BigMul)
  • Changes ulong Math.BigMul(uint, uint) to not use BMI2.MultiplyNoFlags for better generated code (it gets rid of the memory write)

This is small port of the code from #7778 now that DivRem is accessible.
There is some more fixes that can be done at a later time (such as using 64 by 32 multiply in more places), but many of the original changes are not relevant due either missing low lewel primitives (such as carry) or worse code generated code (big mul)

Questions

  • I saw code currently uses both checks on IntPtr.Size and #if TARGET_32BIT for target conditional code, what is the preferred style ?
  • How do you prioritize performance of 64bit vs 32bit runtime ?
    • I could make a go and try and rewrite multiply to the fully optimized C++ version, but such a C# version would probably have some loss of performance for the 32bit version

Remarks for better performance

Benchmarks

Full benchmark code can be found at https://github.com/Daniel-Svensson/ClrExperiments/tree/master/ClrDecimal/Benchmarks

Overview

  • x64
    • Division time is 60-80% of old time (~25-66% speedup)
    • Multiplication worst case down to 65% (~53% speedup)
  • x86
    • Division time is 70-86% of old time (~16-42% speedup)
    • Multiplication - might be a few % slower in worst case (not more than 8%)

Divide x64


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio
New 1 3 1/3 32bit division 26.42 ns 0.115 ns 0.102 ns 0.71
Main 1 3 1/3 32bit division 37.40 ns 0.038 ns 0.032 ns 1.00
New 107374182.39 3.3 34bit / 32bit 24.36 ns 0.488 ns 0.407 ns 0.78
Main 107374182.39 3.3 34bit / 32bit 31.26 ns 0.469 ns 0.439 ns 1.00
New 10145(...)50239 [21] 3 96bit / 32bit 18.11 ns 0.126 ns 0.112 ns 0.61
Main 10145(...)50239 [21] 3 96bit / 32bit 29.63 ns 0.118 ns 0.098 ns 1.00
New 39291(...)21183 [22] 12884901920 96bit / 64bit 30.19 ns 0.157 ns 0.147 ns 0.86
Main 39291(...)21183 [22] 12884901920 96bit / 64bit 35.06 ns 0.135 ns 0.126 ns 1.00
New 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 45.50 ns 0.184 ns 0.172 ns 0.78
Main 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 58.17 ns 0.249 ns 0.233 ns 1.00

64bit Multiply


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio
New 15564.65093 0.00000003 32bit * 32bit 5.537 ns 0.0281 ns 0.0263 ns 1.01
Main 15564.65093 0.00000003 32bit * 32bit 5.503 ns 0.0311 ns 0.0275 ns 1.00
New 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 22.299 ns 0.1154 ns 0.0964 ns 0.94
Main 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 23.860 ns 0.1578 ns 0.1399 ns 1.00
New 21702051861934.75013 51240935.53816662 64it * 64bit 13.066 ns 0.0910 ns 0.0851 ns 0.89
Main 21702051861934.75013 51240935.53816662 64it * 64bit 14.759 ns 0.1160 ns 0.1085 ns 1.00
New 57510(...)29861 [21] 0.01193046 96bit * 32bit 6.228 ns 0.0496 ns 0.0464 ns 0.65
Main 57510(...)29861 [21] 0.01193046 96bit * 32bit 9.584 ns 0.0786 ns 0.0735 ns 1.00

x86 (32-bit)

32bit benchmarks results

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio
Ole32 1 3 1/3 32bit division 31.68 ns 0.052 ns 0.049 ns 0.51
New 1 3 1/3 32bit division 44.65 ns 0.094 ns 0.083 ns 0.72
Main 1 3 1/3 32bit division 62.36 ns 0.533 ns 0.472 ns 1.00
Ole32 107374182.39 3.3 34bit / 32bit 27.64 ns 0.032 ns 0.027 ns 0.49
New 107374182.39 3.3 34bit / 32bit 39.30 ns 0.090 ns 0.084 ns 0.70
Main 107374182.39 3.3 34bit / 32bit 56.35 ns 0.057 ns 0.048 ns 1.00
Ole32 10145(...)50239 [21] 3 96bit / 32bit 24.04 ns 0.018 ns 0.016 ns 0.70
New 10145(...)50239 [21] 3 96bit / 32bit 29.66 ns 0.041 ns 0.039 ns 0.86
Main 10145(...)50239 [21] 3 96bit / 32bit 34.52 ns 0.077 ns 0.068 ns 1.00
Ole32 39291(...)21183 [22] 12884901920 96bit / 64bit 43.04 ns 0.036 ns 0.028 ns 0.50
New 39291(...)21183 [22] 12884901920 96bit / 64bit 65.67 ns 0.187 ns 0.166 ns 0.77
Main 39291(...)21183 [22] 12884901920 96bit / 64bit 85.67 ns 0.112 ns 0.105 ns 1.00
Ole32 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 48.31 ns 0.045 ns 0.038 ns 0.43
New 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 79.02 ns 0.514 ns 0.456 ns 0.71
Main 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 112.06 ns 0.549 ns 0.486 ns 1.00

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host] : .NET 8.0.2 (8.0.224.6711), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio RatioSD
New 15564.65093 0.00000003 32bit * 32bit 12.28 ns 0.040 ns 0.037 ns 1.03 0.00
Main 15564.65093 0.00000003 32bit * 32bit 11.92 ns 0.037 ns 0.033 ns 1.00 0.00
New 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 64.89 ns 0.160 ns 0.142 ns 1.03 0.00
Main 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 63.14 ns 0.161 ns 0.134 ns 1.00 0.00
New 21702051861934.75013 51240935.53816662 64it * 64bit 40.29 ns 0.807 ns 0.863 ns 0.98 0.02
Main 21702051861934.75013 51240935.53816662 64it * 64bit 41.18 ns 0.293 ns 0.274 ns 1.00 0.00
New 57510(...)29861 [21] 0.01193046 96bit * 32bit 18.50 ns 0.079 ns 0.070 ns 1.08 0.01
Main 57510(...)29861 [21] 0.01193046 96bit * 32bit 17.08 ns 0.087 ns 0.082 ns 1.00 0.00

…se code than

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

| Method                     | a | b          | Mean     | Error     | StdDev    | Allocated |
|--------------------------- |-- |----------- |---------:|----------:|----------:|----------:|
| Mul64By32_New | 3 | 4294967295 | 2.068 ns | 0.0459 ns | 0.0383 ns |         - |
| Mul64By32_Ori                |  3   | 4294967295 | 2.916 ns | 0.0231 ns | 0.0193 ns |         - |
- Add comment to BigMul64By32 and make it return nunit to avoid clearing upper 32 bits
- Simplify IncreaseScale
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 3, 2024
@Daniel-Svensson Daniel-Svensson changed the title Some Decimal performance improvements Some System.Decimal performance improvements Mar 3, 2024
@danmoseley
Copy link
Member

Cc @jkotas for the questions about relative prioritization of 32bit performance.

@huoyaoyuan
Copy link
Member

@jkotas jkotas added area-System.Numerics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 4, 2024
@ghost
Copy link

ghost commented Mar 4, 2024

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Speed up decimal Multiplication and Division (mostly on x64, but some minor improvements on x86).

It does mostly 3 tricks:

  • Use new X86 unstrincts for DivRem to reduce number of divisions, an to avoid doing expensive 64bit divisions (via helper call) in 32bit mode
  • Add a helper for 64 * 32 bit multiplication (which on 64bit platforms becomes a single call to BigMul)
  • Changes ulong Math.BigMul(uint, uint) to not use BMI2.MultiplyNoFlags for better generated code (it gets rid of the memory write)

This is small port of the code from #7778 now that DivRem is accessible.
There is some more fixes that can be done at a later time (such as using 64 by 32 multiply in more places), but many of the original changes are not relevant due either missing low lewel primitives (such as carry) or worse code generated code (big mul)

Questions

  • I saw code currently uses both checks on IntPtr.Size and #if TARGET_32BIT for target conditional code, what is the preferred style ?
  • How do you prioritize performance of 64bit vs 32bit runtime ?
    • I could make a go and try and rewrite multiply to the fully optimized C++ version, but such a C# version would probably have some loss of performance for the 32bit version

Remarks for better performance

Benchmarks

Full benchmark code can be found at https://github.com/Daniel-Svensson/ClrExperiments/tree/master/ClrDecimal/Benchmarks

64bit Divide


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio
New 92233(...)63.19 [21] 3.3 34bit / 32bit 23.09 ns 0.145 ns 0.121 ns 0.54
Main 92233(...)63.19 [21] 3.3 34bit / 32bit 42.75 ns 0.195 ns 0.183 ns 1.00
New 10145(...)02.39 [22] 0.3 96bit / 32bit 14.21 ns 0.086 ns 0.072 ns 0.48
Main 10145(...)02.39 [22] 0.3 96bit / 32bit 29.83 ns 0.074 ns 0.066 ns 1.00
New 39291(...)11.83 [23] 12884901.920 96bit / 64bit 30.57 ns 0.045 ns 0.040 ns 0.86
Main 39291(...)11.83 [23] 12884901.920 96bit / 64bit 35.45 ns 0.190 ns 0.159 ns 1.00
New 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 47.87 ns 0.129 ns 0.114 ns 0.82
Main 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 58.72 ns 0.106 ns 0.094 ns 1.00

64bit Multiply


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host]     : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2


Method a b descr Mean Error StdDev Ratio
New 15564.65093 0.00000003 32bit * 32bit 5.580 ns 0.0320 ns 0.0300 ns 1.00
Main 15564.65093 0.00000003 32bit * 32bit 5.578 ns 0.0096 ns 0.0085 ns 1.00
New 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 22.684 ns 0.2357 ns 0.2089 ns 0.94
Main 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 24.079 ns 0.0712 ns 0.0666 ns 1.00
New 21702051861934.75013 51240935.53816662 64it * 64bit 12.748 ns 0.0281 ns 0.0249 ns 0.91
Main 21702051861934.75013 51240935.53816662 64it * 64bit 13.954 ns 0.0561 ns 0.0525 ns 1.00
New 57510(...)29861 [21] 0.01193046 96bit * 32bit 6.270 ns 0.0156 ns 0.0130 ns 0.65
Main 57510(...)29861 [21] 0.01193046 96bit * 32bit 9.707 ns 0.1240 ns 0.1160 ns 1.00

32bit

32bit benchmarks results

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio RatioSD
New 92233(...)63.19 [21] 3.3 34bit / 32bit 60.96 ns 0.172 ns 0.161 ns 1.01 0.01
Main 92233(...)63.19 [21] 3.3 34bit / 32bit 60.42 ns 0.564 ns 0.500 ns 1.00 0.00
New 10145(...)02.39 [22] 0.3 96bit / 32bit 36.85 ns 0.072 ns 0.063 ns 1.01 0.00
Main 10145(...)02.39 [22] 0.3 96bit / 32bit 36.49 ns 0.090 ns 0.085 ns 1.00 0.00
New 39291(...)11.83 [23] 12884901.920 96bit / 64bit 78.43 ns 0.244 ns 0.217 ns 0.82 0.01
Main 39291(...)11.83 [23] 12884901.920 96bit / 64bit 95.56 ns 0.663 ns 0.620 ns 1.00 0.00
New 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 81.67 ns 1.663 ns 2.489 ns 0.62 0.03
Main 39291(...)21183 [22] 18446744211148505120 96bit / 96bit 132.38 ns 2.643 ns 4.559 ns 1.00 0.00

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X86 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain  

Method a b descr Mean Error StdDev Ratio
New 15564.65093 0.00000003 32bit * 32bit 10.65 ns 0.047 ns 0.042 ns 0.86
Main 15564.65093 0.00000003 32bit * 32bit 12.36 ns 0.062 ns 0.052 ns 1.00
New 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 65.15 ns 0.159 ns 0.141 ns 1.00
Main 18446(...)19935 [21] 38738(...)51235 [22] 96bit * 96bit 64.84 ns 0.178 ns 0.149 ns 1.00
New 21702051861934.75013 51240935.53816662 64it * 64bit 40.23 ns 0.119 ns 0.112 ns 0.97
Main 21702051861934.75013 51240935.53816662 64it * 64bit 41.45 ns 0.534 ns 0.500 ns 1.00
New 57510(...)29861 [21] 0.01193046 96bit * 32bit 18.69 ns 0.082 ns 0.077 ns 1.10
Main 57510(...)29861 [21] 0.01193046 96bit * 32bit 16.96 ns 0.051 ns 0.048 ns 1.00
Author: Daniel-Svensson
Assignees: -
Labels:

area-System.Numerics, needs-area-label

Milestone: -

@jkotas
Copy link
Member

jkotas commented Mar 4, 2024

relative prioritization of 32bit performance.

We do not actively invest into improving 32bit performance specifically. At the same, we avoid regressing 32bit performance unless there is a very good reason.

// TODO: https://github.com/dotnet/runtime/issues/5213
ulong tmp, div;
if (bufNum.U2 != 0)
if (X86.X86Base.X64.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of extra code for a relatively small performance increase on modern CPUs.

We currently use Skylake as the baseline for a lot of our perf score numbers and 3ns savings for 13 lines of new code (+23 more for x86) doesn't really seem worth it.

Ideally any improvements would be shared across all 3 platforms or be significant enough to make the additional complexity worthwhile.

Copy link
Contributor Author

@Daniel-Svensson Daniel-Svensson Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I removed the x64 specific code and just kept the x86 since it was just a few ns faster than the x86 specific code (as long as there are no branch misspredictions) and the x86 should be faster on Skylake even with a misspredicted branch.

It stills gives around ~10ns faster division for 96/32 case

I can change it to x64 only code insted (2 if statements and 2 DivRem calls) if you rather like it.

/// <param name="den">64-bit divisor</param>
/// <returns>Returns quotient. Remainder overwrites lower 64-bits of dividend.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe ulong Div128By64(Buf16* bufNum, ulong den)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same general comment, but this is notably a case where we'd ideally just use UInt128.operator / and have the JIT recognize the general pattern instead.

Doing general purpose optimizations is almost always a better option than doing one offs. Ideally the JIT recognizes cases like ulong / uint or uint128 / ulong and does the optimal thing on platforms that support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general that would be god, but in this specific case there we know several things that the compiler cannot know and optimize for.

  • denominator has already been shifted left so it has highest bit set
    • The metod will probably be called several times in a row so it makes sense to do bit shifting etc before it is called
  • den > bufNum->High64

However since both types lives in CoreLib, several helpers here could probably be made internal and useable by both UInt128 and Decimal division code. For example I think both this metod and maybe division by 32bit could be usefull from within UInt128

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #99747 with a similar fix for UInt128.

I did not make this internal and call it from UInt128 at the moment, but if the code works as fast without the divisor having the highest bit set then it might make sense to do so.

/// <returns>hi bits of the result</returns>
/// <remarks>returns nuint instead of uint to skip clearing upper 32bits on 64bit platforms</remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static nuint BigMul64By32(ulong a, uint b, out ulong low)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same general comment. This is another case where this should really just be a general purpose opt in the JIT, recognizing ulong * uint and optimizing it accordingly, rather than requiring a full ulong * ulong, just opting to do whichever is most efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good if the JIT could recognice it and optimize the code.
Since the currently most popular approach to the get more than 64bit result is to call BigMul I hope that that method can be optimized in the future.

I created an internal overload of BigMul in the time beeing, which can easily be removed when the JIT can optimize 32*64 bit => 128bit multiplications

Comment on lines 156 to 171
#if TARGET_32BIT
if (Bmi2.IsSupported)
{
uint low;
uint high = Bmi2.MultiplyNoFlags(a, b, &low);
return ((ulong)high << 32) | low;
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, the intent was for us to have an overload that returns a tuple and uses the JIT multi-reg return hookups, but we haven't exposed that yet.

We ideally aren't removing opts like this, especially when even with the memory access it is potentially faster or close enough in most cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I added it back, but it contributed to more than 20% slower division (more than 10ns for 1/3) so I removed a of calls and replaced it with the fallback path (ulong)(uint)a * (uint)b which the JIT recognize and generates optmized code for.

Since the JIT already has specialized codegen for 32 * 32 => 64bit division would it not make sense to just emit mulx there (or for normal multiplication as well) if it could lead to more efficient code (maybe when writing to memory or when the register usage gets better).

It seems clang uses mulx for 32 * 32 => 64bit in 32bit mode and 64 * 64 => 128bit in 64bit mode.

@tannergooding
Copy link
Member

I'm, personally, not a huge fan of a lot of the changes here. It's a lot of new code for what appears to be some relatively minor perf increases overall.

Where the perf increase is more measurable (specifically n-bit / 32-bit), it looks like places where we're missing some more general purpose JIT optimizations which would likely be a better overall investment, working across multiple hardware configurations and for more than just System.Decimal.

@Daniel-Svensson Daniel-Svensson marked this pull request as ready for review March 13, 2024 20:05
@tannergooding
Copy link
Member

This has build failures that need to be addressed. They come from using DivRem which requires opting into preview features currently as there are some known codegen pessimizations that need to be handled still (#82194).

@tannergooding tannergooding added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 12, 2024
@dotnet-policy-service dotnet-policy-service bot removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Jun 27, 2024
@Daniel-Svensson
Copy link
Contributor Author

@tannergooding I've opted into preview features for the file, hopefully it should compile again.

Hope it is ok to opt into preview features for the whole file instead of doing 3 pairs of suppress/restore

@tannergooding
Copy link
Member

Rerunning CI. Going to finish reviewing this today after CI finishes

@xtqqczze
Copy link
Contributor

@MihuBot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants