-
Notifications
You must be signed in to change notification settings - Fork 2.7k
DictionarySlim backport improvements: remove null buckets and entries checks #23003
DictionarySlim backport improvements: remove null buckets and entries checks #23003
Conversation
@dotnet-bot test Ubuntu x64 Checked CoreFX Tests please |
Build timed out after 6 hours. Not sure what the issue is. |
@dotnet-bot test Ubuntu x64 Checked CoreFX Tests please |
@TylerBrinkley do you have any diff for the performance before and after this change? cc: @danmosemsft |
I thought @MarcoRossignoli already tried this change and found it hurt perf in Dictionary - @MarcoRossignoli ? As @safern suggests any change to Dictionary needs performance profiling with great care. |
In my attempt I used dummy entry that increase static footprint. This change is slightly different and maybe a bit better(no static footprint), only more ctor code for defaults(we pay always for ctor and we should gain perf only on possible TryAdd).
In my case increase static footprint with no observable perf gain |
Here are my benchmark results. BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=2835936 Hz, Resolution=352.6173 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-009812
[Host] : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
Job-OMNCIS : .NET Core e1310cc1-e735-45a3-8dc3-5bf6f690008e (CoreCLR 4.6.27513.0, CoreFX 4.7.19.16301), 64bit RyuJIT
Job-NCDQEB : .NET Core 94335f94-8974-48bb-a0e8-8be3fe21e22d (CoreCLR 4.6.27513.0, CoreFX 4.7.19.16301), 64bit RyuJIT
Runtime=Core IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
|
Wow, I wonder why ContainsKeyFalse is so good 1.58x with only null check removed...as a matter of interest have tried to dump asm? |
@MarcoRossignoli How does one go about that? |
There is an official guide https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md You need to deploy a custom app using your local coreclr bins and debug compiled jitter(that recognize the knobs and emits dump on console, usually I forward to txt file and after Tool.Diff with VS) If you're in throuble DM me on twitter I can help you on soma chats(gitter etc...) |
I ran the dump of
and
|
Before/After? |
After, why is it generating 2 versions of the method? |
Dictionary is generics so we have one copy shared for ref types and one for every value type https://alexandrnikitin.github.io/blog/dotnet-generics-under-the-hood/ can you confirm your test with string/int? Dictionary2:FindEntry(ref):int:this -> string
|
We could try to compare |
For those wondering, this is how the benchmarks were performed.
Benchmarks command
|
@jkotas I did some "pair performance programming" with @TylerBrinkley on gitter(great!) about his result on performance that to me seems huge for a null check. ...
; Tier-1 compilation
...
G_M32795_IG02:
mov edi, -1 <--REMOVED IN AFTER
mov rbx, gword ptr [rcx+8]
mov rbp, gword ptr [rcx+16]
xor r14d, r14d
test rbx, rbx <--REMOVED IN AFTER
je G_M32795_IG10 <-- REMOVED IN AFTER
mov r15, gword ptr [rcx+24]
test r15, r15
jne SHORT G_M32795_IG06
mov ecx, esi
and ecx, 0xD1FFAB1E
mov r12d, dword ptr [rbx+8]
mov eax, ecx
cdq
idiv edx:eax, r12d
cmp edx, r12d
jae G_M32795_IG14
movsxd rdx, edx
mov edi, dword ptr [rbx+4*rdx+16]
dec edi
mov r13d, dword ptr [rbp+8]
...
; Total bytes of code 323, prolog size 18 for method Dictionary`2:FindEntry(int):int:this
; ============================================================ after ...
; Tier-1 compilation
...
G_M32795_IG02:
mov rdi, gword ptr [rcx+8]
mov rbx, gword ptr [rcx+16]
xor ebp, ebp
mov r14, gword ptr [rcx+24]
test r14, r14
jne SHORT G_M32795_IG06
mov ecx, esi
and ecx, 0xD1FFAB1E
mov r15d, dword ptr [rdi+8]
mov eax, ecx
cdq
idiv edx:eax, r15d
cmp edx, r15d
jae G_M32795_IG14
movsxd rdx, edx
mov r12d, dword ptr [rdi+4*rdx+16]
dec r12d
mov r13d, dword ptr [rbx+8]
...
; Total bytes of code 312, prolog size 18 for method Dictionary`2:FindEntry(int):int:this
; ============================================================ Is possible to you a boost of ~1.5x ? |
This looks like another one of those rare cases where the improvement gets amplified due to processor micro-architecture that I have mentioned in #22832 (comment) . You would have to run under Intel VTune to figure out what is going on. |
I'll play with It!Too curious! |
BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727536 Hz, Resolution=366.6313 ns, Timer=TSC
.NET Core SDK=3.0.100-preview3-010431
[Host] : .NET Core 3.0.0-preview3-27503-5 (CoreCLR 4.6.27422.72, CoreFX 4.7.19.12807), 64bit RyuJIT
Job-ZWTXZW : .NET Core 35da75ce-57d9-4fd2-8136-0b709d016534 (CoreCLR 4.6.27615.0, CoreFX 4.7.19.16801), 64bit RyuJIT
Job-XQDVUB : .NET Core 6c76f4a5-4bfa-47fc-83b0-d4e97369a2ef (CoreCLR 4.6.27616.0, CoreFX 4.7.19.16801), 64bit RyuJIT
Job-VUANFM : .NET Core 35da75ce-57d9-4fd2-8136-0b709d016534 (CoreCLR 4.6.27615.0, CoreFX 4.7.19.16801), 64bit RyuJIT
Job-HOPUNA : .NET Core 6c76f4a5-4bfa-47fc-83b0-d4e97369a2ef (CoreCLR 4.6.27616.0, CoreFX 4.7.19.16801), 64bit RyuJIT
Runtime=Core IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
This is my results
I see a regression on default ctor, I tested more times, maybe difference in microarchitecture. |
/azp run |
I will close this now and re-open in dotnet/platform when that's ready. |
Initialize
Dictionary
's buckets and entries with non-null values to remove null checks.I'm having trouble running the tests locally which I believe has something to do with installing the VS 2019 RC so I'm going to rely on CI here instead. I also don't have any benchmarks though the changes should only introduce improvements.
I'd suggest hiding white-space changes when reviewing.
Part of https://github.com/dotnet/corefx/issues/33392.
Alternative to #22599.