```diff
@@ -129,7 +129,7 @@ public static FrozenDictionary<TKey, TElement> ToFrozenDictionary<TSource, TKey,
             // Ensure we have a Dictionary<,> using the specified comparer such that all keys
             // are non-null and unique according to that comparer.
             newDictionary = source as Dictionary<TKey, TValue>;
-            if (newDictionary is null || (newDictionary.Count != 0 && !newDictionary.Comparer.Equals(comparer)))
+            if (newDictionary is null)
             {
                 newDictionary = new Dictionary<TKey, TValue>(comparer);
                 foreach (KeyValuePair<TKey, TValue> pair in source)
@@ -140,6 +140,18 @@ public static FrozenDictionary<TKey, TElement> ToFrozenDictionary<TSource, TKey,
                     newDictionary[pair.Key] = pair.Value;
                 }
             }
+            else if (newDictionary.Count != 0 && !newDictionary.Comparer.Equals(comparer))
+            {
+                var dictionary = new Dictionary<TKey, TValue>(newDictionary.Count, comparer);
```
**stephentoub** (Member):
This could end up significantly over-allocating, no? Let's say the original uses an Ordinal comparer, the new one uses an OrdinalIgnoreCase comparer, and the original had every string entry duplicated many times, just with different casing. The resulting dictionary's count is going to be much smaller than the original's, yet this will cause it to allocate much more memory?

**prozolic** (Contributor, Author):
@stephentoub You're correct. Pre-allocating the dictionary with the original size may result in over-allocating when duplicate keys reduce the required capacity. I have removed the initial capacity specification.

**stephentoub** (Member, Oct 16, 2025):
Thanks. At that point, then, is the only difference this new code path adds that it iterates over a Dictionary<> in a strongly typed way rather than via IEnumerable<KeyValuePair<>>? In this state, have you tried running the before/after benchmarks on .NET 10?
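
For reference, the two iteration shapes being compared look like this (illustrative code, not from the PR): enumerating the concrete Dictionary<,> binds to its public struct enumerator, while enumerating the same instance through IEnumerable<KeyValuePair<,>> boxes that enumerator and dispatches each MoveNext/Current call through the interface.

```csharp
using System.Collections.Generic;

internal static class IterationShapes
{
    // Strongly typed: foreach binds to Dictionary<,>.GetEnumerator(), a struct
    // enumerator, so there is no enumerator allocation and no interface dispatch.
    public static void CopyTyped(Dictionary<string, int> source, Dictionary<string, int> target)
    {
        foreach (KeyValuePair<string, int> pair in source)
        {
            target[pair.Key] = pair.Value;
        }
    }

    // Via the interface: the struct enumerator is boxed to IEnumerator<KeyValuePair<,>>
    // and every call goes through interface dispatch. This is the shape the pre-existing
    // path uses when the source is only typed as IEnumerable<KeyValuePair<,>>.
    public static void CopyViaInterface(IEnumerable<KeyValuePair<string, int>> source, Dictionary<string, int> target)
    {
        foreach (KeyValuePair<string, int> pair in source)
        {
            target[pair.Key] = pair.Value;
        }
    }
}
```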

**prozolic** (Contributor, Author, Oct 16, 2025):
I've updated the benchmark results accordingly.

```
BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 11 (10.0.26100.6584)
11th Gen Intel Core i7-11700 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 10.0.100-rc.1.25451.107
  [Host]     : .NET 10.0.0 (10.0.25.45207), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-SADIXB : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

OutlierMode=DontRemove  PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=True
MinIterationCount=15  WarmupCount=1
```

| Method              | Size  | Mean          | Error         | StdDev        | Median        | Min           | Max           | Code Size | Gen0     | Gen1     | Gen2     | Allocated |
|-------------------- |------ |--------------:|--------------:|--------------:|--------------:|--------------:|--------------:|----------:|---------:|---------:|---------:|----------:|
| Original_Dictionary | 1     |      50.77 ns |      5.017 ns |      5.778 ns |      48.29 ns |      45.60 ns |      68.21 ns |   2,243 B |   0.0409 |        - |        - |     344 B |
| PR_Dictionary       | 1     |      36.30 ns |      1.882 ns |      2.167 ns |      35.54 ns |      34.84 ns |      42.11 ns |   1,892 B |   0.0354 |        - |        - |     296 B |
| Original_Dictionary | 10    |     270.57 ns |     17.663 ns |     20.341 ns |     264.05 ns |     250.11 ns |     317.83 ns |   3,179 B |   0.1185 |        - |        - |     992 B |
| PR_Dictionary       | 10    |     173.65 ns |      3.315 ns |      3.256 ns |     173.26 ns |     169.91 ns |     182.74 ns |   2,634 B |   0.1124 |        - |        - |     944 B |
| Original_Dictionary | 100   |   2,308.68 ns |    171.190 ns |    197.143 ns |   2,239.27 ns |   2,146.19 ns |   2,915.64 ns |   3,194 B |   0.9931 |   0.0170 |        - |    8328 B |
| PR_Dictionary       | 100   |   1,371.95 ns |     31.095 ns |     35.809 ns |   1,362.03 ns |   1,348.04 ns |   1,513.92 ns |   2,646 B |   0.9881 |   0.0165 |        - |    8280 B |
| Original_Dictionary | 1000  |  19,235.59 ns |    697.651 ns |    803.416 ns |  18,891.83 ns |  18,611.36 ns |  21,727.45 ns |   2,992 B |   9.6957 |   1.8646 |        - |   81304 B |
| PR_Dictionary       | 1000  |  13,123.78 ns |    558.138 ns |    642.753 ns |  12,917.55 ns |  12,450.91 ns |  14,648.06 ns |   2,981 B |   9.6845 |   1.3478 |        - |   81256 B |
| Original_Dictionary | 10000 | 257,255.96 ns | 26,300.558 ns | 30,287.762 ns | 246,032.88 ns | 224,345.99 ns | 362,544.12 ns |   2,993 B | 124.0672 | 124.0672 | 124.0672 |  753284 B |
| PR_Dictionary       | 10000 | 179,099.95 ns |  6,462.982 ns |  7,442.779 ns | 178,413.09 ns | 168,169.41 ns | 198,089.87 ns |   1,573 B | 124.3421 | 124.3421 | 124.3421 |  753236 B |
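
The benchmark source itself isn't shown in the thread; a plausible harness shape that exercises this code path is sketched below (the class name, key pattern, and comparer choice are assumptions). The source dictionary uses the default string comparer while the target comparer has identical semantics but is a different instance, so the copy branch is taken; the Original/PR rows would come from pointing the CoreRun toolchain at the before/after runtime builds.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class ToFrozenDictionaryBenchmarks
{
    [Params(1, 10, 100, 1000, 10000)]
    public int Size;

    private Dictionary<string, int> _source = null!;

    [GlobalSetup]
    public void Setup()
    {
        // Default comparer on the source; StringComparer.Ordinal below has the same
        // semantics but is a different comparer instance, which forces the copy path.
        _source = new Dictionary<string, int>();
        for (int i = 0; i < Size; i++)
        {
            _source.Add("key" + i, i);
        }
    }

    [Benchmark]
    public FrozenDictionary<string, int> Create() => _source.ToFrozenDictionary(StringComparer.Ordinal);
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ToFrozenDictionaryBenchmarks>();
}
```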

**hez2010** (Contributor, Oct 16, 2025):
@stephentoub From the updated benchmark results, I see that the original version, which reserves the initial capacity, allocated far less than the new one that doesn't. Instead of letting the backing storage resize to 2x its size each time (which would also result in over-allocating), pre-allocating a capacity that may be a bit larger than actually needed seems definitely more feasible to me.

**stephentoub** (Member):
It's going to depend on how much of a difference is expected in count from the original collection that's using a different comparer. The benchmark is using a comparer with identical semantics, so the ideal is to just use the exact same capacity. If, however and for example, the original collection was using Ordinal, the new collection was using OrdinalIgnoreCase, and the original collection contained 1000 words each with 10 different casing variations, choosing that original count is going to result in 10x the allocation required whereas the normal growth strategy will only be 2x.
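
A concrete sketch of that worst case (illustrative only): every key in the Ordinal-keyed source has ten casing variants, so an OrdinalIgnoreCase copy ends up with a tenth of the entries, and pre-sizing the copy to source.Count would reserve roughly 10x the capacity it actually needs.

```csharp
using System;
using System.Collections.Generic;

var source = new Dictionary<string, int>(StringComparer.Ordinal);
string[] words = { "alpha", "beta", "gamma" }; // stand-in for the "1000 words" above

foreach (string word in words)
{
    for (int variant = 0; variant < 10; variant++)
    {
        // Toggle the casing of individual characters based on the bits of `variant`
        // so every variant is a distinct key under Ordinal comparison.
        char[] chars = word.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (((variant >> i) & 1) != 0)
            {
                chars[i] = char.ToUpperInvariant(chars[i]);
            }
        }
        source[new string(chars)] = variant;
    }
}

// Copy under a case-insensitive comparer: the casing variants collapse.
var copy = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
foreach (KeyValuePair<string, int> pair in source)
{
    copy[pair.Key] = pair.Value;
}

Console.WriteLine(source.Count); // 30
Console.WriteLine(copy.Count);   // 3 -- sizing the copy to 30 up front would over-allocate ~10x
```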

**prozolic** (Contributor, Author):
I initially suggested pre-allocating, but after considering @stephentoub's concern about the worst-case scenario, I agree that not setting the initial capacity is the better approach.
Since the optimal capacity depends on the comparer's behavior, the dynamic growth strategy is more robust.

```diff
+                foreach (KeyValuePair<TKey, TValue> pair in newDictionary)
+                {
+                    // Dictionary's constructor uses Add, which will throw on duplicates.
+                    // This implementation uses the indexer to avoid throwing and to overwrite
+                    // existing entries such that last one wins.
+                    dictionary[pair.Key] = pair.Value;
+                }
+                newDictionary = dictionary;
+            }

             if (newDictionary.Count == 0)
             {
```
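
A small usage sketch of the path this branch covers (illustrative, not part of the PR): the source is already a Dictionary<,>, but its comparer differs from the requested one, so the entries are copied into a fresh Dictionary<,> keyed by the new comparer before freezing, and keys that collide under that comparer collapse to a single entry instead of throwing.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

var source = new Dictionary<string, int>(StringComparer.Ordinal)
{
    ["Key"] = 1,
    ["KEY"] = 2,
    ["other"] = 3,
};

// The requested comparer differs from the source dictionary's, so the else-if branch
// above copies the entries into a new Dictionary<string, int> keyed by OrdinalIgnoreCase.
FrozenDictionary<string, int> frozen = source.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);

Console.WriteLine(source.Count); // 3
Console.WriteLine(frozen.Count); // 2 -- "Key" and "KEY" are the same key under OrdinalIgnoreCase
```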