Improve SortedSet DeepCopy performance #56561
Tagging subscribers to this area: @eiriktsarpalis
Issue Details
I noticed that SortedDictionary copying was allocating more memory than iterating the initial dictionary and adding items one at a time. I realized that the optimization done as part of #45659 was not completely successful and that the SortedSet deep copy could be improved. For SortedSet, prefer recursion through the tree over allocating multiple stacks. dotnet/performance@main...johnthcall:johncall/SortedSetDeepCopy
                newRight = newRight.Left;
            }
        }
        Node newNode = ShallowClone();
We generally avoid using recursion out of concern for potential stack overflows. Theoretically speaking we should be fine since RB trees have O(log n) depth, but I'm not familiar enough with the implementation to know if it can admit unbalanced instances.
The performance improvements are impressive, but I'd be curious to know how much of that was contributed by the |
@eiriktsarpalis Below is the performance of main vs just the kvp comparer change vs both changes. If we decide to avoid the recursion in DeepCopy I'll make a separate PR for just the KVP comparer change.
Introducing one change per PR is good practice in general (easier to revert etc). The KVP change is low risk with great perf improvements in its own right, so it certainly meets the bar for .NET 6 RC1. @layomia @stephentoub thoughts?
Separating them sounds good.
I've reverted the SortedDictionary changes from here and moved them to a separate PR, #56634.
I moved the milestone to 7.0.0 given that we're very close to the .NET 6 release date. I will be following up with a more thorough review once .NET 7 development commences.
Hi @johnthcall, looking at this PR again it seems like using recursion in this context is safe, since depth is always bounded by 2 log(n + 1) + 1. The existing implementation does contain a few inefficiencies (e.g. allocating two stacks and traversing twice). I'm wondering though if we could get some of the perf benefits of your approach while still avoiding recursion. For example, consider the following (untested) implementation:

    public Node DeepClone(int count)
    {
    #if DEBUG
        Debug.Assert(count == GetCount());
    #endif
        Node newRoot = ShallowClone();
        var pendingNodes = new Stack<(Node source, Node target)>(2 * Log2(count) + 2);
        pendingNodes.Push((this, newRoot));
        while (pendingNodes.TryPop(out var next))
        {
            Node clonedNode;
            if (next.source.Left is Node left)
            {
                clonedNode = left.ShallowClone();
                next.target.Left = clonedNode;
                pendingNodes.Push((left, clonedNode));
            }
            if (next.source.Right is Node right)
            {
                clonedNode = right.ShallowClone();
                next.target.Right = clonedNode;
                pendingNodes.Push((right, clonedNode));
            }
        }
        return newRoot;
    }

I'd be curious to see how something like the above performs compared to the recursive approach.
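For readers who want to try the single-stack idea outside the runtime sources, here is a minimal, self-contained sketch. `ToyNode` is a hypothetical stand-in for `SortedSet<T>.Node` (it carries no color and an `int` payload), and `BitOperations.Log2` is assumed as a substitute for the internal `Log2` helper referenced above. It illustrates the traversal only and is not the code that was merged.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical toy node; the real SortedSet<T>.Node also tracks node color.
public sealed class ToyNode
{
    public int Item;
    public ToyNode? Left;
    public ToyNode? Right;

    public ToyNode(int item) => Item = item;

    // Copies only this node's payload; the clone's children start out null.
    public ToyNode ShallowClone() => new ToyNode(Item);

    // Clones the subtree rooted at this node using one explicit stack.
    // 'count' is the number of nodes in the subtree (assumed >= 1) and is
    // only used as a capacity hint; the stack still grows if the hint is low.
    public ToyNode DeepClone(int count)
    {
        ToyNode newRoot = ShallowClone();
        int capacity = 2 * BitOperations.Log2((uint)count) + 2;
        var pending = new Stack<(ToyNode Source, ToyNode Target)>(capacity);
        pending.Push((this, newRoot));

        while (pending.TryPop(out var next))
        {
            // Clone each existing child, attach it to the cloned parent,
            // and queue the (original, clone) pair for later processing.
            if (next.Source.Left is ToyNode left)
            {
                ToyNode clone = left.ShallowClone();
                next.Target.Left = clone;
                pending.Push((left, clone));
            }
            if (next.Source.Right is ToyNode right)
            {
                ToyNode clone = right.ShallowClone();
                next.Target.Right = clone;
                pending.Push((right, clone));
            }
        }
        return newRoot;
    }
}
```

Each source node is visited exactly once and paired with its already-allocated clone, so there is no second pass over the tree and no recursion on the call stack.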
@eiriktsarpalis Your code passes all unit tests and improves both performance and allocations (by 56 bytes); however, the recursive change still outperforms it. I understand that we may want to avoid recursion because of the possibly imbalanced tree; let me know what you'd like to do here.
Thanks for running the benchmarks. Theoretically speaking RB tree depths are bounded by O(log N), but I would need to spend time studying the actual implementation to see whether linear depth is possible in certain cases (for example it might be possible that a maliciously crafted BinaryFormatter payload or similar could result in an imbalanced set being hydrated and triggering SO when attempting to clone). cc @GrabYourPitchforks who might provide a security angle on using recursion in general.
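For reference, the textbook argument behind the O(log N) remark can be sketched as follows. It assumes the standard red-black invariants hold (every root-to-leaf path has the same number of black nodes, and no red node has a red child), which is exactly what an instance hydrated from a crafted payload might not satisfy. The symbols $n$ (node count), $h$ (height) and $\mathrm{bh}$ (black-height of the root) are introduced here for illustration.

```latex
% A subtree whose root has black-height bh contains at least 2^{bh} - 1 nodes
% (by induction on the height of the subtree), so for the whole tree:
n \;\ge\; 2^{\mathrm{bh}} - 1 .

% No red node has a red child, so at least half of the nodes on any
% root-to-leaf path are black:
\mathrm{bh} \;\ge\; \frac{h}{2} .

% Combining the two inequalities:
n \;\ge\; 2^{h/2} - 1
\quad\Longrightarrow\quad
h \;\le\; 2\log_2(n + 1) .
```

The bound of 2 log(n + 1) + 1 quoted earlier in the thread is consistent with this result up to an additive constant.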
I took a closer look at the type's Still, I don't think the performance benefits justify the potential of introducing stack overflows. I therefore conclude that we should not be taking this change. Would be happy to consider performance optimizations that don't involve recursion over the tree. |
056dfc4 to 4c53f23
@eiriktsarpalis I've changed the PR to use iteration instead. I also tried changing the original implementation to use a single Stack, like in your sample code, as shown below, but its performance did not improve over main in the N=1000 benchmark, so I've gone forward with your code change.

    public Node DeepClone(int count)
    {
    #if DEBUG
        Debug.Assert(count == GetCount());
    #endif
        // Breadth-first traversal to recreate nodes, preorder traversal to replicate nodes.
        var pendingNodes = new Stack<(Node source, Node target)>(2 * Log2(count) + 2);
        Node newRoot = ShallowClone();
        Node? originalCurrent = this;
        Node newCurrent = newRoot;
        while (originalCurrent != null)
        {
            pendingNodes.Push((originalCurrent, newCurrent));
            newCurrent.Left = originalCurrent.Left?.ShallowClone();
            originalCurrent = originalCurrent.Left;
            newCurrent = newCurrent.Left!;
        }
        while (pendingNodes.TryPop(out var next))
        {
            Node? originalRight = next.source.Right;
            Node? newRight = originalRight?.ShallowClone();
            next.target.Right = newRight;
            while (originalRight != null)
            {
                pendingNodes.Push((originalRight, newRight!));
                newRight!.Left = originalRight.Left?.ShallowClone();
                originalRight = originalRight.Left;
                newRight = newRight.Left;
            }
        }
        return newRoot;
    }
Thanks!
Test failures seem related to #60151.
I noticed that SortedDictionary copying was allocating more memory than iterating the initial dictionary and adding items one at a time. I realized that the optimization done as part of #45659 was not completely successful and that the SortedSet deep copy could be improved.
For SortedSet, prefer recursion through the tree over allocating multiple stacks.
For SortedDictionary, override Equals on KeyValuePairComparer so that SortedSet's HasEqualComparer check passes, allowing the efficient deep copy.
dotnet/performance@main...johnthcall:johncall/SortedSetDeepCopy
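The comparer change described above can be illustrated with a small sketch. `KeyOnlyPairComparer` is a hypothetical name standing in for the internal `KeyValuePairComparer`; the sketch assumes only that the copy path compares the source and target comparers with `Equals`, as the description of `HasEqualComparer` suggests. Without the override, two wrapper instances around the same key comparer would compare by reference and be treated as different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SortedDictionary's internal pair comparer:
// orders key/value pairs by key only.
public sealed class KeyOnlyPairComparer<TKey, TValue> : Comparer<KeyValuePair<TKey, TValue>>
{
    private readonly IComparer<TKey> _keyComparer;

    public KeyOnlyPairComparer(IComparer<TKey>? keyComparer)
    {
        _keyComparer = keyComparer ?? Comparer<TKey>.Default;
    }

    public override int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
        => _keyComparer.Compare(x.Key, y.Key);

    // Two wrappers are equal when they wrap equal key comparers, so a
    // comparer-equality check can succeed across separate wrapper instances.
    public override bool Equals(object? obj)
        => obj is KeyOnlyPairComparer<TKey, TValue> other
           && (ReferenceEquals(_keyComparer, other._keyComparer)
               || _keyComparer.Equals(other._keyComparer));

    // Kept consistent with Equals: equal wrappers yield equal hash codes.
    public override int GetHashCode() => _keyComparer.GetHashCode();
}
```

With this in place, two dictionaries built with the same key comparer produce wrapper comparers that compare equal, which is the precondition the description gives for taking the deep-copy path instead of re-adding items one by one.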