Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SortedSet DeepCopy performance #56561

Merged
merged 3 commits into from
Oct 12, 2021

Conversation

johnthcall
Copy link
Contributor

I noticed that SortedDictionary copying was allocation more memory than iterating the initial dictionary and adding items one at a time. I realized that the optimization done as part of #45659 was not completely successful and that the SortedSet deep copy could be improved.

For SortedSet prefer recursion through the tree over allocating multiple stacks.
For SortedDictionary override Equals for KeyValuePairComparer so that SortedSets HasEqualComparer will pass to allow efficient deep copy.

dotnet/performance@main...johnthcall:johncall/SortedSetDeepCopy

Method Toolchain N Mean Error StdDev Ratio Allocated
SortedDictionaryCopy main 1 123.74 ns 3.476 ns 4.002 ns 1.00 392 B
SortedDictionaryCopy pr 1 54.46 ns 0.877 ns 0.777 ns 0.44 168 B
SortedSetCopy main 1 70.85 ns 0.821 ns 0.768 ns 1.00 272 B
SortedSetCopy pr 1 29.43 ns 0.661 ns 0.618 ns 0.42 96 B
SortedDictionaryCopy main 10 699.64 ns 13.457 ns 11.929 ns 1.00 1,136 B
SortedDictionaryCopy pr 10 218.58 ns 2.036 ns 1.904 ns 0.31 672 B
SortedSetCopy main 10 297.74 ns 2.702 ns 2.110 ns 1.00 800 B
SortedSetCopy pr 10 150.63 ns 1.824 ns 1.617 ns 0.51 528 B
SortedDictionaryCopy main 100 10,101.19 ns 175.585 ns 187.874 ns 1.00 7,664 B
SortedDictionaryCopy pr 100 1,896.56 ns 36.241 ns 41.735 ns 0.19 5,712 B
SortedSetCopy main 100 2,755.01 ns 41.014 ns 36.358 ns 1.00 5,216 B
SortedSetCopy pr 100 1,438.86 ns 20.528 ns 18.197 ns 0.52 4,848 B
SortedDictionaryCopy main 1000 140,422.29 ns 1,514.264 ns 1,264.479 ns 1.00 72,512 B
SortedDictionaryCopy pr 1000 21,330.74 ns 233.688 ns 195.140 ns 0.15 56,112 B
SortedSetCopy main 1000 28,099.61 ns 244.839 ns 217.043 ns 1.00 48,512 B
SortedSetCopy pr 1000 17,571.99 ns 154.552 ns 144.568 ns 0.63 48,048 B

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jul 29, 2021
@ghost
Copy link

ghost commented Jul 29, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

I noticed that SortedDictionary copying was allocation more memory than iterating the initial dictionary and adding items one at a time. I realized that the optimization done as part of #45659 was not completely successful and that the SortedSet deep copy could be improved.

For SortedSet prefer recursion through the tree over allocating multiple stacks.
For SortedDictionary override Equals for KeyValuePairComparer so that SortedSets HasEqualComparer will pass to allow efficient deep copy.

dotnet/performance@main...johnthcall:johncall/SortedSetDeepCopy

Method Toolchain N Mean Error StdDev Ratio Allocated
SortedDictionaryCopy main 1 123.74 ns 3.476 ns 4.002 ns 1.00 392 B
SortedDictionaryCopy pr 1 54.46 ns 0.877 ns 0.777 ns 0.44 168 B
SortedSetCopy main 1 70.85 ns 0.821 ns 0.768 ns 1.00 272 B
SortedSetCopy pr 1 29.43 ns 0.661 ns 0.618 ns 0.42 96 B
SortedDictionaryCopy main 10 699.64 ns 13.457 ns 11.929 ns 1.00 1,136 B
SortedDictionaryCopy pr 10 218.58 ns 2.036 ns 1.904 ns 0.31 672 B
SortedSetCopy main 10 297.74 ns 2.702 ns 2.110 ns 1.00 800 B
SortedSetCopy pr 10 150.63 ns 1.824 ns 1.617 ns 0.51 528 B
SortedDictionaryCopy main 100 10,101.19 ns 175.585 ns 187.874 ns 1.00 7,664 B
SortedDictionaryCopy pr 100 1,896.56 ns 36.241 ns 41.735 ns 0.19 5,712 B
SortedSetCopy main 100 2,755.01 ns 41.014 ns 36.358 ns 1.00 5,216 B
SortedSetCopy pr 100 1,438.86 ns 20.528 ns 18.197 ns 0.52 4,848 B
SortedDictionaryCopy main 1000 140,422.29 ns 1,514.264 ns 1,264.479 ns 1.00 72,512 B
SortedDictionaryCopy pr 1000 21,330.74 ns 233.688 ns 195.140 ns 0.15 56,112 B
SortedSetCopy main 1000 28,099.61 ns 244.839 ns 217.043 ns 1.00 48,512 B
SortedSetCopy pr 1000 17,571.99 ns 154.552 ns 144.568 ns 0.63 48,048 B
Author: johnthcall
Assignees: -
Labels:

area-System.Collections, community-contribution

Milestone: -

newRight = newRight.Left;
}
}
Node newNode = ShallowClone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally avoid using recursion out of concern for potential stack overflows. Theoretically speaking we should be fine since RB trees have O(log n) depth, but I'm not familiar enough with the implementation to know if it can admit unbalanced instances.

@eiriktsarpalis
Copy link
Member

The performance improvements are impressive, but I'd be curious to know how much of that was contributed by the KeyValuePairComparer changes versus the rewriting of DeepClone. Would it be possible to provide results measuring each change individually?

@johnthcall
Copy link
Contributor Author

@eiriktsarpalis Below is the performance of main vs just the kvp comparer change vs both changes. If we decide to avoid the recursion in DeepCopy I'll make a separate PR for just the KVP comparer change.

Method Toolchain N Mean Error StdDev Ratio Allocated
SortedDictionaryCopy main 1 233.25 ns 14.931 ns 17.195 ns 1.00 392 B
SortedDictionaryCopy kvpc 1 179.63 ns 4.817 ns 5.547 ns 0.77 344 B
SortedDictionaryCopy pr 1 95.33 ns 3.673 ns 4.229 ns 0.41 168 B
SortedDictionaryCopy main 10 1,353.38 ns 55.142 ns 63.501 ns 1.00 1,136 B
SortedDictionaryCopy kvpc 10 600.19 ns 30.233 ns 34.817 ns 0.44 944 B
SortedDictionaryCopy pr 10 323.45 ns 15.388 ns 17.720 ns 0.24 672 B
SortedDictionaryCopy main 100 16,490.14 ns 785.823 ns 904.955 ns 1.00 7,664 B
SortedDictionaryCopy kvpc 100 4,805.14 ns 91.329 ns 89.697 ns 0.29 6,080 B
SortedDictionaryCopy pr 100 2,782.22 ns 68.882 ns 76.562 ns 0.17 5,712 B
SortedDictionaryCopy main 1000 212,509.48 ns 10,768.851 ns 11,969.549 ns 1.00 72,512 B
SortedDictionaryCopy kvpc 1000 48,069.80 ns 1,324.823 ns 1,525.668 ns 0.23 56,576 B
SortedDictionaryCopy pr 1000 28,287.56 ns 1,260.076 ns 1,451.105 ns 0.13 56,112 B

@eiriktsarpalis
Copy link
Member

If we decide to avoid the recursion in DeepCopy I'll make a separate PR for just the KVP comparer change.

Introducing one change per PR is good practice in general (easier to revert etc). The KVP change is low risk with great perf improvements in its own right, so it certainly meets the bar for .NET 6 RC1.

@layomia @stephentoub thoughts?

@stephentoub
Copy link
Member

Separating them sounds good.

@johnthcall
Copy link
Contributor Author

I've reverted SortedDictionary changes from here and separated it to the following #56634

@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Aug 3, 2021
@eiriktsarpalis
Copy link
Member

I moved the milestone to 7.0.0 given that we're very close to the .NET 6 release date. I will be following up with a more thorough review once .NET 7 development commences.

@eiriktsarpalis
Copy link
Member

Hi @johnthcall, looking at this PR again it seems like using recursion in this context is safe, since depth is always bounded by 2 log(n + 1) + 1.

The existing implementation does contain a few inefficiencies (e.g. allocating two stacks and traversing twice). I'm wondering though if we could get some of the perf benefits of your approach while still avoiding recursion. For example, consider the following (untested) implementation:

public Node DeepClone(int count)
{
#if DEBUG
    Debug.Assert(count == GetCount());
#endif
    Node newRoot = ShallowClone();

    var pendingNodes = new Stack<(Node source, Node target)>(2 * Log2(count) + 2);
    pendingNodes.Push((this, newRoot));

    while (pendingNodes.TryPop(out var next))
    {
        Node clonedNode;

        if (next.source.Left is Node left)
        {
            clonedNode = left.ShallowClone();
            next.target.Left = clonedNode;
            pendingNodes.Push((left, clonedNode));
        }

        if (next.source.Right is Node right)
        {
            clonedNode = right.ShallowClone();
            next.target.Right = clonedNode;
            pendingNodes.Push((right, clonedNode));
        }
    }

    return newRoot;
}

I'd be curious to see what performance of something like the above could be, compared to the recursive approach.

@johnthcall
Copy link
Contributor Author

@eiriktsarpalis Your code does pass all UT and has a perf improvement and 56 byte allocation improvement however the recursive change does outperform it. I understand because of the possibly imbalanced tree that we may want to avoid recursion, let me know what you'd like to do here.

Toolchain N Mean Error StdDev Ratio Allocated
main 1 69.80 ns 3.569 ns 3.967 ns 1.00 240 B
eirik 1 52.87 ns 2.967 ns 3.298 ns 0.76 184 B
pr 1 32.39 ns 2.562 ns 2.847 ns 0.47 96 B
main 10 302.86 ns 13.321 ns 14.806 ns 1.00 768 B
eirik 10 242.44 ns 12.855 ns 14.804 ns 0.81 712 B
pr 10 169.16 ns 8.750 ns 8.986 ns 0.56 528 B
main 100 2,686.17 ns 121.347 ns 139.744 ns 1.00 5,184 B
eirik 100 2,510.29 ns 128.237 ns 147.678 ns 0.94 5,128 B
pr 100 1,548.30 ns 68.639 ns 76.292 ns 0.58 4,848 B
main 1000 29,168.67 ns 1,340.231 ns 1,543.412 ns 1.00 48,480 B
eirik 1000 26,338.11 ns 989.322 ns 1,139.305 ns 0.91 48,424 B
pr 1000 19,071.81 ns 745.912 ns 858.994 ns 0.66 48,048 B

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Sep 27, 2021

Thanks for running the benchmarks. Theoretically speaking RB tree depths are bounded by O(log N), but I would need to spend time studying the actual implementation to see whether linear depth is possible in certain cases (for example it might be possible that a maliciously crafted BinaryFormatter payload or similar could result in an imbalanced set being hydrated and triggering SO when attempting to clone).

cc @GrabYourPitchforks who might provide a security angle on using recursion in general.

@eiriktsarpalis
Copy link
Member

I took a closer look at the type's ISerializable implementation: elements are serialized as flat arrays so there doesn't appear to be a way for a valid payload to influence the shape of the materialized tree. So excluding any implementation bugs, future regressions or reflection code modifying private fields it doesn't seem to be possible to materialize linked list-like trees.

Still, I don't think the performance benefits justify the potential of introducing stack overflows. I therefore conclude that we should not be taking this change. Would be happy to consider performance optimizations that don't involve recursion over the tree.

@johnthcall
Copy link
Contributor Author

@eiriktsarpalis I've made the change use iteration instead. I tried changing the original implementation to use a single Stack like in your sample code as below but it's performance did not improve from main in the N=1000 benchmark so I've gone forward with your code change.

public Node DeepClone(int count)
{
#if DEBUG
    Debug.Assert(count == GetCount());
#endif
    // Breadth-first traversal to recreate nodes, preorder traversal to replicate nodes.

    var pendingNodes = new Stack<(Node source, Node target)>(2 * Log2(count) + 2);
    Node newRoot = ShallowClone();

    Node? originalCurrent = this;
    Node newCurrent = newRoot;

    while (originalCurrent != null)
    {
        pendingNodes.Push((originalCurrent, newCurrent));
        newCurrent.Left = originalCurrent.Left?.ShallowClone();
        originalCurrent = originalCurrent.Left;
        newCurrent = newCurrent.Left!;
    }

    while (pendingNodes.TryPop(out var next))
    {
        Node? originalRight = next.source.Right;
        Node? newRight = originalRight?.ShallowClone();
        next.target.Right = newRight;

        while (originalRight != null)
        {
            pendingNodes.Push((originalRight, newRight!));
            newRight!.Left = originalRight.Left?.ShallowClone();
            originalRight = originalRight.Left;
            newRight = newRight.Left;
        }
    }

    return newRoot;
}

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@eiriktsarpalis
Copy link
Member

Test failures seem related to #60151.

@eiriktsarpalis eiriktsarpalis merged commit d4e4d33 into dotnet:main Oct 12, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Collections community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants