Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developers can use CollectionsMarshal ref accessors for Dictionary<TKey, TValue> #27062

Closed
Tracked by #44314
benaadams opened this issue Aug 4, 2018 · 57 comments · Fixed by #54611
Closed
Tracked by #44314

Developers can use CollectionsMarshal ref accessors for Dictionary<TKey, TValue> #27062

benaadams opened this issue Aug 4, 2018 · 57 comments · Fixed by #54611
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections Bottom Up Work Not part of a theme, epic, or user story Cost:S Work that requires one engineer up to 1 week in-pr There is an active PR which will close this issue when it is merged Priority:3 Work that is nice to have Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@benaadams
Copy link
Member

benaadams commented Aug 4, 2018

Edited by @layomia. Original proposal by @benaadams (click to view) They aren't "safe" operations as the backing store can change if items are added, removed, etc and the changes aren't tracked

namespace System.Runtime.InteropServices
{
    public partial static class CollectionsMarshal
    {
        public ref TValue ValueRef<TKey, TValue>(Dictionary<TKey, TValue> dict, TKey key);
    }
}

However it is useful to get a reference to modify struct TValue entries without a full copy and replace update.

/cc @jkotas

/cc @stephentoub is it valid for ConcurrentDictionary?


CollectionsMarshal ref accessors for Dictionary<TKey, TValue>

Attempting to obtain a value in a dictionary, and adding it if not present, is a common scenario in .NET applications. The existing mechanism for achieving this today is by using Dictionary<TKey, TValue>.TryGetValue(Tkey key, out TValue value), and adding the value to the dictionary if not present. This causes duplicate hash lookups:

if (!dictionary.TryGetValue(key, out MyType myValue)) // hash lookup
{
    myValue = CreateValue();
    dictionary.Add(key, myValue); // duplicate hash lookup
}

Another scenario is updating struct values in dictionaries. The existing pattern for achieving this causes struct copies and duplicate hash lookups, which potentially have non-trivial performance costs for large structs:

struct LargeStruct
{
    // Other members...
    
    public int MyInt { get; set; }

    // Other members...
}

LargeStruct myValue = dictionary[key]; // hash lookup, struct copy
myValue.MyInt++;
dictionary[key] = myValue; // another hash lookup, another struct copy

Motivation

  • Provide a mechanism which avoids duplicate lookups when obtaining dictionary values which may not be present.
  • Provide a mechanism which avoids copies and duplicate lookups when mutating struct dictionary values.

API proposal

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or throws a KeyNotFoundException
        /// </summary>
        public static ref TValue GetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key);

        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or returns Unsafe.NullRef<TValue>
        /// </summary>
        public static ref TValue TryGetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key, out bool exists);

        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or inserts it with value default(TValue).
        /// </summary>
        public static ref TValue GetValueRefOrAddDefault<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key, out bool exists);
    }
}

CollectionsMarshal is an unsafe class that provides a set of methods to access the underlying data representations of collections.

API usages

Updating struct value in dictionary: KeyNotFoundException thrown when key not present in dictionary

This pattern is helpful when caller wants to optimally update a struct in a scenario where the key being absent is an error state. Creating a value and adding it to the dictionary, if not already present, is not desired.

try
{
    ref MyType value = CollectionsMarshal.GetValueRef(dictionary, key);
    value.MyInt++;
}
catch (KeyNotFoundException exception)
{
    // Handle exception
}

Unsafe.NullRef<TValue>() returned when key not present in dictionary

This pattern satisfies both the optimal struct value update and optimal "get or add" value scenarios.

ref MyType value = CollectionsMarshal.TryGetValueRef(dictionary, key, out bool exists);

if (exists)
{
    value.MyInt++;
}
else
{
    ref value = new MyType() { MyInt = 1 };
}

default(TValue) returned when key not present in dictionary

This pattern also satisfies both the optimal struct value update and optimal "get or add" value scenarios. A struct default value always being instantiated may cause the TryGetValueRef to be preferred, depending on the perf scenario.

ref MyType value = CollectionsMarshal.GetValueRefOrDefault(dictionary, key, out bool exists);
value.MyInt++;

if (exists)
{
    // Do something if I care that the key already existed.
}

Alternative design

GetOrAdd methods, similar to those on ConcurrentDictionary<TKey, TValue>, were proposed in #15059.

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd([NotNull] TKey key, Func<TKey, TValue> valueFactory);
}

/* OR */

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd<TState>([NotNull] TKey key, TState state, Func<TKey, TState, TValue> valueFactory);
}

Upsides

  • More discoverable/friendly than the relatively advanced APIs in this proposal.
  • Follows precendent in the BCL i.e. ConcurrentDictionary<TKey, TValue>.

Downsides

  • Not pay-for-play. Since the new methods would live in System.Private.CoreLib, generic expansion issues due to struct-based generics will affect all users of Dictionary<TKey, TValue>, not just those that use them.
    • An argument against this point is that the methods can live in an extensions class, but that won't provide any perf benefits since they won't be able to access non-public members.
  • Doesn't address the issue of copying when mutating large struct values.

Open questions

Should the out bool exists parameter on TryGetValueRef be removed?

Since a call to Unsafe.IsNullRef<T>(ref value) can indicate whether the value exists in the dictionary, the second method could simply be:

public static class CollectionsMarshal
{
    /// <summary>
    ///   Gets a reference to the value associated with the specified key, or returns Unsafe.NullRef<TValue>
    /// </summary>
    public static ref TValue TryGetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key);
}

Usage

ref MyType value = CollectionsMarshal.TryGetValueRef(dictionary, key);

if (!Unsafe.IsNullRef(ref value))
{
    value.MyInt++;
}
else
{
    ref value = new MyType() { MyInt = 1 };
}

Any concerns about API bloat in the CollectionsMarshal type?

The generic expansion highlighted in the GetOrAdd-based alternative doesn't apply much here since the new methods will live in CollectionsMarshal and will be pay-for-play. However, are there concerns about bloating the type?

One answer here states as follows:

I'd think to the same extent as any other/similar type such as MemoryMarshal, Unsafe, etc.

As it is today, we have 1 method in it and this proposal is adding 2-3 more. We only have a limited number of collection types in S.P.Corelib and a more limited set where returning a ref makes sense, so I don't think we are at risk of overloading it.

@benaadams benaadams changed the title Add MemoryMarshal ref accessors for Dict/List [Api Proposal] Add MemoryMarshal ref accessors for Dict/List Aug 4, 2018
@jkotas
Copy link
Member

jkotas commented Aug 4, 2018

@benaadams benaadams changed the title [Api Proposal] Add MemoryMarshal ref accessors for Dict/List [Api Proposal] Add CollectionsMarshal ref accessors for Dict/List Aug 4, 2018
@benaadams
Copy link
Member Author

Updated

@vcsjones
Copy link
Member

vcsjones commented Aug 7, 2018

@benaadams

public ref TValue ValueRef<TKey, TValue>(Dictionary<TKey, TValue> TKey key);

Did you mean to give the dictionary a parameter name and have two different parameters? Like

public ref TValue ValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, TKey key);

@safern
Copy link
Member

safern commented Mar 12, 2019

@benaadams does it make sense to create 1 issue for: #27061 and this guy, to propose adding CollectionsMarshal class and having some rationale and usage, to bring it over to API review?

Marking as 3.0 for now to not loose attention and then we can decide if we bring them individually or together as 1 issue to API Review.

@danmoseley
Copy link
Member

Not required for 3.0 release

@terrajobst
Copy link
Member

@benaadams same question as on the other issue for CollectionMarshal: what are compelling scenarios for these APIs?

@benaadams
Copy link
Member Author

benaadams commented Aug 27, 2019

struct LargeStruct
{
   // ...
   int Value;
}

1 hash lookup, no copies to update a struct

dict.ValueRef(key).Value++; // hash lookup

vs

2 hash lookup, 2 copies to update a struct

var s = dict[key]; // hash lookup, struct copy
s.Value++;
dict[key] = s; // hash lookup, struct copy

@terrajobst
Copy link
Member

So if we were to expose an API that would allow you to cache the hash key, would this address your need? IOW, between struct copies and hash lookups, what hurts more?

@benaadams
Copy link
Member Author

benaadams commented Aug 29, 2019

The hashing isn't particularly expensive (e.g. if its an int key there is no cost); its then taking the mod of that and traversing the arrays while doing the lookup (i.e. the search).

The struct copies are expensive as if you want to modify one property of the struct its disproportionately expensive.

I do accept its inherently unsafe if you mix it with other operations on the list or dictionary; and then try to keep using the ref, so it would be nice to add something like a [CallerMustBeUnsafe] attribute dotnet/roslyn#8663 so it can only be done in an unsafe block

@terrajobst
Copy link
Member

terrajobst commented Aug 30, 2019

The hashing isn't particularly expensive (e.g. if its an int key there is no cost); its then taking the mod of that and traversing the arrays while doing the lookup (i.e. the search).

The struct copies are expensive as if you want to modify one property of the struct its disproportionately expensive.

That makes sense.

it would be nice to add something like a [CallerMustBeUnsafe] attribute

Other people might feel differently but the whole point is that this should be a good enough speed bump. Putting this on a type in System.Runtime.InteropServices that isn't an extension method is sufficient to me. We could prefix the APIs with Unsafe too, but I'm not sure we're helping much with this.

The larger question is: where do we draw the line with escape hatches like this? In a sense, we're chasing the 1% cases here, but I'm a bit scared that we're creating a mess the more APIs like this we're exposing throughout the BCL. And I don't have a good handle on this where we hold the bar.

Do you have any real workload where this would show up? Of course, one can always fabricate a micro benchmark; I don't doubt there are cases where this API would speed things up. The point is: how much does this matter in real code?

@benaadams
Copy link
Member Author

benaadams commented Aug 31, 2019

Looking at these apis more holistically:

For the dictionary you can achieve a similar; but safer, effect by wrapping the struct in a class then using the class handle. Its safer because the reference will remain valid across dictionary operations; and then a "live" reference can be more easily passed around and stored.

For the list List<T> AsSpan https://github.com/dotnet/corefx/issues/31597 would enable the same ability (working from the returned Span<T> which returns ref); but also be a more useful api.

For the dictionary using a class indirection doesn't cause too much issue because the data layout isn't important to the calling code (as how its internally stored in dictionary can't be consumed).

For list; that its a straight in-order array is useful, as then a consumer if they had access to it, can do things like vectorize the processing of the data, which a class indirection would break.

So in preference I'd drop these apis and add the List<T> AsSpan https://github.com/dotnet/corefx/issues/31597 one as it enables the same scenario as this api, plus other opportunities like vectoization (which this doesn't).

e.g. using the other api to achieve the same

AsSpan(list)[index].Value++; // though likely would cache the span if it was in a loop

vs

ItemRef(list, index).Value++

@tannergooding
Copy link
Member

@benaadams, could you update this proposal to just be for Dictionary?

Seeing as we already reviewed/approved the List<T> changes here: https://github.com/dotnet/corefx/issues/31597 (and implemented https://source.dot.net/#System.Private.CoreLib/CollectionsMarshal.cs,18)

@benaadams benaadams changed the title [Api Proposal] Add CollectionsMarshal ref accessors for Dict/List [Api Proposal] Add CollectionsMarshal ref accessors for Dict Jan 16, 2020
@benaadams
Copy link
Member Author

Done

@jkotas
Copy link
Member

jkotas commented Jan 16, 2020

Also update the proposed API list in the top post?

@benaadams
Copy link
Member Author

Dropped List and ConcurrentDictionary (as getting a live ref for ConcurrentDictionary is probably asking for trouble as other threads could invalid the ref)

@GrabYourPitchforks
Copy link
Member

If we approve this API we'd also need to approve Unsafe.IsNullRef at the same time.

@GrabYourPitchforks GrabYourPitchforks added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 20, 2020
@GrabYourPitchforks
Copy link
Member

@benaadams What's the expected behavior when the key is not found? Return a null ref, throw an exception, insert default(T) into the collection, or something else? Would a caller need to distinguish between "not found" and "I just created a dummy placeholder for you" scenarios?

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020
@MaximGurschi
Copy link

How about a method similar to the specialized SpanAction string allocator?

Dictionary.AddOrUpdate(TKey key,
TState state,
SpanAction<TValue, TState, bool> factory);

TState is provided to help in no allocation scenarios.
The factory accepts a Span of size 1 item, the state and a bool saying if the key is found or not.
Both found/not found cases can benefit from being implemented as SpanAction in case the value type TValue has mutable fields itself - you can then update only the fields you have to without copying the whole struct.

Then use it like:

var dict = new Dictionary<long, long>();

dict.AddOrUpdate(56, state, (value, s, b) => b ? ++value[0] : value[0] = state.Value);

@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 6.0.0 Feb 25, 2021
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 9, 2021
@terrajobst
Copy link
Member

terrajobst commented Mar 9, 2021

Video

  • Makes sense, we should not use the Try-prefix because it refers to the try-pattern which would return a bool and out the value
namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {        
        public static ref TValue GetValueRefOrNullRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key);
        public static ref TValue GetValueRefOrAddDefault<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key, out bool exists);
    }
}

EDIT by @stephentoub: Removed GetValueRef method after subsequent discussion.

@eiriktsarpalis eiriktsarpalis changed the title [Api Proposal] Add CollectionsMarshal ref accessors for Dict Developers can use CollectionsMarshal ref accessors for Dictionary<TKey, TValue> Mar 9, 2021
@eiriktsarpalis eiriktsarpalis added User Story A single user-facing feature. Can be grouped under an epic. Bottom Up Work Not part of a theme, epic, or user story Cost:S Work that requires one engineer up to 1 week Priority:3 Work that is nice to have Team:Libraries labels Mar 9, 2021
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Mar 9, 2021
@Sergio0694
Copy link
Contributor

Hey @benaadams, are you still currently working on this? I saw you've contributed #49388 which has already been merged, was wondering whether you also planned to do the other two or wanted some help in case you were busy? I have some spare time on my end so I'd be happy to help contribute with at least an implementation of GetValueRefOrAddDefault, which I think is the most important one among the remaining two (also to finally get parity on that front with DictionarySlim<TKey, TValue>.GetOrAddValueRef). We still have some time before the deadline on July 13th, so let me know! 😄

NOTE: this issue doesn't seem to be marked up for grabs, but given that Ben already contributed with one of the three, and the whole proposal is bottom up work anyway, I'm assuming that's just implied here?

@benaadams
Copy link
Member Author

@Sergio0694 feel free; don't really have the time to devote to doing the benchmarking to ensure its a good change

We still have some time before the deadline on July 13th, so let me know!

What's this deadline?

@Sergio0694
Copy link
Contributor

"feel free; don't really have the time to devote to doing the benchmarking to ensure its a good change"

Alright, no worries! I'll give that one a shot myself then 😄

"What's this deadline?"

That's the deadline for all feature work for .NET 6, at which point they'll take a snapshot of main for the final release, and then just cherry pick bug fixes and whatnot after that point, but no new features. It's all documented in the tracking issue: #54569.

@danmoseley
Copy link
Member

To clarify, July 13 is our feature complete date, when we switch over entirely to bugs, Aug 17 (when we aim to get to zero bugs for 6.0) is when we branch 6.0 from main.

As last year however, we would like to continue to merge community PRs after July 13, unless they'll need more stabilizing time in which case we might pause merging until after the branching.

@eiriktsarpalis eiriktsarpalis linked a pull request Jul 14, 2021 that will close this issue
@eiriktsarpalis eiriktsarpalis added the in-pr There is an active PR which will close this issue when it is merged label Jul 14, 2021
@eiriktsarpalis
Copy link
Member

We should definitely add a paragraph for this feature in the preview 7 article.

@jkotas
Copy link
Member

jkotas commented Jul 15, 2021

This is niche unsafe API that 99% of .NET developers should not ever use. We do not want to encourage people to use it just because of they can.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections Bottom Up Work Not part of a theme, epic, or user story Cost:S Work that requires one engineer up to 1 week in-pr There is an active PR which will close this issue when it is merged Priority:3 Work that is nice to have Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Projects
None yet
Development

Successfully merging a pull request may close this issue.