Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose Adding a GetOrAdd(TKey key, Func<TKey, TValue> valueFactory) to Dictionary<TKey, TValue> #15059

Closed
shmuelie opened this issue Aug 18, 2015 · 35 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@shmuelie
Copy link
Contributor

On ConcurrentDictionary<TKey, TValue> there is the very useful function TValue GetOrAdd(TKey key, Func<TKey, TValie> valueFactory). The pattern that the method represents is one that issued commonly on the "normal" Dictionary<TKey, TValue> as well. I propose adding such a method for this common use case. Besides simplifying it will also make switching between the two dictionaries simpler.

@JonHanna
Copy link
Contributor

Suggested at #14676 within the comment thread too. Possible implementation at https://github.com/hackcraft/coreclr/commit/1a982053ae39ee0e418d370fdee3497ecefd0e5a

@shmuelie
Copy link
Contributor Author

That's what I get for thinking of an idea, never getting around to suggesting it. And then remembering months later and forgetting to check again if it had been suggested. Thanks!

@JonHanna
Copy link
Contributor

Never mind, I do check for duplicates and I still end up asking something that's been asked before, or submitting a PR that does the same as another.

@jnm2
Copy link
Contributor

jnm2 commented May 12, 2017

I assumed this meant Dictionary.GetOrAdd was added, but I'm not finding it or an IDictionary extension method. If they have been skipped, can this issue be reopened?

@svick
Copy link
Contributor

svick commented May 12, 2017

@jnm2 As far as I can tell, @SamuelEnglard closed this issue in favor of https://github.com/dotnet/corefx/issues/1942, which was about TryAdd, but also discussed GetOrAdd. But then https://github.com/dotnet/corefx/issues/1942 was closed after TryAdd was added.

So yes, I think it makes sense to repoen this issue.

@jnm2
Copy link
Contributor

jnm2 commented May 12, 2017

Ugh, I can't believe this missed .NET Standard 2.0. @karelz no exceptions?

@karelz karelz reopened this May 12, 2017
@jnm2
Copy link
Contributor

jnm2 commented May 12, 2017

Specifically:

public static TValue GetOrAdd<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TKey key, TValue value)
{
    if (dictionary == null) throw new ArgumentNullException(nameof(dictionary));

    if (!dictionary.TryGetValue(key, out var r))
        dictionary.Add(key, r = value);
    return r;
}

public static TValue GetOrAdd<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TKey key, Func<TKey, TValue> valueFactory)
{
    if (dictionary == null) throw new ArgumentNullException(nameof(dictionary));
    if (valueFactory == null) throw new ArgumentNullException(nameof(valueFactory));

    if (!dictionary.TryGetValue(key, out var r))
        dictionary.Add(key, r = valueFactory.Invoke(key));
    return r;
}

@karelz
Copy link
Member

karelz commented May 12, 2017

This needs API review. I don't want to rush API reviews. That is just begging for troubles later - you miss things and then you live with your mistakes forever.

It is convenience API, we lived without it for quite some time, we can IMO survive one more release. It is not the end of the world after all and not the last .NET Core release. It does not block scenarios.

Probably not the answer you expected, but this is how I feel about it. We should IMO focus primarily on the really impactful things. Makes sense?

@jnm2
Copy link
Contributor

jnm2 commented May 12, 2017

It is indeed the answer I expected. 😄

@jnm2
Copy link
Contributor

jnm2 commented May 12, 2017

(Don't worry about me, I'm not the type that goes around second-guessing the team's decisions. Fyi.)

@karelz
Copy link
Member

karelz commented May 12, 2017

Thanks for understanding!
IMO it is not that much about decisions, but us being open about prioritization. To ship high-quality SW, we gotta prioritize. And we gotta stabilize.

@jamesqo
Copy link
Contributor

jamesqo commented Aug 1, 2017

@jnm2 I don't think it would be a good idea to add GetOrAdd as an extension method on IDictionary-- it would give the impression that only 1 lookup is being made. I think it's best just to make this proposal about Dictionary<K, V>.

@karelz This is not necessarily a convenience API. For Dictionary, this helps avoid duplicate hashtable lookups, and the last time I checked I think I saw around 50 non-test instances of

if (!dict.ContainsKey(key))
{
    dict.Add(key, ...);
}

in corefx.

@jnm2
Copy link
Contributor

jnm2 commented Aug 1, 2017

@jamesqo Yeah, okay. After the BCL has it for a while and we get default interface methods it might make more sense.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@eiriktsarpalis
Copy link
Member

Triage: if the main motivation for this addition is to minimize number of lookups, then presumably adding ref-return methods woulds be the preferable approach? c.f. #22121. Given the lack of recent activity in the issue, I'd be inclined to close this for now.

@jnm2
Copy link
Contributor

jnm2 commented Apr 22, 2020

@eiriktsarpalis Looking at what you linked, ref-returning methods methods aren't happening except maybe in the future on a CollectionsMarshal class (#27062) as an API that has not yet been proposed.

Because of the inherent lack of safety in ref-returning methods due to the backing store changing, I think GetOrAdd is still interesting. I was looking for it just last week.

@eiriktsarpalis
Copy link
Member

@jnm2 just so I understand your requirements, would having the extension method you proposed above solve your issue? IOW is this about double lookups or avoiding creating values when not needed or both?

@jnm2
Copy link
Contributor

jnm2 commented Apr 23, 2020

@eiriktsarpalis It's about double lookups but also with the expectation that creating a value may be expensive and should be avoidable. Two overloads would be nice, one with TValue value and one with Func<TKey, TValue> valueFactory or maybe even Func<TState, TValue> valueFactory.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label May 6, 2020
@danmoseley
Copy link
Member

maybe even Func<TState, TValue> valueFactory.

@jnm2 can you clarify what the "state" would be here?

With respect to ref-returns, this is what DictionarySlim does (the k-Neucleotide benchmark it was used for needs to increment values held in a map). As you point out, that makes it easy to corrupt the backing store. I wonder whether the "update the value" scenario is interesting. If it is, a possible middle ground might be for the valueFactory to accept the current value. Perhaps that's what you mean by TState ?

cc @GrabYourPitchforks since we were talking about it.

@jnm2
Copy link
Contributor

jnm2 commented May 12, 2020

@danmosemsft I was thinking of TState in the same sense as ThreadPool.QueueUserWorkItem. It would be a way to avoid allocating a delegate per call when the key alone does not provide enough information to create the value. However, I don't often have a practical need to avoid allocations like these, so this could wait until someone does have the need.

public static TValue GetOrAdd<TKey, TValue, TState>(
    this IDictionary<TKey, TValue> dictionary,
    TKey key,
    TState state,
    Func<TState, TValue> valueFactory);

@eiriktsarpalis eiriktsarpalis modified the milestones: 5.0.0, 6.0.0 Jun 24, 2020
@eiriktsarpalis eiriktsarpalis changed the title Propose Adding a GetOrAdd(TKey key, Func<TKey, TValie> valueFactory) to Dictionary<TKey, TValue> Propose Adding a GetOrAdd(TKey key, Func<TKey, TValue> valueFactory) to Dictionary<TKey, TValue> Oct 30, 2020
@eiriktsarpalis
Copy link
Member

Carrying over from #36942, we should consider evaluating an AddOrUpdate method as well cc @layomia

@GrabYourPitchforks
Copy link
Member

@eiriktsarpalis This is marked as needs-work, but what is left to address for this? I see the final API as falling under one of two cases:

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd([NotNull] TKey key, Func<TKey, TValue> valueFactory);
}

/* OR */

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd<TState>([NotNull] TKey key, TState state, Func<TKey, TState, TValue> valueFactory);
}

Passing a TState parameter might be more useful for the majority of scenarios (is the key by itself sufficient to create the value?), but introducing a generic method on a generic type could introduce codegen explosion. @jkotas might be able to speak as to whether that's a deal-breaker.

If it is a deal-breaker, we might be able to simplify by saying "state is always provided as object?; no generic state parameter is available."

@jkotas
Copy link
Member

jkotas commented Oct 31, 2020

Yes, I am not trilled about adding more features to Dictionary. Every new feature in Dictionary makes all existing Dictionary uses more expensive given how generics work, especially in AOT scenarios.

I do not see a significant usability advantage of this method over writing a simple straightforward code:

if (!dictionary.TryGetValue(...))
    dictionary.Add(...);

This straightforward code is also going to have better performance in number of cases. It avoids overhead of creating and passing around the lamda or state. I understand that there are situations where it is not the case (e.g. the hash code computation is very expensive or it is rare for TryGetValue to find the element).

I think #27062 is better answer for cases that needs a method like this for performance micro-optimizations.

@GrabYourPitchforks
Copy link
Member

CollectionsMarshal would definitely be more powerful. For example, if we modified the linked proposal to instead read:

public static class CollectionsMarshal
{
    public static ref TValue GetValueOrAddDefault<TKey, TValue>(Dictionary<TKey, TValue> @this, [NotNull] TKey key, out bool exists);
}

Then a "compute the hash code once" code sample would look like:

ref TValue refToMyValue = CollectionsMarshal.GetValueOrAddDefault(dict, "myKey", out bool exists);
if (!exists)
{
    // the dictionary just performed the equivalent of Add("myKey", default(TValue)) - let's overwrite it now
    refToMyValue = CreateValue();
}

As Jan said, this would avoid the generic explosion problem, and it moves the micro-perf API to its own type. The downside is that CollectionsMarshal is by its nature somewhat unsafe. Not in the "violates type safety" sense, but more in the "you're juggling knives and could corrupt the dictionary's internal state" sense.

@svick
Copy link
Contributor

svick commented Oct 31, 2020

@jkotas

I do not see a significant usability advantage of this method over writing a simple straightforward code:

Keep in mind it's not really about simplifying from two lines of code to one, it's about changing this code:

Foo GetFoo(int id)
{
    if (!dictionary.TryGetValue(id, out var foo))
    {
        foo = CreateFoo(id);
        dictionary.Add(id, foo);
    }
    return foo;
}

to this code:

Foo GetFoo(int id) => dictionary.GetOrAdd(id, CreateFoo);

The latter seems much easier to understand to me.

@GrabYourPitchforks
Copy link
Member

As mentioned, one of the problems with providing the factory is "is the key by itself sufficient to create the value, or does creation of the value rely on more information than just the key?" If the latter (which I believe is common), you really need to provide a state object.

@alrz
Copy link
Member

alrz commented Oct 31, 2020

you really need to provide a state object.

And that easily gets convoluted:

    if (!dictionary.TryGetValue(id, out var foo))
        dictionary.Add(id, foo = CreateFoo(id));
    Use(foo);
    var foo = dictionary.GetOrAdd(id, static t => t.self.CreateFoo(t.id), (self: this, id))
    Use(foo);

I'd personally prefer the first one.

@FiniteReality
Copy link

    var foo = dictionary.GetOrAdd(id, static t => t.self.CreateFoo(t.id), (self: this, id))
    Use(foo);

Wouldn't passing the state as a second parameter be way better, like ConcurrentDictionary?

var dict = new ConcurrentDictionary<ulong, Client>();

// ...

var clientSocket = await serverSocket.AcceptAsync();
var client = dict.GetOrAdd(clientId, CreateClient, socket);
HandleClient(client);

// ...

static Client CreateClient(ulong clientId, Socket clientSocket)
{
    return new Client(clientSocket);
}

@GrabYourPitchforks
Copy link
Member

@FiniteReality We need to be careful about adding generic instance methods to generic types. There's a real risk that it could cause significant footprint or memory bloat, even for people who don't use the new feature. Generally we want new features to be pay-for-play and to incur no overhead if you don't use them.

@FiniteReality
Copy link

@GrabYourPitchforks Understandable, but doesn't that somewhat fly in the face of being able to supply a state object? From what I can tell, the API @alrz suggested would have had the same issues, but would have been even more awkward to interact with.

@GrabYourPitchforks
Copy link
Member

Yes, it does fly in the face of being able to provide a state object, unless the parameter is typed as object? or some similar non-generic type.

@FiniteReality
Copy link

typed as object? or some similar non-generic type.

Oh joy! That's one of the things I hate most about using System.Threading.Timer 😅

@alrz
Copy link
Member

alrz commented Nov 1, 2020

There's a real risk that it could cause significant footprint or memory bloat, even for people who don't use the new feature.

I don't understand this claim. Could you elaborate?

@eiriktsarpalis
Copy link
Member

Agreed that a CollectionsMarshall approach is the preferred solution here. It would let users author their own variants of GetOrAdd as extension methods, without having to do double lookups.

@layomia
Copy link
Contributor

layomia commented Feb 11, 2021

I'm closing this issue in favor of #27062 which offers a more capable solution. See that issue for more details.

@layomia layomia closed this as completed Feb 11, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 14, 2021
@eiriktsarpalis
Copy link
Member

FWIW here's how one can author the originally proposed method as an extension method in .NET 6:

public static class CollectionExtensions
{
    public static TValue GetOrAdd<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, TKey key, Func<TKey, TValue> valueFactory)
        where TKey : notnull
    {
        if (dictionary == null)
        {
            throw new ArgumentNullException(nameof(dictionary));
        }

        if (valueFactory == null)
        {
            throw new ArgumentNullException(nameof(valueFactory));
        }

        ref TValue? value = ref CollectionsMarshal.GetValueRefOrAddDefault(dictionary, key, out bool exists);
        if (!exists)
        {
            value = valueFactory(key);
        }

        return value!;
    }
}

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests