
ref-return Dictionary<TKey, TValue> and friends #22121

Closed
damageboy opened this issue Jun 5, 2017 · 17 comments
Labels
area-System.Collections question Answer questions and provide assistance, not an issue with source code or documentation.
@damageboy
Contributor

Hi all,
I recently (well, a few months ago) got fed up seeing C# sit at position 12 on this quite generic yet telling benchmark:
https://benchmarksgame.alioth.debian.org/u64q/knucleotide.html

This competition requires the use of library/language provided hash tables, which is why I can't submit my version:
https://github.com/damageboy/shame

It runs in about half the time of the current top C# version, which would place it somewhere around positions 3-5 (I can't know for sure since I can't submit it; I can only extrapolate from the improvement observed on my machine)...

The gains are, for the most part, attributable to my hacked-up SuperDictionary.cs, which is essentially the plain old C# dictionary.cs coerced into ref-return semantics.

Now, admittedly, putting SuperDictionary.cs in the corefx repo just to win a benchmark is probably the worst thing ever, as far as suggestions go (although as far as winning benchmarks goes, it would totally own it... 🥇), but I would like to understand where C# / corefx is going with faster data structures that are built to make use of ref returns.

Are there any plans to get this sort of fast data structure into netstandard / corefx / whatever, or is the official party line that we all need to implement our own half-assed versions of existing, tested data structures, riddled with our own bugs?
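[Editor's note] The double-lookup cost behind this request can be sketched with the stock Dictionary. A minimal counting example (names are mine, not from the linked code):

```csharp
using System;
using System.Collections.Generic;

var counts = new Dictionary<string, int>();

// With the stock Dictionary, every increment hashes and probes twice:
// once for the read, once more for the indexer write.
void Increment(string key)
{
    counts.TryGetValue(key, out int current); // lookup #1 (current = 0 if absent)
    counts[key] = current + 1;                // lookup #2
}

Increment("GGTATT");
Increment("GGTATT");
Console.WriteLine(counts["GGTATT"]); // prints 2
```

In a k-nucleotide-style hot loop this pattern runs once per input fragment, so halving the lookups is a direct throughput win.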

@mikedn
Contributor

mikedn commented Jun 5, 2017

faster data structures that are built to make use of ref returns

General purpose data structures that make use of ref returns are a questionable proposition. While they would be useful in some cases they also open the door for some rather nasty bugs and as such they need to be used with some care.

This competition requires the use of library/language provided hash tables, which is why I can't submit my version:

I'm not sure I understand what that means but I see that other implementations make use of data structures that are part of the "standard" language library. The top versions using C, C++ and Java do just that.

Are there any plans to get this sort of fast data structure into netstandard / corefx / whatever, or is the official party line that we all need to implement our own half-assed versions of existing, tested data structures, riddled with our own bugs?

Why do they have to be riddled with our own bugs? Are you saying that the C# community isn't capable of writing good libraries? If that's the case then adding more stuff to corefx won't do any good in the long term. These days languages, frameworks and runtimes survive if the community can help move things forward. If in the case of .NET it turns out that this isn't the case then it's all kind of pointless...

@karelz
Member

karelz commented Jun 7, 2017

On a related note: did you know about PowerCollections? It is something we are thinking about putting into a new repo (e.g. CoreFXExtensions). No decision has been made yet, and there's no timeline. The idea has been floating around since December ... it still needs more thought. Maybe by the end of summer I will get to it ...

@benaadams
Member

from Proposal: ref returns API for Dictionary dotnet/corefx#21240

Dictionary<TKey, TValue>
{
    public ref TValue GetValueByRef(TKey key); // throws if key not found
    private static TValue _DefaultVal; // needed for TryGetValueByRef as default return
    public ref TValue TryGetValueByRef(TKey key, out bool found);
    public ref TValue TryGetValueByRefOrAdd(TKey key, out bool found, TValue setVal);
    public bool TryGetValueOrAdd(TKey key, out TValue value, TValue setValue);
}
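[Editor's note] A minimal sketch of the shape such a ref-returning lookup would have, using a hypothetical function over a toy open-addressing table (this is not the proposed corefx implementation; it omits resizing and assumes the table never fills):

```csharp
using System;

// Toy open-addressing table; names and layout are hypothetical,
// chosen only to demonstrate the ref-return shape of the proposal above.
string[] keys = new string[16];
int[] values = new int[16];
bool[] used = new bool[16];

ref int TryGetValueByRefOrAdd(string key, out bool found, int addValue)
{
    int i = (key.GetHashCode() & 0x7FFFFFFF) % keys.Length;
    while (used[i] && keys[i] != key)        // linear probe (sketch omits resize)
        i = (i + 1) % keys.Length;
    found = used[i];
    if (!found) { used[i] = true; keys[i] = key; values[i] = addValue; }
    return ref values[i];                    // the ref aliases the slot in `values`
}

// One hash/probe per increment: a ref-returning call is a variable, so ++ works.
TryGetValueByRefOrAdd("GGTA", out _, 0)++;
TryGetValueByRefOrAdd("GGTA", out _, 0)++;
ref int c = ref TryGetValueByRefOrAdd("GGTA", out bool existed, 0);
Console.WriteLine($"{c} {existed}"); // prints "2 True"
```

The caveat mikedn raises applies here too: the returned ref is only valid until the table mutates (e.g. a resize moves the backing array), which is exactly the class of bug a general-purpose API has to worry about.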

@sharwell
Member

sharwell commented Aug 7, 2017

@damageboy Have you tried ConcurrentDictionary<TKey, TValue>? It provides an AddOrUpdate method, which I believe is what you need for this.
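[Editor's note] For reference, the AddOrUpdate pattern suggested here looks like this (a minimal sketch; note the value is still read and written by copy, not in place):

```csharp
using System;
using System.Collections.Concurrent;

var counts = new ConcurrentDictionary<string, int>();

// AddOrUpdate: inserts 1 if the key is absent, otherwise applies the
// update delegate. The delegate may be re-run if another thread races
// the internal compare-and-swap, and the value moves by copy each time.
counts.AddOrUpdate("GGTA", 1, (key, current) => current + 1);
counts.AddOrUpdate("GGTA", 1, (key, current) => current + 1);

Console.WriteLine(counts["GGTA"]); // prints 2
```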

@damageboy
Contributor Author

@sharwell I'm not looking for the API.
I'm looking for ref semantics, in order to achieve a 2x latency improvement due to better usage of cache and struct -> register promotion.

@mikedn
Contributor

mikedn commented Aug 14, 2017

due to better usage of cache and struct -> register promotion

Huh?!

@sharwell
Member

sharwell commented Aug 14, 2017

@damageboy You made a claim that ref returns for Dictionary<TKey, TValue> would provide a specific amount of benefit for a specific application. In a context like this, it would be good to further understand the benefits this change would have relative to the current behavior of the other standard dictionary implementation in .NET, ConcurrentDictionary<TKey, TValue>.

@mikedn
Contributor

mikedn commented Aug 14, 2017

@sharwell The benefit is that having something like GetValueByRef allows you to avoid a second hashtable lookup when updating the value of a key:

map[key] = map[key] + 1;
// versus
map.GetValueByRef(key)++;

Of course, it can also avoid value copying. But that has nothing to do with the knucleotide benchmark.

It has nothing to do with cache usage or struct-to-register promotion.

@benaadams
Member

ConcurrentDictionary is still pass-by-copy for structs; so if the structs are large it isn't great (vs. accessing them in place with a ref return) - plus the overhead of coping with concurrency.

Also, it will closure-capture on update for the method

AddOrUpdate(TKey, TValue, Func<TKey, TValue, TValue>)

And always closure-capture (if there is any outside state to use) for

AddOrUpdate(TKey, Func<TKey, TValue>, Func<TKey, TValue, TValue>)

It needs the overloads

AddOrUpdate(TKey key, TValue val, Func<TKey, TValue, TValue, TValue> update) // update: (key, oldValue, newValue) => result
AddOrUpdate(TKey key, TValue val, Func<TKey, TValue, TValue> add, Func<TKey, TValue, TValue, TValue> update)

Though with ref locals you could also do some interesting overloads
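[Editor's note] The closure-capture problem, and the state-passing overload that later landed in ConcurrentDictionary as a TArg factoryArgument variant in .NET Core 2.0, can be sketched like this (the key and delta values are illustrative):

```csharp
using System;
using System.Collections.Concurrent;

var totals = new ConcurrentDictionary<string, long>();
long delta = 5;

// Two-delegate form: capturing `delta` from the enclosing scope turns the
// lambdas into closures, typically allocating a display class per call site.
totals.AddOrUpdate("a", k => delta, (k, old) => old + delta);

// State-passing form: `delta` travels as the TArg factoryArgument, so the
// lambdas capture nothing and can be cached as static delegates.
totals.AddOrUpdate("a", (k, arg) => arg, (k, old, arg) => old + arg, delta);

Console.WriteLine(totals["a"]); // prints 10
```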

@sharwell
Member

I agree that by-ref access to the values stored for keys in a dictionary has benefits. I asked mostly because the context of this post refers to a specific application with specific performance numbers (current and goal).

@benaadams Yes, the lack of a context parameter for the add and update delegates has been frustrating. However, it appears that the specific application here uses transformation functions of the current value only, which fits the current optimum scenario given the limitations.

@stephentoub
Member

Yes, lack of a context parameter for the add and update delegates has been frustrating

dotnet/corefx@c8f90c6

@damageboy
Contributor Author

@mikedn sorry about the struct-promotion comment; that was a brain fart. I was mixing it up with a different issue I'm dealing with right now, and it was totally unrelated. I did experience a latency improvement due to not having to do two lookups; sorry about that...

The knucleotide benchmark was really not the only reason I want this sort of optimized dictionary.

I'm using this dictionary in a very specialized use case (receiving and parsing packets with single digit microsecond latency).

I have multiple dictionaries to keep updated, and switching them to ref returns (a.k.a. my bastardized SuperDictionary) saved me a great deal of that latency compared to the normal Dictionary.

The knucleotide benchmark is much more extreme, granted, in the sense that updating the dictionary is basically all that piece of code does....

In my very specific application, I provide a set of complex APIs that all have to do with maintaining those dictionaries, and the performance improvement from switching to SuperDictionary is around 30%-50% (some operations do double the dictionary updates).

@mikedn
Contributor

mikedn commented Aug 14, 2017

I have multiple dictionaries to keep updated, and switching them to ref returns (a.k.a. my bastardized SuperDictionary) saved me a great deal of that latency compared to the normal Dictionary.

Due to multiple-lookup avoidance, or perhaps struct-copy avoidance? I'd guess it's the former; structs would have to be quite large for copies to make a difference.

@benaadams
Member

c8f90c6

@stephentoub it's a 2.0 feature; but you timed the comment to coincide perfectly with the release of .NET Core 2.0 😄

@damageboy
Contributor Author

@mikedn yes; in my case my structs are two uint32s in size

@damageboy
Contributor Author

I've caught up with the various PRs and attempts (#17405), mainly the comments made by @benaadams
@jkotas 👍
dotnet/coreclr#17405 (comment)
dotnet/coreclr#17405 (comment)

I'm closing this issue, as it's now clear to me that there's too much potential breakage that comes with the territory when adding such behavior to the existing Dictionary, and I will probably keep maintaining my own sorry hacked-up copy of Dictionary.cs.

@svick
Contributor

svick commented Jun 14, 2020

To anyone who finds this issue now: adding a ref-return accessor to Dictionary has been reopened as #27062.
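[Editor's note] For readers landing here later: that reopened proposal eventually shipped in .NET 6 as CollectionsMarshal.GetValueRefOrAddDefault, which provides the single-lookup increment discussed throughout this thread on the standard Dictionary:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var counts = new Dictionary<string, int>();

// One hash lookup: returns a ref to the value slot, first adding
// default(int) (i.e. 0) if the key was absent. The ref is only valid
// until the dictionary is next mutated (e.g. a resize).
ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, "GGTA", out bool existed);
slot++;
// A ref-returning call is a variable, so it can be incremented directly:
CollectionsMarshal.GetValueRefOrAddDefault(counts, "GGTA", out _)++;

Console.WriteLine(counts["GGTA"]); // prints 2
```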

@ghost ghost locked as resolved and limited conversation to collaborators Dec 22, 2020
8 participants