Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special Ordered Dict that automtically swaps from LittleDict to another type when too full #57

Open
oxinabox opened this issue Sep 29, 2020 · 2 comments

Comments

@oxinabox
Copy link
Member

oxinabox commented Sep 29, 2020

Originally posted by @PallHaraldsson in #54 (comment)

Since LittleDict is faster, is it a reasonable default?

"It is reasonable to expect it to outperform an OrderedDict,
with up to around 30 elements in general;
or with up to around 50 elements if using a LittleDict backed by Tuples"

[I proposed OrderedDict as replacement for Dict at julialang.] I mean LittleDict is maybe not a good default as you can have more elements (how likely for Julia itself?). For a lot of code you will not have more, while the possibility should give you pause. Does it make sense to build a new Dict on this one for the first 30-50 elements, and have a fallback to another, OrderedDict (or some newer Swiss or OrderedRobinDict?).

@oxinabox
Copy link
Member Author

oxinabox commented Sep 29, 2020

Yes, I think that would be neat. MagicOrderedDict

The exact time to swap depends a lot on the type of data and how expensive it's hash function is,
but swapping over at about 30 should be generally safe. (which is why the docs recommend it).

In theory we could even be more magic by timing hash and isequal on the first 2 elements added then decide what to do for all others.

@PallHaraldsson
Copy link
Contributor

Thinking more about it, and what I had in mind with fallback, is not move stuff to another Dict but leave stuff unchanged in the LittleDict that handles say first 30, then add rest in an "overflow-dict". When you lookup, if at least count is twice the former size (maybe somewhat more as the latter is slower), 60+ then you look first in the LittleDict. If count is higher than the threashold (e.g. huge), then you would look first in the latter.

I'm not sure it's practical to look in both "sub"-dicts at the same time (threads have some overhead).

[I was also thinking, maybe scan the LittleDict from the front and back (similar to quicksort), that can be done in one loop. Maybe it helps a bit, maybe you're more likely to look up recent additions. Maybe just scan backwards?]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants