RFC: Allow more than 2^32 entries #16
Comments
Is 2^32 entries really not enough? For |
I agree that it is enough for most applications. An example for a problematic use-case would be storing an index for nodes in Wikidata. Wikidata contains more than 2^32 distinct nodes. So, storing a mapping from |
We can't win on all counts. Extreme cases should not decrease the performance of the common case. |
Right, but I still think it would be nice to at least optionally allow such extreme use cases. @bigerl by the way, only allowing 3 bits for |
I have a version that should work for you: https://github.com/martinus/unordered_dense/blob/2022-08-custom-bucket-types/include/ankerl/unordered_dense.h The bucket's type can now be customized. E.g., this is how you can use the big bucket type:

    using MapBig = ankerl::unordered_dense::map<std::string,
                                                size_t,
                                                ankerl::unordered_dense::hash<std::string>,
                                                std::equal_to<std::string>,
                                                std::allocator<std::pair<std::string, size_t>>,
                                                ankerl::unordered_dense::bucket_type::big>;

Bucket size will be 12 bytes, and max_size is 2^64-1. Would this work for you? |
That works. Thanks for the fast solution. I like the idea of making it configurable. Just a final note: I would expect the big variant to perform better with clang than with gcc. In the past I had several cases where clang handled structs whose sizes are not multiples of a machine word better. |
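To illustrate the 12-byte bucket size mentioned in this exchange, here is a minimal layout sketch. It is not the library's actual definition: the struct names and the use of `#pragma pack` are assumptions made for illustration, and only the field names `dist_and_fingerprint` and `value_idx` come from the issue text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace layout_sketch {

// Hypothetical standard bucket: two 32-bit fields, 8 bytes in total.
struct StandardBucket {
    std::uint32_t dist_and_fingerprint;
    std::uint32_t value_idx;
};

// Hypothetical big bucket: a 64-bit index, packed so that no padding
// is inserted after the 32-bit field. #pragma pack is a compiler
// extension (supported by gcc, clang and MSVC), not standard C++.
#pragma pack(push, 1)
struct BigBucket {
    std::uint32_t dist_and_fingerprint;
    std::uint64_t value_idx;
};
#pragma pack(pop)

// Same fields without packing; on common 64-bit ABIs the compiler
// pads this to a multiple of the 8-byte alignment of value_idx.
struct UnpackedBigBucket {
    std::uint32_t dist_and_fingerprint;
    std::uint64_t value_idx;
};

static_assert(sizeof(StandardBucket) == 8, "two 32-bit fields");
static_assert(sizeof(BigBucket) == 12, "4 + 8 bytes, no padding");

}  // namespace layout_sketch
```

A packed 12-byte struct is not a multiple of the 8-byte machine word, which is the situation the last comment refers to when comparing compilers.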
Thank you for sharing this hash-map implementation. It's a joy to read through the code.
Is your feature request related to a problem? Please describe.
Currently, because Bucket.value_idx is a uint32_t, the map/set can store at most 2^32 entries. That is quite tight. Many applications will hit that limit. In Java, where collections have a similar limitation, I run into it regularly.
Describe the solution you'd like
The readme says that only 1 byte + 3 bits of Bucket.dist_and_fingerprint are payload. If the remaining 21 bits were used to extend Bucket.value_idx, many more entries could be stored (up to 2^(32+21) = 2^53 entries, or about 64 PB of uint64_t). I would suggest something like:
Describe alternatives you've considered
Instead of non-standard attribute packing, standard bit masks and shifts can be used.
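The mask-and-shift alternative could look roughly like the sketch below. It assumes the split described above (11 payload bits for dist_and_fingerprint, 53 bits for value_idx) and stores both in a single 64-bit word. All names here (`PackedBucket`, `make_bucket`, `get_value_idx`, `get_meta`) are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed split: 1 byte + 3 bits = 11 bits of dist_and_fingerprint,
// leaving 64 - 11 = 53 bits for the value index.
constexpr unsigned kMetaBits = 11;
constexpr unsigned kIdxBits = 64 - kMetaBits;
constexpr std::uint64_t kMetaMask = (std::uint64_t{1} << kMetaBits) - 1;
constexpr std::uint64_t kMaxIdx = (std::uint64_t{1} << kIdxBits) - 1;

// Hypothetical bucket: one 64-bit word, the same 8 bytes as two uint32_t.
struct PackedBucket {
    std::uint64_t bits;
};

// Low 11 bits hold the metadata, the upper 53 bits hold the index.
// The caller must ensure value_idx <= kMaxIdx; larger values lose
// their top bits in the shift.
constexpr PackedBucket make_bucket(std::uint64_t value_idx, std::uint32_t meta) {
    return PackedBucket{(value_idx << kMetaBits) | (meta & kMetaMask)};
}

// Shift the metadata bits out to recover the index.
constexpr std::uint64_t get_value_idx(PackedBucket b) {
    return b.bits >> kMetaBits;
}

// Mask off the index bits to recover the metadata.
constexpr std::uint32_t get_meta(PackedBucket b) {
    return static_cast<std::uint32_t>(b.bits & kMetaMask);
}

static_assert(sizeof(PackedBucket) == 8, "bucket stays at 8 bytes");

}  // namespace sketch
```

Because it relies only on shifts and masks over a `std::uint64_t`, this variant needs no packing attributes and keeps the bucket at a word-sized 8 bytes.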