Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to hashbrown::HashTable #21

Merged
merged 1 commit into from
Jan 9, 2024
Merged

Conversation

cuviper
Copy link
Contributor

@cuviper cuviper commented Jan 7, 2024

HashTable is basically a safer version of hashbrown::RawTable. It leaves all of the hashing to be done externally, which means we don't need a NullHasher anymore. Code like LinkedHashMap::shrink_to_fit gets a lot simpler when it can just provide a hashing callback.

There's one public API change here, that LinkedHashMap::reserve and try_reserve are now constrained such that K can be (re)hashed with S, which is needed when hashbrown tries to move them to a new allocation. However, I think it was a bug that this was not required before, because it was previously possible to trigger NullHasher's assertion when resizing -- the new test_reserve demonstrates a case that failed. Also, LinkedHashSet already has these constraints on its reserve methods.

@cuviper
Copy link
Contributor Author

cuviper commented Jan 7, 2024

This would close the door on #13 though -- it's understandable if you don't want to be tied more closely to hashbrown.

@kyren
Copy link
Owner

kyren commented Jan 8, 2024

However, I think it was a bug that this was not required before, because it was previously possible to trigger NullHasher's assertion when resizing -- the new test_reserve demonstrates a case that failed.

Yeah, this is embarrassing and definitely wrong, thank you for catching it! I promise I'll merge either #21 or #22 ASAP and make a new release. This has been broken since very early in this crate's life and I think I just nobody has even tried to use it yet 😔.

It is true that I think it would be nice if this crate could not depend on hashbrown, but not because hashbrown is not ofc extremely high quality, but just because it makes sense for it to have absolutely minimal dependencies. HOWEVER, this PR seems simpler (and thus safer) than #22, and since this crate, at least for the moment, has to depend on hashbrown anyway due to the lack of raw entry api, I think this is probably the PR to go with. It was extremely courteous of you to give me two PRs to choose from 😅, I'm sorry you had to do both of them before I responded!

I don't think this completely closes the door on #13 ofc, I would still like that someday... it just doesn't seem like a very good idea to attempt to do that yet because the raw entry API is so useful when using this crate to build LRU caches, which is sort of the thing which I imagine this crate is mostly useful for. Moving off of hashbrown would depend on what the final API in the stdlib looks like... I guess if it was right around the corner or something I might feel differently and merge #22, but it might be that that would take major changes anyway to match what the final stdlib API will look like, so I don't actually feel that bad about this PR.

Does that reasoning make sense to you? If you feel differently please let me know, I'm willing to defer to your (probably better) judgement.

@cuviper
Copy link
Contributor Author

cuviper commented Jan 8, 2024

I'm sorry you had to do both of them before I responded!

No worries! That was not impatience, and I'm sorry if I made you feel pressured -- it just piqued my interest enough that it was fun to try both approaches. I also maintain indexmap, which is in a similar "weird HashMap" position, so I like to explore the possibilities.

Does that reasoning make sense to you? If you feel differently please let me know, I'm willing to defer to your (probably better) judgement.

It does make sense, but ultimately I think my judgement is not as important as which one you feel more comfortable maintaining. FWIW, there's also a third option where reserve is fixed for rehashing without as much structural change, but I do think HashTable works pretty well as an expression of intent. It's up to you!

@kyren
Copy link
Owner

kyren commented Jan 8, 2024

It does make sense, but ultimately I think my judgement is not as important as which one you feel more comfortable maintaining. FWIW, there's also a third option where reserve is fixed for rehashing without as much structural change, but I do think HashTable works pretty well as an expression of intent. It's up to you!

I mean honestly I like the HashTable approach too, I'm glad the API exists in hashbrown, so it was a biased choice anyway 😛.

I think everything in the PR looks good, and some of the drive by changes are signs that I really should be running clippy more often. I'll merge this in just a bit and make a new release, thank you! ❤️

@kyren kyren merged commit db2d230 into kyren:master Jan 9, 2024
1 check passed
.insert_with_hasher(hash, new_node, (), move |k| hasher((*k).as_ref().key_ref()))
.0;
.into_table()
.insert_unique(hash, new_node, move |k| hasher((*k).as_ref().key_ref()))
Copy link
Owner

@kyren kyren Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I missed when reviewing this is the behavior of RawVacantEntryMut, which has subtly changed (but I think it is still acceptable behavior).

After this PR, if you produce a RawVacantEntryMut for a specific key, then insert a totally unrelated key using the RawVacantEntryMut you receive, you can end up with duplicate keys (which is not a safety issue, only a correctness one). This seems to match the same behavior in hashbrown, so this is fine, but I should have brought this up during review to ask to make sure I understood everything correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what subtle change you mean. Yes, you can do incorrect things here, but nothing new -- AFAICT it ends up doing the exact same insertion on the inner RawTable, before and after my change. Am I missing something?

Before: RawVacantEntryMut::insert_with_hasher -> RawTable::insert_entry -> RawTable::insert
After: HashTable::insert_unique -> RawTable::insert

Copy link
Owner

@kyren kyren Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no you're exactly right, it's exactly the same as it was before, I just didn't realize what the previous behavior was.

Of course it would be the same, because it used hashbrown's RawEntryMut API before too.

Sorry... it's been a long month lol.

@kyren kyren mentioned this pull request Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants