
Avoid reporting a capacity of zero when there is still space in the map. #436

Open
Diggsey wants to merge 1 commit into master

Conversation

@Diggsey commented Jun 8, 2023

Fixes rust-lang/rust#79178

There are two changes in this PR:

1. Add a max_capacity() method. The capacity() method provides a lower bound on the capacity; this complements it by providing an upper bound. There are some use-cases (such as deciding when to call shrink_to_fit()) where the upper bound on the capacity is the more appropriate one to use.

2. Recover "lost" capacity when the map is fully emptied and a large number of tombstones remain. With this change, the recovery happens whenever the reported capacity would otherwise be less than or equal to half the actual capacity of the map. The most important case is when the reported capacity would be zero even though the backing memory has not been deallocated: after this PR, a positive capacity is reported instead.
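
For context, a minimal, self-contained illustration of the situation the second change targets; the element count is arbitrary, and exactly how far capacity() drops depends on how many of the removals leave tombstones behind:

```rust
use hashbrown::HashMap;

fn main() {
    let mut map: HashMap<u32, u32> = HashMap::new();
    for i in 0..1_000 {
        map.insert(i, i);
    }
    // Removing entries can leave tombstones in the table. Each tombstone
    // reduces the value reported by `capacity()` without freeing any memory,
    // so a fully emptied map can end up reporting a capacity of zero while
    // its backing allocation is still live (rust-lang/rust#79178).
    for i in 0..1_000 {
        map.remove(&i);
    }
    println!("len = {}, capacity = {}", map.len(), map.capacity());
}
```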

@Amanieu (Member) commented Jun 9, 2023

I don't mind adding a max_capacity method, but I am against adding the extra code in erase. My main concern is that this increases code size for a rare condition, and that tombstones will naturally be cleared as more items are inserted in the map.

Have you considered using shrink_to instead of shrink_to_fit to limit the size of a table to a given capacity? This doesn't rely on growth_left for calculations and is therefore independent of the number of tombstones in the table.

@Diggsey (Author) commented Jun 10, 2023

> My main concern is that this increases code size for a rare condition, and that tombstones will naturally be cleared as more items are inserted in the map.

This is intended to restore the invariant that map.capacity() == 0 implies that no memory is allocated. This invariant held before std's HashMap implementation was replaced by hashbrown, and it holds for the other std collections, but it does not currently hold in hashbrown.
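
For comparison, a small sketch of that invariant as it shows up for Vec, where capacity() is zero exactly when nothing is allocated:

```rust
fn main() {
    let mut v: Vec<u64> = Vec::new();
    assert_eq!(v.capacity(), 0); // nothing allocated yet

    v.extend(0..100);
    v.clear();
    // The allocation is retained after clearing, and capacity() still reports it.
    assert!(v.capacity() >= 100);

    v.shrink_to_fit();
    assert_eq!(v.capacity(), 0); // allocation released, so capacity is zero again
}
```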

My change could be restricted further to target only this specific case, rather than the more general approach taken here, but I think it would still require code in erase.

Are you worried about the total code size, or about the code size in erase specifically? I don't think this should add more than a single instruction to erase, since the condition being checked will already be generated by the previous line (at least on x86).

> Have you considered using shrink_to instead of shrink_to_fit to limit the size of a table to a given capacity? This doesn't rely on growth_left for calculations and is therefore independent of the number of tombstones in the table.

I don't think that helps with my specific use-case, since I was using capacity() to decide whether to try to shrink the table. max_capacity() alone does solve the problem for me, but it leaves this footgun in place for others because of the broken invariant, which I also wanted to fix.

@Amanieu (Member) commented Jun 10, 2023

> This is intended to restore the invariant that map.capacity() == 0 implies that no memory is allocated. This invariant held before std's HashMap implementation was replaced by hashbrown, and it holds for the other std collections, but it does not currently hold in hashbrown.

This has never been something that was guaranteed. The documentation specifically says that the capacity is the number of elements that can be inserted without reallocation, and makes no mention that 0 has a special meaning.

> I don't think that helps with my specific use-case, since I was using capacity() to decide whether to try to shrink the table. max_capacity() alone does solve the problem for me, but it leaves this footgun in place for others because of the broken invariant, which I also wanted to fix.

I believe just replacing this:

```rust
if guard.len() * 3 < guard.capacity() {
    guard.shrink_to_fit();
}
```

with this:

```rust
guard.shrink_to(guard.len() * 3);
```

should just work. It will also properly free the backing memory of the hashmap when len is 0.
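
For reference, a self-contained sketch of that suggestion, assuming guard in the snippets above is a mutable reference to a hashbrown HashMap (the prune helper and the u64 key/value types here are just placeholders):

```rust
use hashbrown::HashMap;

fn prune(map: &mut HashMap<u64, u64>) {
    // `shrink_to` keeps at least `len() * 3` capacity (and never less than
    // `len()`). Because the target is computed from `len()` rather than
    // `capacity()`, tombstones cannot skew the decision, and when the map is
    // empty this shrinks to zero, releasing the backing allocation.
    map.shrink_to(map.len() * 3);
}

fn main() {
    let mut map: HashMap<u64, u64> = HashMap::new();
    for i in 0..1_000 {
        map.insert(i, i);
    }
    map.clear();
    prune(&mut map);
    // Per the comment above, the backing memory is freed once len is 0.
    println!("capacity after prune = {}", map.capacity());
}
```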

@Diggsey (Author) commented Jun 10, 2023

> This has never been something that was guaranteed. The documentation specifically says that the capacity is the number of elements that can be inserted without reallocation, and makes no mention that 0 has a special meaning.

According to the documentation it would be valid for capacity() to always return 0. That doesn't make it a reasonable implementation choice though.

If hashbrown has memory allocated and isn't storing any elements, then there's definitely room to insert at least one element without reallocating.

Another way to avoid this would be to special-case it in capacity(), which would avoid bloating erase() at least...
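
To make that alternative concrete, here is a rough sketch of what special-casing an emptied table in capacity() could look like. The struct and field names below are hypothetical stand-ins for illustration, not hashbrown's actual internals:

```rust
// Hypothetical model of a swiss-table-style map, for illustration only.
struct TableSketch {
    items: usize,       // live entries
    growth_left: usize, // inserts possible before a resize; tombstones eat into this
    buckets: usize,     // allocated slots (0 when nothing is allocated)
}

impl TableSketch {
    fn capacity(&self) -> usize {
        if self.items == 0 && self.buckets != 0 {
            // Fully emptied but still allocated: every occupied slot is a
            // tombstone, so the whole table is reusable. Report the full
            // usable capacity instead of a tombstone-depleted `growth_left`.
            self.buckets / 8 * 7 // usable slots at a 7/8 load factor
        } else {
            self.items + self.growth_left
        }
    }
}

fn main() {
    // An emptied-but-allocated table whose tombstones consumed all growth:
    let t = TableSketch { items: 0, growth_left: 0, buckets: 64 };
    assert_eq!(t.capacity(), 56); // reports usable space instead of zero
}
```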

> I believe just replacing this ... with this ... should just work. It will also properly free the backing memory of the hashmap when len is 0.

This would cause unnecessary extra resizing, and wouldn't reclaim all the memory in the "normal" case though.

Successfully merging this pull request may close these issues: HashMap reports zero capacity without freeing memory