Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving elements out of a hash set #96

Closed
sh-mochi opened this issue Dec 3, 2023 · 8 comments
Closed

Moving elements out of a hash set #96

sh-mochi opened this issue Dec 3, 2023 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@sh-mochi
Copy link

sh-mochi commented Dec 3, 2023

An 'erase(key)' method that returns the previously stored instance by means of move construction would be really handy.
Moving elements out of a container is tricky when all you get is a const ref.
Please consider the following two methods for your table<> template.
I am not 100% sure but I guess something like that should work just fine?

// Erase element referenced by 'i' and return a new object move constructed from the erased element.
auto eject(const const_iterator i) -> value_type {
    value_type tmp{std::move(const_cast<value_type&>(*i))};
    erase(i);
    return tmp;
}

// Lookup 'key' and if present return 'eject(find(key))', otherwise return an empty optional.
template <class K>
auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place, std::move(const_cast<value_type&>(*i)));
    erase(i);
    return result;
}
@sh-mochi sh-mochi added the enhancement New feature or request label Dec 3, 2023
@martinus
Copy link
Owner

can't you use something like this, maybe in a helper function? I'm a bit wary with adding new functionality that's possible with other means

if (auto it = map.find(key); it != map.end()) {
    val = std::move(it->second);
    map.erase(it);
}

@sh-mochi
Copy link
Author

sh-mochi commented Dec 18, 2023

Not possible by other means sadly. Sometimes you need to get not just the associated value (mapped type) but also the key. This is problematic for both a map and a set structure because moving the key changes the value that still resides in the container and leaves it (the value) in an unspecified (but usable) state. The std::unordered_map/set containers are then left in an invalid state. Thus the key type is neatly guarded by constness.

if (auto it = set.find(key); it != set.end()) {
    val = std::move(const_cast<SomeType &>(*it)); // HACK
    set.erase(it);
}

So you can copy the key, but that might involve dynamic allocation (e.g. strings/buffers), or worse: the key is not copy-able and so forever 'stuck' in that container. Your implementation would allow keys to be 'moved out' of map/set containers - there is just no API for that. In my view that is a useful feature worth considering.

@sh-mochi
Copy link
Author

Here is what I had in mind.

template <class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
    value_type tmp{std::move(const_cast<value_type&>(*i))};
    erase(i);
    return tmp;
}

template <class K, class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place, std::move(const_cast<value_type&>(*i)));
    erase(i);
    return result;
}

template <class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
    value_type tmp{
        std::move(const_cast<Key&>(i->first)),
        std::move(const_cast<T&>(i->second))};
    erase(i);
    return tmp;
}

template <class K, class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place,
        std::move(const_cast<Key&>(i->first)),
        std::move(i->second));
    erase(i);
    return result;
}

The problem case:

ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
// silently invokes copy constructor
// also allocs memory during clean up
auto cp_obj = std::move(*i);
assert(!i->empty());
// moves, but yikes
auto mv_obj = std::move(const_cast<std::string&>(*i));
assert(i->empty());
m.erase(i);

With new API:

ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
auto mv_obj = m.eject(i); // string is moved, no alloc -> reliable cleanup
assert(mv_obj == "this string is way too long for SSO!");

@martinus
Copy link
Owner

martinus commented Dec 21, 2023

Hi again, I see your point. Currently I have an extract() method that extracts the whole underlying std::vector. I'd say the API should be the same as for erase(), except that it doesn't return an iterator:

auto extract(const_iterator i) -> value_type;

auto extract(Key const& key) -> std::optional<value_type>;

template <class K>
auto extract(Key&& key) -> std::optional<value_type>;

It might also be a good idea to add an API that can throw an exception if one doesn't like the optional (similar to at):

// throws std::out_of_range when key is not found
auto extract_at(Key const& key) -> value_type;

// throws std::out_of_range when key is not found
template <class K>
auto extract_at(Key&& key,) -> value_type;

Implementation is a bit trickier though, as it's not allowed to call erase(it) when the object has already been moved out.

@martinus martinus reopened this Dec 21, 2023
@sh-mochi
Copy link
Author

"..not allowed to call erase(it) when the object has already been moved out." Ugh, even though I looked at the code I totally didn't catch that the hash value has to be re-computed. I thought about using your hash map/set as a replacement for something I had cooked up in the past and finally found some time to do so. The memory usage is about 20% down (massive) and speed almost doubled. Had to replace usage of std::pair with std::tuple and bolt those extract() methods on because that is what my library needs. All I did was copy do_erase(bucket_idx) as do_extract(bucket_idx) where I grab the value right before m_values.pop_back(). On top of that I added the extract methods as you suggested based on erase(it). Here are the relevant (and hopefully correct) changes I made.

auto do_extract(value_idx_type bucket_idx) {
    auto const value_idx_to_remove = at(m_buckets, bucket_idx).m_value_idx;

    // shift down until either empty or an element with correct spot is found
    auto next_bucket_idx = next(bucket_idx);
    while (at(m_buckets, next_bucket_idx).m_dist_and_fingerprint >= Bucket::dist_inc * 2) {
        at(m_buckets, bucket_idx) = {dist_dec(at(m_buckets, next_bucket_idx).m_dist_and_fingerprint),
                                     at(m_buckets, next_bucket_idx).m_value_idx};
        bucket_idx = std::exchange(next_bucket_idx, next(next_bucket_idx));
    }
    at(m_buckets, bucket_idx) = {};

    // update m_values
    if (value_idx_to_remove != m_values.size() - 1) {
        // no luck, we'll have to replace the value with the last one and update the index accordingly
        auto& val = m_values[value_idx_to_remove];
        val = std::move(m_values.back());

        // update the values_idx of the moved entry. No need to play the info game, just look until we find the values_idx
        auto mh = mixed_hash(get_key(val));
        bucket_idx = bucket_idx_from_hash(mh);

        auto const values_idx_back = static_cast<value_idx_type>(m_values.size() - 1);
        while (values_idx_back != at(m_buckets, bucket_idx).m_value_idx) {
            bucket_idx = next(bucket_idx);
        }
        at(m_buckets, bucket_idx).m_value_idx = value_idx_to_remove;
    }
    value_type tmp = std::move(m_values.back());
    m_values.pop_back();
    return tmp;
}

auto extract(iterator it) -> value_type {
    auto hash = mixed_hash(get_key(*it));
    auto bucket_idx = bucket_idx_from_hash(hash);

    auto const value_idx_to_remove = static_cast<value_idx_type>(it - cbegin());
    while (at(m_buckets, bucket_idx).m_value_idx != value_idx_to_remove) {
        bucket_idx = next(bucket_idx);
    }

    return do_extract(bucket_idx);
}

template <typename Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
auto extract(const_iterator it) -> value_type {
    return extract(begin() + (it - cbegin())); // maybe a `it.mut() -> iterator` function would be useful
}

template <class K>
auto extract(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    return extract(i);
}

@martinus
Copy link
Owner

Thanks for the code, but you are moving out the wrong element :)

I have a version up in PR #105 that should work correctly, can you give it a try? https://raw.githubusercontent.com/martinus/unordered_dense/633023a0abf9ffa44092e4c2510fa8a7f6a3cb21/include/ankerl/unordered_dense.h

@sh-mochi
Copy link
Author

During xmas I always test hash maps with just single entry and it worked like a charm ;)

Today I break that rule. Merged the changes and actually did some testing regarding map<string,int>::extract() + 10 minutes fuzzing with random string keys. All good, thank you very much!

Happy Holidays!

@martinus
Copy link
Owner

martinus commented Dec 24, 2023

Thanks for testing, happy Holidays! Fixed in #105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants