Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when using [] operator on map when accessing a key that is known to already exist #98

Closed
proc-sim opened this issue Dec 11, 2023 · 8 comments · Fixed by #100
Assignees
Labels
bug Something isn't working

Comments

@proc-sim
Copy link

proc-sim commented Dec 11, 2023

Summary:

The [] operator's invocation of do_try_emplace makes a potentially unnecessary call to increase_size if a map is full, even if the requested key already exists in the map, leading to race conditions when maps are accessed by multiple threads.

Info:

I'm doing a drop-in replacement of std::unordered_map with ankerl::unordered_dense and have been running into lots of crashes in my codebase, which can be narrowed down to the following code pattern, called from multiple threads on the same map:

if (map.find(key) != map.end())
{
    auto &val = map[key];
    //...do something
}

This is safe code when using std::unordered_map. However, after examining ankerl::unordered_dense::map, the [] operator invokes do_try_emplace, which will attempt to increase the size of a map if the map is full (with a call to increase_size, given the is_full condition), prior to any checks to see if the key in question already exists. This leads to a race condition if multiple threads are using the [] operator on a map that is full - even if all keys requested have already been inserted into the map (meaning there's no reason the map should require reallocations, in principle).

I recognize using the iterator returned by find to access the value would be a solution, but it still seems like the problem is ultimately a bug. The existence of this problem means the [] operator can never be used in a thread-safe manner unless the map is guaranteed not to be full - even if no new keys are being inserted.

@proc-sim proc-sim added the bug Something isn't working label Dec 11, 2023
@martinus
Copy link
Owner

Thanks for reporting, I'll see what I can do.

@proc-sim
Copy link
Author

proc-sim commented Dec 12, 2023

I did some testing and came up with a solution that is working. Here is the modified function (I'm posting it here rather than a pull request, because the recursion isn't necessarily ideal. However, during a million insertions, the recursive call only happens about a dozen times due to the rate at which the map fills up after resizing, so it's a virtually negligible difference...).

template <typename K, typename... Args>
auto do_try_emplace(K&& key, Args&&... args) -> std::pair<iterator, bool> {
    
    if (ANKERL_UNORDERED_DENSE_UNLIKELY(m_max_bucket_capacity == 0)) {
        increase_size();
    }

    auto hash = mixed_hash(key);
    auto dist_and_fingerprint = dist_and_fingerprint_from_hash(hash);
    auto bucket_idx = bucket_idx_from_hash(hash);

    while (true) {
        auto* bucket = &at(m_buckets, bucket_idx);
       
        if (dist_and_fingerprint == bucket->m_dist_and_fingerprint) { //tyson edit
            if (m_equal(key, get_key(m_values[bucket->m_value_idx]))) {
                return {begin() + static_cast<difference_type>(bucket->m_value_idx), false};
            }
        } else if (dist_and_fingerprint > bucket->m_dist_and_fingerprint) 
        {
            if (ANKERL_UNORDERED_DENSE_UNLIKELY(size() >= m_max_bucket_capacity)) {
                increase_size();
                return do_try_emplace(key, args...);                    
            }
            return do_place_element(dist_and_fingerprint, bucket_idx, std::forward<K>(key), std::forward<Args>(args)...);
        }
        dist_and_fingerprint = dist_inc(dist_and_fingerprint);
        bucket_idx = next(bucket_idx);
    }
}

Notes:

The first call to increase size at the top of the function is now conditional on the max capacity being zero, rather than is_full being true. This ensures our map is properly initialized if empty, but will never be called if the [] operator is used and the key already exists (it's impossible for a key to exist and the map to be empty). So that prevents the initial race condition.

And finally, when (and only when) we determine that the key doesn't exist, that's when we check if the map is full and increase its size if necessary - finally returning with a recursive call to the function in order to get proper bucket_idx/dist_and_fingerprint values after the allocation in order to insert the key.

I tested pretty extensively in my own codebase and these two changes seem to have done the trick.

@martinus
Copy link
Owner

Thanks, I came up with very similar code - if possible I'd like to get rid of the initial if, I still need to think about that when I have time

@ktprime
Copy link

ktprime commented Dec 15, 2023

@proc-sim

if (map.find(key) != map.end())
{
    auto &val = map.at(key);
    //...do something. is better ?
}

@proc-sim
Copy link
Author

proc-sim commented Dec 15, 2023

@ktprime No, at() calls find again internally, so that's worse than just using the iterator from find() (which I mentioned in my original post):

auto it = map.find(key);
if (it != map.end())
{
    auto &val = it->second;
}

@martinus
Copy link
Owner

@proc-sim
Copy link
Author

proc-sim commented Dec 17, 2023

@martinus Tested your update in my codebase - works great, no issues detected. Thanks for jumping on this so quick!

@martinus
Copy link
Owner

I've already created a new release 4.2.0 with this new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants