-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High spinlock overhead #146
Comments
Use the code from [1] to implement the spinlock based on std::atomic<bool> instead of std::atomic_flag. While the former is not necessarily lock-free, it is on the majority of platforms. A static assert is added to catch this on platforms that don't have this - we could potentially use the older implementation on those instead then. WIP because I don't have a good scientific benchmark for this yet. I tested it in our realworld application, and it seems to have slightly reduced the load of the spinlock, but not in a really large way... See also: efficient#146 [1]: https://rigtorp.se/spinlock/
Conditionally using |
Sadly, we cannot yet use C++20 and are stuck with C++17 until macOS updates its toolchain :( So at least for me own purposes, I'm looking for a solution that is also compatible with at least C++17. |
Hi Milian, thanks for bringing this up! I can certainly see how spinlocks in general may be suboptimal for use-cases where the critical section is expensive, since spinning wastes CPU cycles. So from that standpoint, there is value in allowing users to specify their own lock implementation. (separately I'm happy to accept general-purpose improvements to the spinlock implementation :) ). On the other hand, I worry that fixing a specific locking interface may restrict the set of future optimization opportunities for the table. For instance, the original paper that libcuckoo is based on employed atomic version counters instead of actual locks to reduce contention (https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf, search for "optimistic locking"). We've also experimented with different locking implementations such as read-write locks. It would be great to maintain this flexibility in how we manage concurrency in the hashtable. For the use-case you detailed in #138, would it be at all possible to try moving the expensive work outside the hashtable insert operation? I'm thinking something along these lines:
|
I'm all for keeping things flexible for the implementation side of things. Instead of letting the user pass through a custom mutex/lock implementation, maybe for starters one could offer various locking schemes that are selectable via some enumerator? the current polling would be the default, and one could add e.g. one that waits instead next. but then again, maybe a generic improvement to the spinlock to employ e.g. exponential backoff would be a win-win and shouldn't be user-configurable. I clearly don't know enough about this topic yet :) regarding your suggestion for my example use case: thanks! that is a very interesting idea, I will for sure try that out. I just fear that this will simply move the contention from one place to another, as then we would burn CPU time while polling in the new while loop. Anyhow, I should really try it out and report some hard numbers back for you. Cheers |
During the discussion of [1] it has been suggested that the congestion could be reduced by relying on C++20 futex APIs. Additionally we incorporated the `try_lock()` design from [2]. [1]: efficient#146 [2]: https://rigtorp.se/spinlock/ Co-authored-by: Milian Wolff <[email protected]>
During the discussion of [1] it has been suggested that the congestion could be reduced by relying on C++20 futex APIs. Additionally we incorporated the `try_lock()` design from [2]. [1]: efficient#146 [2]: https://rigtorp.se/spinlock/ Co-authored-by: Milian Wolff <[email protected]>
Hey there,
in one of our scenarios, we encounter quite a lot of contention that leads to a significant CPU overhead in the libcuckoo spinlock implementation.
On one hand, the spinlock could/should be improved based on the insights from https://rigtorp.se/spinlock/
On the other hand, I did that locally (I will provide pull request for that) and did not measure a huge improvement. Spinlocks are naturally CPU intensive for contention. In my case, as outlined in #138, the insertion can take significant resources and time. Thus contention is to be expected and I would like to reduce the CPU overhead - but I don't see a way to do that with normal spinlocks.
Would it be acceptable to expand libcuckoo to allow more fine grained control over the lock? Potentially even allowing me to provide my own lock class. See also https://www.youtube.com/watch?v=z7fPxjZKsBY&t=1550s for some inspiration of what could potentially be done.
I'm open for discussion and insight here. Additionally, I'll try and find some time to create a benchmark for this scenario, so we can make better educated decisions. Or do you happen to have a benchmark for the contended case already?
Thanks
The text was updated successfully, but these errors were encountered: