-
Notifications
You must be signed in to change notification settings - Fork 15
Lock-free skip list map/set #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Relevant discussion: crossbeam-rs/crossbeam#109 |
|
I really like this! A concurrent ordered collection would be great to have! I would suggest generalizing the Minor bikeshed: the Finally, it is not clear whether a |
|
Is the reference counting absolutely necessary? I suspect this may have a more noticeable impact on ARM platforms where reference count updates involve memory barrier instructions. It would be ideal if read-only access to the |
|
I'm very excited about this! The skiplist I wrote deadlocks and I haven't been able to investigate why. Assuming its just some bug in my implementation, I think there's an interesting trade off: my skiplist does not support remove, allowing it to have no reliance of ref counting or the crossbeam epoch. Since it currently doesn't work I haven't benchmarked it, but if it turns out to be significantly faster, it would be great to see crossbeam support both the map/set proposed by this RFC and a monotonic version, so that users who don't need the ability to remove or update keys without synchronization can have better performance. |
Sounds good! And yes, we should also add the
Maybe. I think of a Initially I used nullable Perhaps we should have both impl<'a, K, V> Cursor<'a, K, V> {
fn is_null() -> bool;
fn unwrap(self) -> Entry<'a, K, V>;
fn into_entry(self) -> Option<Entry<'a, K, V>>;
fn key(&self) -> Option<&K>;
fn value(&self) -> Option<&V>;
fn next(&mut self);
fn prev(&mut self);
}
impl<'a, K, V> Entry<'a, K, V> {
fn into_cursor(self) -> Cursor<'a, K, V>;
fn key(&self) -> &K;
fn value(&self) -> &V;
fn get_next(&self) -> Cursor<'a, K, V>;
fn get_prev(&self) -> Cursor<'a, K, V>;
}
Inserting anything into the map requires
Absolutely necessary? No. But is it a good default? I think yes. Methods like In order to minimize the performance penalty of pinning and refcounting, we might add methods like the following: impl<'a, K, V> Entry<'a, K, V> {
// A single call to `scan_forward_while` or `scan_backward_while` pins the current
// thread and updates refcounts every 100 steps only (or some other number).
fn scan_forward_while(&self, f: impl FnMut(&K, &V) -> bool) -> Option<Entry<'a, K, V>>;
fn scan_backward_while(&self, f: impl FnMut(&K, &V) -> bool) -> Option<Entry<'a, K, V>>;
}The caveat is that calling the passed function Refcounting is nice because it's easy to use and you really can't get it wrong. There is surely room for an advanced API providing faster iteration, and we can always add it later. I'm not yet sure what it should look like exactly - it's a tough nut to crack. Does that sound like a reasonable plan? Any other major problems with refcounting I'm missing? I should also add - if you need really fast iteration, then perhaps skip list is the wrong data structure to begin with? Switching to a B-tree or a radix tree should offer much larger gains in performance than what can be achieved by avoiding pinning and reference counting in a skip list. In the future, Crossbeam should provide a variety of different kinds of maps and sets to cover situations like these. Finally, note that there is a possibility to make iteration faster in the future by relying on hazard pointers rather than the current combination of pinning and refcounting. The idea is that instead of pinning the current thread and incrementing the refcount before moving to the next node we just update an internal hazard pointer (load + store + fence + load + store, I believe) Right now
Not only that, methods like Any suggestions how we might expose both a skip list that allows and a skip list that forbids removal? How about the following? trait RemovePolicy {}
struct RemoveAllow;
impl RemovePolicy for RemoveAllow {}
struct RemoveForbid;
impl RemovePolicy for RemoveForbid {}
struct SkipMap<K, V, RP: RemovePolicy = RemoveAllow> {
// ...
}
impl<K, V, RP: RemovePolicy> SkipMap<K, V, RP> {
// ...
}
impl<K, V> SkipMap<K, V, RemoveAllow> {
fn remove(&self, k: &K) -> Option<Entry<K, V>> {
// ...
}
}Or should we rely on const generics once they get implemented? enum RemovePolicy {
Allow,
Forbid,
}
struct SkipMap<K, V, const RP: RemovePolicy: RemovePolicy::Allow> {
// ...
}Or just provide a completely different type for the no-remove skip list, perhaps |
I think this is the better choice, similar to how there's currently Also its worth remember that the grow-only types are really only unsynchronized grow-only. They can support remove and get_mut and so on with |
No, this would duplicate the API too much. I'm OK with the current non-nullable entry API, I was just commenting on the naming. The reason I used nullable cursors in
This is not quite the case for
Actually what I had in mind was to simply have
Yes definitely! I think that there are many use cases which are read-heavy and rarely modify data structures. In such cases I would even accept a data structure that requires a lock for writing if it can allow lock-free reads. I currently use a hash table with such a design (the implementation is very specialized though, and probably not suitable for crossbeam). |
I wouldn't call deciding on the API a less important matter, because the functionality that we want to support effectively decides the implementation and limits the performance, but I agree that we should decide on that in structure-specific threads. For example, I am not so enthusiastic about reference-counting by default in the map that I would want, but I see that for other use cases it might be useful. The discussion here seems to be going in the direction of fairly detailed API design, with different people expressing the different requirements that they have. In my opinion it would be better to provide multiple versions of the structure, either using different types or through const generics as you mentioned. These versions could include one which is a drop-in replacement for I propose to make this RFC a kind of meta-design RFC instead. For that purpose, the API snippet as well as the detailed design section should be removed, preserving just sections describing the motivation and overall idea of a concurrent skiplist. Based on such an RFC we could create a repository like Related is the question of whether we want a crate like What do you think? |
I missed that - sounds interesting! Well, I don't see a promising way to implement
Agreed - if there are any concerning problems in the current design that might preclude us from harnessing more performance or extending the API in the future, we should try to resolve them now.
Agreed. So if I understood correctly, you're a fan of I believe there shouldn't be much contention between my refcounting, Amanieu's pinning, and your cloning. We can make everyone happy at the same time. :) Here's how we might do that (not a real proposal, just a quick idea): impl<K: Ord, V> SkipMap<K, V> {
fn insert(&self, k: K, v: V);
fn insert_and_get(&self, k: K, v: V) -> Entry<K, V>;
fn insert_and_pin(&self, k: K, v: V) -> Guard<K, V>;
fn get(&self, k: &K) -> Option<Entry<K, V>>;
fn get_clone(&self, k: &K) -> Option<V> where V: Clone;
fn get_pin(&self, k: &K) -> Option<Guard<K, V>>;
fn remove(&self, k: &K) -> Option<Entry<K, V>>;
fn remove_clone(&self, k: &K) -> Option<V> where V: Clone;
fn get_pin(&self, k: &K) -> Option<Guard<K, V>>;
fn pin(&self) -> PinnedSkipMap<K, V>;
}
impl<'a, K: Ord, V> PinnedSkipMap<'a, K, V> {
fn insert(&self, k: K, v: V) -> &V;
fn get(&self, k: &K) -> Option<&V>;
fn remove(&self, k: &K) -> Option<&V>;
fn repin(&mut self);
}There's a problem with the current design in that how it manages the life of values in the map. The way it works now is - before reading the value in a node, you must increment the refcount (and incrementing is possible only if the count is non-zero). I should abandon this mechanism and make it possible to read values even if the refcount is zero. That means values would be Note that some refcounting is still necessary during insert and remove - we have to count in how many levels is each node installed into the skip list. Without that, a broken Ok, so if we get rid of pervasive refcounting when reading values and expand the API by accommodating pinning and cloning, would that be it? Any other concerns with the current design?
Yes - that's exactly how I think about the RFC, too. Tomorrow I'll add to the text the fruits of this discussion and try to reword it to make the scope of the RFC clearer.
I'm for grouping by data structures rather than interfaces. Agreed with your reasoning. |
FYI, I'm working on a proof-of-concept implementation of EBR + hazards: https://github.com/jeehoonkang/crossbeam-epoch/tree/hazard-pointer I agree with @Vtec234 on characterizing this RFC: let's regard is as a meta one. But I think we can start |
text/2018-01-14-skiplist.md
Outdated
|
|
||
| 1. Adaptive radix tree (keys can only be byte arrays). | ||
| 2. Skip tree (moves elements in memory, thus constraining the API). | ||
| 2. Bw-Tree (moves elements in memory, thus constraining the API). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask why skip tree and Bw-tree have contrained API? Even elements are moved, I think we can support a similar API to the one presented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this question somehow fell off my radar. Consider this:
struct Key(String, AtomicUsize);
// Also impl PartialOrd, PartialEq, and Eq...
impl Ord for Key {
fn ord(&self, other: &Key) -> Ordering { self.0.cmp(&other.0) }
}
let set = BwTreeSet::<Key>::new();
crossbeam::scope(|s| {
s.spawn(|| {
loop {
for e in set.iter() {
e.1.fetch_add(1, Relaxed);
}
}
});
s.spawn(|| {
loop {
let s = random_string();
// Some elements might be moved into new blocks during insertion.
set.insert(Key(s, AtomicUsize::new(0)));
}
});
});What should happen if one thread is incrementing an AtomicUsize, while another thread is moving it to a new location? That's not going to work, so we'll have to impose some kind of restriction on the API.
For example, we could say that all keys must be Copy, thus forbidding types like Key. Or, another solution might be to implement iterators so that they return clones of elements rather than references to elements - that way threads couldn't simultaneously move and modify an element.
This "refcount" is just the dynamic height of the tower, right? Although it has been some time since I wrote it, I vaguely recall that in my own implementation it was enough to have a value for the tower height which is constant across its lifetime, and enforce the order of unlinking tower levels to go from top to the bottom. Then, we know that if the bottom level is unlinked (marked with tag 1), the whole node has been unlinked and so it can be removed, no refcount needed.
Is my understanding correct that this would eliminate the overhead of refcounting almost completely while still enabling the
I have to admit I don't like the idea of having three different ways of accessing elements in one structure - it seems like a mess that would lead to lots of confusion. So for now..
Agreed. Let's first publish this implementation, providing a drop-in API for |
I'm not sure I understand. That sounds correct... provided that the I'll try to elaborate. There are two numbers associated with each node. The first is height, which is always constant. The second is refcount, which is the number of levels the node is linked in + the number of In order to remove a node, we have to go through two stages: marking pointers in its tower and then unlinking the node in each level. Both stages go from the top level to the bottom level (level 0). Only after the node has been unlinked from each level we can add it to the GC. When unlinking a node at a specific level, we perform a CAS in the tower of the predecessor to change its pointer to point to the successor. If the CAS fails, we have to search again for the node that is being removed so that the search function unlinks the node along the way. While moving through a level, the search function unlinks every encountered node with a marked pointer at the current level. Now, the problem arises from the fact that the The only way to really know whether a node is linked into the skip list at a some level is by having the search function do something after unlinking a node. When the search function unlinks a node, it should either mark the pointer at the current level in the unlinked node, or decrement the node's refcount, or do something else. But in any case, it has to do something - some atomic value in the unlinked node has to be updated. In my implementation, that is what refcount is for. When the refcount reaches zero, that means the node has been unlinked from all levels and there are no
Yes, it would. :) You can think of refcounting as some minimal, but necessary bookkeeping required to ensure memory safety (because we cannot trust
Sure, why not! I've just tweaked the destruction mechanism so that you don't have to increment the refcount before reading values anymore. In other words, now we can easily add |
|
I have a few suggestions and questions on minor details in the implementation:
|
It's just a matter of tradeoffs - I don't have a particularly convincing argument for the choice of ordering, but it seemed to produce the best results overall during testing. The main argument is: if we want to optimize insert/search/delete operations, then
Sure - I'll do that!
Not really... I just felt too nervous doing unchecked arithmetic.
Eh, just didn't bother with that. :) All this funny allocation business is just a workaround until we get
Hmmm, almost? :) I don't understand why the
Done.
Sure. I've changed the comment to: // TODO(stjepang): Embed a custom `epoch::Collector` inside `SkipList<K, V>`. Instead of adding
// garbage to the default global collector, we should add it to a local collector tied to the
// particular skip list instance.
//
// Since global collector might destroy garbage arbitrarily late in the future, some skip list
// methods have `K: 'static` and `V: 'static` bounds. But a local collector embedded in the skip
// list would destroy all remaining garbage when the skip list is dropped, so in that case we'd be
// able to remove those bounds on types `K` and `V`.
//
// As a further future optimization, if `!mem::needs_drop::<K>() && !mem::needs_drop::<V>()`
// (neither key nor the value have destructors), there's no point in creating a new local
// collector, so we should simply use the global one.
Yes.
That would set the height to 0. The proper store operation would be: |
|
Sorry for the delay, everyone. I've just added a section on alternative interfaces and an explanation that the current map/set API is only tentative. Is there anything still missing from the RFC? |
I see now what you mean, thanks. In my implementation this was fine, since I just hashed everything, and |
Vtec234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting, thanks for all the great work!
text/2018-01-14-skiplist.md
Outdated
| // keeping the thread pinned. | ||
| // | ||
| // The main drawback here is that the user must be careful and | ||
| // not keep the guarda live for too long, or else garbage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guarda -> guard
| // not keep the guarda live for too long, or else garbage | ||
| // collection will get stuck. | ||
| fn get_guard(&self, k: &K) -> Option<Guard<K, V>>; | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also list with_element<F: FnOnce<&V>>(k: &K, f: F) under alternative 2? It's effectively the same as get_guard, but makes the guarded region more visible.
|
@Amanieu and @jeehoonkang - do you still have any reservations with this RFC or can we merge? :) |
|
Sorry, I've been busy these days. I'll look into the code once more tomorrow. Sorry! |
|
I've been considering the API from the The only suitable API which fits in those constraints is one where the user managers collectors and handles themselves and passes a I understand that this type of API may not be very user-friendly, so I won't complain if you decide to stick with a simpler API. However I will probably end up using a fork of your skiplist which uses a |
|
@Amanieu I suggest we make Let's hash out the details in the new repo's issue tracker. |
This is a proposal to introduce a new crate named
crossbeam-skiplist, which would provide a concurrent map and a concurrent set similar toBTreeMapandBTreeSet.Rendered
Implementation
cc @withoutboats