Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Linked list cursors #2570

Merged
merged 3 commits into from
Feb 17, 2019
Merged

RFC: Linked list cursors #2570

merged 3 commits into from
Feb 17, 2019

Conversation

4e554c4c
Copy link
Contributor

@4e554c4c 4e554c4c commented Oct 21, 2018

🖼️ Rendered

Tracking issue

This is my first RFC, so feel free to critique :)

📝 Summary

Many of the benefits of linked lists rely on the fact that most operations (insert, remove, split, splice etc.) can be performed in constant time once one reaches the desired element. To take advantage of this, a Cursor interface can be created to efficiently edit linked lists. Furthermore, unstable extensions like the IterMut changes will be removed.

The reference implementation is here, feel free to make implementation suggestions. :)

❌ TODO

  • Resolve where split, split_before split lists (this seems to be ambiguous in the specification)

@Diggsey
Copy link
Contributor

Diggsey commented Oct 21, 2018

This RFC seems quite vague on the specifics. For example:

  • Do cursors have a lifetime bound to the list - this is necessary for iterators, but isn't mentioned in the RFC, and no lifetimes are present in the struct definitions.
  • Do you expect to be able to have multiple cursors into the same data structure? This is something prevented by the use of mutable cursors and lifetimes, but also one of the ways in which cursors could be most useful.

This seems like the kind of thing that might be better implemented as a crate first - given the amount of new API surface being added, I don't see how it could ever be accepted without at least a sample implementation.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 21, 2018
@4e554c4c
Copy link
Contributor Author

@Diggsey good points
Yeah, these would have to have lifetimes bound to the list. It should be possible to have multiple immutable cursors, but having multiple mutable cursors seems much more difficult (this is why I have it so that a mutable cursor can be used to create an immutable one).

I'll work on a reference implementation so this can be accepted

@4e554c4c
Copy link
Contributor Author

Currently working on the implementation. All is done except for splitting lists. See here: https://github.com/4e554c4c/list_cursors

@mark-i-m
Copy link
Member

Perhaps this could be an eRFC to allow greater experimentation? I would really like to play around with your implementation on nightly before deciding if we want to keep it as is, modify it, or go a different route.

@sfackler
Copy link
Member

@mark-i-m the API would land unstable regardless of the e-ness of the RFC.

@mark-i-m
Copy link
Member

@sfackler Yes, but the e-ness allows us to proceed without as clear of a vision on what API we will eventually adopt. In contrast, IIUC, if we accept a normal RFC for this, we are saying that we are reasonably confident in this approach. I think the approach has merit and is worth trying out, but I can't personally support strongly accepting the RFC because I simply don't have any experience with such an API.

@4e554c4c
Copy link
Contributor Author

4e554c4c commented Oct 22, 2018 via email

@Centril
Copy link
Contributor

Centril commented Oct 22, 2018

@4e554c4c put an "e" before "RFC" ^.^ There's not much difference other than a change in intent.

@4e554c4c 4e554c4c changed the title RFC: Linked list cursors eRFC: Linked list cursors Oct 22, 2018
@4e554c4c
Copy link
Contributor Author

I have edited the title to show that this is an eRFC. The API is not finalized and possibly incomplete and I am taking suggestions.

@Amanieu
Copy link
Member

Amanieu commented Oct 24, 2018

The cursor API that you propose seems to be almost identical to the one that I implemented a while ago for intrusive-collections: Cursor, CursotMut and general docs. I have used this API in some of my personal projects and it has felt very ergonomic. I am not sure why this RFC has chosen to rename some of the methods.

One thing to note in particular as to how cursors work in my crate:

A cursor views an intrusive collection as a circular list, with a special null object between the last and first elements of the collection. A cursor will either point to a valid object in the collection or to this special null object.

Another difference is that I have front/front_mut/back/back_mut methods in LinkedList which return a cursor to the first/last element of the list (or the null element if the list is empty), while cursor and cursor_mut always return a cursor pointing to the null element.

Regarding IterMut vs CursorMut, I will repeat what I said in the other thread: these two types work in fundamentally different ways (a DoubleEndedIterator acts as a deque which can be popped from both ends) and serve completely different purposes (an iterator is for ... iterating, a cursor is for modifying the list).

@4e554c4c
Copy link
Contributor Author

@Amanieu thank you for commenting!
Renaming methods to make things more consistent ("before"->"prev") would probably be a good idea. Also I like your implementation's remove, this may benefit from going that route instead of pop (although doing both works too).
front/front_mut/etc. I decided against because these methods already exist in LinkedList to refer to elements. I haven't decided against adding them, I'm just not sure what the best way to go about it would be.

@xaberus
Copy link

xaberus commented Oct 25, 2018

If I am not mistaken, the proposed API is not safe, i.e. current() can be used to create dangling references.

let mut c = list.cursor_mut();
c.move_next();
// here we create a mutable  reference to an element
let u = c.current();
c.move_prev();
// drop the element
drop(c.pop());
// here we drop this element
let mut dangle = u.unwrap();
// dereferencing a dangling reference to the now no longer existing element
**dangle = &777;
// use after free confirmed  by valgrind...

(I am still new to rust, so take the following with a grain of salt...)

The reason is that in the code above current() creates a reference that has a lifetime of the cursor ('a) but it should have a much narrower scope/lifetime. The point is, returning a reference to an element must either somehow freeze the cursor or taking a raw reference must be disallowed (like RefCell/RefMut).

Now we have basically created two mutable references to the same data: u and c.current() which is undefined behavior.

I think one way to prevent this is to separate the operation of editing the list and editing its elements, i.e. another Cursor* type... but this makes the interface quite useless, as you than cannot inspect the elements that you are editing.

If I remember correctly this is the reason, we have no similar interface in the std...

N.B. This is a nice reference highlighting some interesting problems.

@4e554c4c
Copy link
Contributor Author

Interesting! I thought that the reference would borrow the cursor (like how IndexMut borrows slices).
I believe this is fixable, we would just have to make it so that the output of current borrows the cursor.

@xaberus
Copy link

xaberus commented Oct 25, 2018

@4e554c4c having given it a little more thought I think the prototype for current() should be

pub fn current(&mut self) -> Option<&mut T> {...}
// that is elided to
pub fn current<'b>(&'b mut self) -> Option<&'b mut T> {...}

not

pub fn current(&mut self) -> Option<&'a mut T> {...}

The first gives the reference a lifetime that is smaller than 'a (like a slice gives to a an element reference, 'a: 'b), while the second gives it the lifetime of the list, which is wrong, as the elements should not outlive the list. Hopefully I made no mistake and this makes sense. (Would be nice if somebody with more experience in lifetimes than me could comment on this...)

@4e554c4c
Copy link
Contributor Author

4e554c4c commented Oct 25, 2018

yep! That seems to fix the problem. The lifetime annotation I added seems to have made this worse. I'm going to look at the lifetimes more and update the RFC
Now, the question is: is this behavior valid for cursor?
I think not since this can occur

fn main() {
    let mut list = LinkedList::new();
    let mut c = list.cursor_mut();
    c.insert(3);
    c.insert(2);
    c.insert(1);
    
    let u = {
        let mut c = c.as_cursor();
        c.move_next();
        c.current();
    };
    drop(c.pop());
    // use after free!
    println!("element: {:?}", u);
}

@4e554c4c
Copy link
Contributor Author

Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think.

@xaberus
Copy link

xaberus commented Oct 25, 2018

@4e554c4c: I think the example is actually OK. (I think you meant current() not current();, otherwise u would be ())

with the modified signatures

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current(&mut self) -> Option<&mut T> {...}
    ...
}

the borrow checker (on nightly) complains about c not living long enough in the let u = expression (and it makes sense to me: Cursor "contains" the element T, by calling current() we get a reference &'b T, a': b' [i.e. as a type 'a is 'b and more, 'b cannot outlive 'a]. When we drop the Cursor no references with lifetime 'b can remain).

@xaberus
Copy link

xaberus commented Oct 25, 2018

Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think.

If my reading of the nomicon is correct (get_mut example), the explicit lifetime is not necessary in

pub fn as_cursor<'cm>(&'cm self) -> Cursor<'cm, T>;

By elision rules this should be the same as

pub fn as_cursor(&self) -> Cursor<T>;

(clippy nags about it...)

@Amanieu
Copy link
Member

Amanieu commented Oct 25, 2018

To be precise, the methods on Cursor can return &'list T, but the methods on CursorMut must return &'self mut T. This means that if you want to inspect the previous/next value before any modification then you need to call as_cursor first.

@Amanieu
Copy link
Member

Amanieu commented Oct 25, 2018

Actually, this would probably work:

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&'a T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current_mut(&mut self) -> Option<&mut T> {...}
    pub fn current(&self) -> Option<&T> {...}
    // Same with peek/peek_mut, etc
    ...
}

The advantage of having Cursor return &'a T instead of &'self T is that the reference remain valid after moving the cursor. This is fine since Cursor can't delete or modify any elements in the list.

@4e554c4c
Copy link
Contributor Author

4e554c4c commented Oct 25, 2018

Good point, I'll leave that one how it was.
Should I introduce methods like current/current_mut for CursorMut? I feel like this is somewhat unneeded since one could always call as_cursor to get immutable access to the element and it would have the same borrow semantics.

@4e554c4c
Copy link
Contributor Author

The reference implementation should be pretty much complete thanks to @xaberus
I've also started a list of what currently needs to get resolved in the RFC.

@xaberus
Copy link

xaberus commented Oct 30, 2018

Having tinkered with the reference implementation for a while, here are my impressions and comments so far.

  • I much more prefer the API where the cursors start with the first element, because I constantly forget the initial move_next()/move_prev(). As proposed by @Amanieu, I think head()/head_mut(), tail()/tail_mut() are a better choice.
  • While the wrapping of cursors gives an elegant implementation, the API can unfortunately give unexpected and/or ambiguous results. I started a non-wrapping branch of the reference implementation to test my assumptions. When/if I am happy with the results I will try to write down a pull request for the RFC as time permits.
  • So far, after rewriting parts of reference implementation a couple of times (see the branch above) I ended up with the following simple invariants that I would like to see in the API:
    • inserting an item after the cursor position implies c.peek_after() == Some(e)
    • inserting an item before the cursor position implies c.peek_before() == Some(e)
    • inserting a list before the cursor position implies c.peek_before() == Some(inserted_tail)
    • inserting a list after the cursor position implies c.peek_after() == Some(inserted_head)
    • iterating past the head/tail with move_prev()/move_next() keeps the Cursor one position "before"/"after" the last element, allowing to reverse the iteration direction at any time. Effectively, there are now two empty elements. (This turned out a lot more complicated to implement than I imagined, but is so much harder to abuse.)

I hope to continue this list when I have time to work on it. What are your comments so far?

@4e554c4c
Copy link
Contributor Author

I think this can work. The main reason that Cursor is wrapping is for efficient movement between the front and back of the list, but head()/tail() could do this just as well.

If the cursor doesn't start at the empty element, could we just do away with the empty element altogether? We could have the invariant that the Cursor is always on an element, and remove the Option from the current method. Then head()/tail() could return an Option<Cursor>

The only problem I see with this is that we would need to add some extra method to determine whether the cursor is at the beginning/end of a list, e.g. has_next() or similar

@blankname
Copy link

@xaberus
Copy link

xaberus commented Nov 4, 2018

@Amanieu:

First, // Same as AfterLast if list is empty
Last, // Same as BeforeFirst if list is empty

This is what I wrote in one of my first attempts, but it is kind of surprising in the case of an empty list:

// cursor_mut(Last)
[] <>
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <> 

v.s.

// cursor_mut(Last)
<> []
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <>

If I request a cursor at the Last position and insert and pop an element before it I expect the cursor to be still at the same position. This does not break the invariant, i.e. insert_before(1) -> pop_before() == Some(1), but the cursor value changes from BeforeFirst to AfterLast, i.e. a roundtrip is not a no-op.

@Amanieu
Copy link
Member

Amanieu commented Nov 4, 2018

@xaberus Note that a wrapping cursor (where BeforeFirst is the same as AfterLast) avoids this problem. This is why I went with that approach in intrusive-collections.

@xaberus
Copy link

xaberus commented Nov 4, 2018

@Amanieu I see. In way, this not a real problem: The invariants (sans the last one) enforce a wrapping behavior for inserts which effectively collapses the cursor states for the empty list to None. I found this issue in a test case only because I was inspecting the cursor state directly and not though the API. On the outside there is no visible difference.
I guess I may have found a complicated way to express the same behavior 😄, but for some reason wrapping cursors instill a feeling of unease in me. For example, if I want to find the nth item in the list in a wrapping implementation, I have to inspect each cursor position for None. If I forget about that I will wrap around and end up somewhere in the middle of the list.

// non-wrapping
let c = list.cursor(BeforeFirst);
for _ in 0..n {
    c.move_next();
}
c.current()

v.s.

// wrapping
let c = list.cursor(BeforeFirst);
c.move_next();
for _ in 0..n {
    if c.current().is_none() {
        return None;
    } else {
        c.move_next();
    }
}
c.current()

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Nov 6, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Nov 6, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 7, 2018

The insert_list and insert_list_before methods have O(1) time complexity right ?

@Amanieu
Copy link
Member

Amanieu commented Nov 7, 2018

I would prefer if some changes were made to the method names. In particular, I think that every method name should explicitly specify the direction in which the operation is performed (forwards or backwards).

  • peek/peek_before => peek_next/peek_prev
  • insert/insert_before => insert_after/insert_before
  • pop/pop_before => remove
    • pop is a bad name since it only applies at the head or tail or a list. Also the most common operation that you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal.
  • insert_list/insert_list_before => splice_before/splice_after
    • Splicing is a standard term in linked list operations.
  • split/split_before => split_after/split_before

@nugend
Copy link

nugend commented Nov 7, 2018

I'm just driving by, but @Amanieu has got it right. This is not a clunky interface at all as long as the meaning of those API methods are really clear and unambiguous about which elements are being affected.

This RFC actually strongly reminds me of the more general Zipper concept that Haskell has explored at length (and linked lists are just degenerate trees). I suspect that's not really pertinent to the matter at hand, but I think it's interesting.

@glaebhoerl
Copy link
Contributor

@Amanieu Aiming for symmetry with explicitly specified directions sounds good; however, doesn't this bit present a smidgen of discord in the plan?

Also the most common operation that you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal.

Why to the next element rather than the previous? Why aren't there then two versions of it, with explicitly specified directions, like the others? Of course as a matter of pragmatics, you more often want to move forwards rather than backwards, but if we're choosing not to privilege the forwards direction in the rest of the API...

@xaberus
Copy link

xaberus commented Nov 7, 2018

@glaebhoerl: I second that. Removing an item usually can be decided by a predicate f(&T)->bool. For this

let mut c = list.cursor_mut(BeforeHead);
while c.peek_next().is_some() {
    if let Some(true) = c.peek_next().map(f) {
        c.pop_next();
    } else {
        c.move_next();
    }
}

sounds like a reasonable interface that is symmetric by replacing next by prev. Moreover, this makes the action of removing an item explicit, no need to look up in the reference that the cursor will move in some arbitrary direction.

@eaglgenes101
Copy link

eaglgenes101 commented Nov 9, 2018

I think that it might be possible to allow for multiple mutable cursors, safely, with a method that splits out two bounded cursors. These bounded cursors will then recognize the position at which the cursor was split, and if the linked list element pointed to is that node, following that pointer through the bounded cursor API instead goes to that bounded cursor's sentinel, and looping back around from the far end will go to the node adjacent to the split-at node on the bounded cursor's side instead of to the other end.

Splitting at the sentinel doesn't make sense, so if a split is attempted at the sentinel, then Option::None is returned instead, hence the Option in the return value.

pub fn split_at_mut<'cm>(&'cm mut self) -> Option<(BoundedCursorMut<'cm, T>, &'cm mut T, BoundedCursorMut<'cm, T>)>;

(The node in the middle where the list is split is a no man's land for both bounded cursors, so that if the resulting bounded cursors are used to manipulate the nodes bordering the split, they don't have to change the pointers on the other side's nodes to retain linked list consistency, instead only having to change the middle node's pointer pointing towards their side.)

@rfcbot
Copy link
Collaborator

rfcbot commented Nov 16, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Nov 16, 2018
@Centril Centril added A-types-libstd Proposals & ideas introducing new types to the standard library. A-collections Proposals about collection APIs labels Nov 22, 2018
@clarfonthey
Copy link
Contributor

Anyone going to make a tracking issue?

@Centril
Copy link
Contributor

Centril commented Dec 3, 2018

@clarcharr I usually made them but I missed this one for some reason...

@sfackler given the recent discussion, can someone from T-libs create the tracking issue and merge the RFC to make sure that y'all have taken in and are OK with the recent discussion?

@nitnelave
Copy link

nitnelave commented Dec 9, 2018

Just my grain of salt, but I have used a similar API (https://contain-rs.github.io/linked-list/linked_list/struct.Cursor.html) for a circular linked_list, and the ghost element made it hard to move around since I wasn't interested in where was the beginning/end of the list. The solution I had was to implement seek_forward/seek_backward methods that ensured that the next element was never the ghost, but it was clunky. Would it be somehow possible to have a way to say "I don't want a ghost"? i.e. the ghost would only be here for the empty list, but as soon as you insert an element, it becomes a loop to itself.

We could potentially have it as a generics parameter? (not sure about it)

@mark-i-m
Copy link
Member

Ping?

@Amanieu
Copy link
Member

Amanieu commented Feb 14, 2019

I am currently on holiday, I will merge this RFC when I get back next week.

@Centril
Copy link
Contributor

Centril commented Feb 17, 2019

Huzzah! This RFC is hereby merged!

Tracking issue: rust-lang/rust#58533

@Centril Centril merged commit 2735924 into rust-lang:master Feb 17, 2019
@nickeb96
Copy link

nickeb96 commented Feb 8, 2023

The remove_replace example function is wrong and should be fixed. Its explanation says the following:

For example, consider you had a linked list and wanted to remove all elements which satisfy a certain predicate, and replace them with another element.

Instead, the code in the example will visit the same element infinitely until the predicate function returns false. Also, it never checks the first element.

Here is a rework:

fn remove_replace<T, P, F>(list: &mut LinkedList<T>, p: P, f: F)
    where P: Fn(&T) -> bool, F: Fn(T) -> T
{
    let mut cursor = list.cursor_front_mut();
    loop {
        let should_replace = match cursor.current() {
            Some(element) => p(element),
            None => break,
        };
        if should_replace {
            let old_element = cursor.remove_current().unwrap();
            cursor.insert_before(f(old_element));
        } else {
            cursor.move_next();
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs A-types-libstd Proposals & ideas introducing new types to the standard library. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.