
Add support for mut ref dependents #60

Merged
merged 4 commits into from
Sep 15, 2024

Conversation

Voultapher
Owner

@Voultapher Voultapher commented Sep 6, 2024

This is a feature that has been requested multiple times by users. It also closes the last feature gap to ouroboros that I'm aware of, apart from async builders, which ouroboros supports.

Fixes #59 and #36.

This is a feature that has been requested multiple times by users, in #36 and #59.
@Voultapher
Owner Author

@steffahn do you mind giving this a review?

This is less code, less run-time work and conceptually harder to misuse. Win-win.
@steffahn
Contributor

steffahn commented Sep 9, 2024

I wonder what the mut_borrow_lock! macro is needed for; sure, if the MutBorrow::borrow_mut call didn't happen during construction, then it could be called later, which this macro sort-of prevents, but for what exact purpose? Am I missing something? Or is this just an artifact of the incremental steps leading to the current design, and no longer needed?

@Voultapher
Owner Author

What I had in mind was that if the owner is not borrowed during construction, it could be borrowed later and then passed to, for example, mem::take, after which accessing the dependent would be UB. But since MutBorrow is explicitly designed to not provide access to the inner value other than through borrow_mut, the &MutBorrow<T> in the dependent would be useless and couldn't be abused. So indeed we can drop the call to lock altogether and avoid the special casing and trait machinery.

@steffahn
Contributor

steffahn commented Sep 10, 2024

One point of comparison in terms of API, by the way, is RefCell, specifically RefCell combined with RefMut::leak. This also makes it easy to argue for soundness.

(This code example is meant as a soundness demonstration; a manual implementation with Cell<bool> instead of the Cell<isize> inside RefCell still has a small size advantage … well, and it doesn't need nightly.)

// 'RefMut::leak(…)' is unstable, but it could also be 'simulated' without this
// feature, via '&mut **Box::leak(Box::new(…))', at the cost of also leaking some memory
#![feature(cell_leak)]

use std::cell::{RefCell, RefMut};

pub struct MutBorrow<T>(RefCell<T>);

impl<T> MutBorrow<T> {
    /// Constructs a new `MutBorrow`.
    pub fn new(value: T) -> Self {
        Self(RefCell::new(value))
    }

    /// Obtains a mutable reference to the underlying data.
    ///
    /// This function can only sensibly be used in the builder function. Afterwards, it's impossible
    /// to access the inner value, with the exception of [`MutBorrow::into_inner`].
    ///
    /// # Panics
    ///
    /// Will panic if called anywhere but in the dependent constructor. Will also panic if called
    /// more than once.
    pub fn borrow_mut(&self) -> &mut T {
        let Ok(guard) = self.0.try_borrow_mut() else {
            panic!("Tried to access locked MutBorrow")
        };
        RefMut::leak(guard)
    }

    /// Consumes `self` and returns the wrapped value.
    pub fn into_inner(self) -> T {
        self.0.into_inner()
    }
}

(playground)

A downside of this overall approach, e.g. compared to ouroboros, would be that the usage of Cell<bool> implies !Send + !Sync.

@steffahn
Contributor

the usage of Cell<bool> implies !Send + !Sync

Actually, only !Sync, but that can be bad enough. One idea around that, however, would be to use AtomicBool instead; on first thought, I believe one can even use Relaxed ordering throughout.

Here's how usage of AtomicBool can look in the code:
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};

pub struct MutBorrow<T> {
    // Private on purpose.
    is_locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Like the impl for `Mutex`
unsafe impl<T: ?Sized + Send> Sync for MutBorrow<T> {}

impl<T> MutBorrow<T> {
    /// Constructs a new `MutBorrow`.
    pub fn new(value: T) -> Self {
        // Use the Rust type system to model an affine type that can only go from unlocked -> locked
        // but never the other way around.
        Self {
            is_locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Obtains a mutable reference to the underlying data.
    ///
    /// This function can only sensibly be used in the builder function. Afterwards, it's impossible
    /// to access the inner value, with the exception of [`MutBorrow::into_inner`].
    ///
    /// # Panics
    ///
    /// Will panic if called anywhere but in the dependent constructor. Will also panic if called
    /// more than once.
    pub fn borrow_mut(&self) -> &mut T {
        // Ensure this function can only be called once.
        // Relaxed should be fine, because only one thread could ever read `false` anyway,
        // so further synchronization is pointless.
        let was_locked = self.is_locked.swap(true, Relaxed);
        if was_locked {
            panic!("Tried to access locked MutBorrow")
        } else {
            // SAFETY: `self.is_locked` starts out as locked and can never be unlocked again, which
            // guarantees that this function can only be called once. And the `self.value` being
            // private ensures that there are no other references to it.
            unsafe { &mut *self.value.get() }
        }
    }

    /// Consumes `self` and returns the wrapped value.
    pub fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

@Voultapher
Owner Author

The comparison with RefCell is helpful, thanks.

Actually, only !Sync, but that can be bad enough. One idea around that however would be to use AtomicBool instead;

Good point. I think doing so would be a good fit for self_cell, which prefers giving users one good, easy-to-understand option instead of many knobs to tweak. The most expensive parts will likely remain the heap allocation during construction and the subsequent deref. So I think going from Cell<bool> to AtomicBool is a good trade-off for ease of use. While it would be possible to split this into two types, I don't see the need here.

I believe one can even use Relaxed ordering throughout

I don't think that works. One could share a &MutBorrow<T> across threads and then race borrow_mut(), which could lead to multiple unique references, which is UB. I think we need acquire + release semantics.

@steffahn
Contributor

steffahn commented Sep 10, 2024

No, I do not believe Relaxed ordering ever allows multiple threads to receive false from the swap(true) operation…

wait, why did you not use swap like in the code I shared above!? A load followed by a conditional store is actually completely unsound, no matter the Ordering choices.

Let me repeat the relevant section from my previous reply.

    pub fn borrow_mut(&self) -> &mut T {
        // Ensure this function can only be called once.
        // Relaxed should be fine, because only one thread could ever read `false` anyway,
        // so further synchronization is pointless.
        let was_locked = self.is_locked.swap(true, Relaxed);
        if was_locked {
            panic!("Tried to access locked MutBorrow")
        } else {
            // SAFETY: `self.is_locked` starts out as locked and can never be unlocked again, which
            // guarantees that this function can only be called once. And the `self.value` being
            // private ensures that there are no other references to it.
            unsafe { &mut *self.value.get() }
        }
    }

@Voultapher
Owner Author

My bad, I had missed that part. You are right, a disjoint load and store is nonsense here. Looking at the generated machine code, Relaxed vs. AcqRel is the same on x86 and slightly different on aarch64:

x86:

mov     al, 1
xchg    byte ptr [rdi], al
test    al, al
setne   al
ret

aarch64:

str     x30, [sp, #-16]!
mov     x1, x0
mov     w0, #1
bl      __aarch64_swp1_relax / __aarch64_swp1_acq_rel
cmp     w0, #0
cset    w0, ne
ldr     x30, [sp], #16
ret

My mental model for Relaxed semantics was that the compiler is allowed to re-order instructions as it sees fit, and that the CPU core can write the value to its private caches and may at some point synchronize it to other cores. A relaxed load may see a stale value, even one written with Release semantics. I conceptualize swap as similar to a fetch_add, with a store part and a load part. If the store happens with Relaxed semantics and another thread does a load with Relaxed semantics, wouldn't that allow a stale value to be read?

@Voultapher
Owner Author

With a more specific aarch64-apple-darwin target we get a more concrete difference between Relaxed and AcqRel:

mov     w8, #1
swpb / swpalb   w8, w8, [x0]
cmp     w8, #0
cset    w0, ne
ret

Based on this documentation https://www.scs.stanford.edu/~zyedidia/arm64/swpb.html

SWPB has neither acquire nor release semantics.

@steffahn
Contributor

steffahn commented Sep 11, 2024

My mental model for memory orderings of atomic operations is that they are only about how the operations are ordered relative to other memory accesses. E.g. if you have, mutex-style, one thread "unlocking" something (in an UnsafeCell<T>, say) and another thread "locking" it, e.g. via an AtomicBool, then Release + Acquire can make sure that all accesses to the UnsafeCell<T> the one thread makes before unlocking are properly ordered before all accesses the other thread could make to the same UnsafeCell<T> after locking.

However, in this case we need no synchronization, because only one thread is ever going to touch the contained value anyway. Well, at some point the value was constructed, and at the end it is destructed, but those accesses should already be accounted for by whatever mechanism allowed a &MutBorrow<T> reference to get to a different thread in the first place.

As a different point of comparison: I believe that if you do a lot of fetch_add(1) operations (and nothing else) on an atomic integer, in order to generate some form of unique IDs, with relaxed ordering, then you are still guaranteed that no two invocations receive the same integer. (As long as the overall number of invocations doesn't allow for any overflow, that is.) We're doing essentially the same here, but with a saturating_fetch_add(1) on a 1-bit integer, so receiving a 0 can still only happen for one thread; lots of 1s are passed out afterwards, but the uniqueness of the 0 (i.e. the false) is all we need for soundness, no further synchronization required.

The assembly is interesting; I had only checked x86 so far, good to know the difference can actually be meaningful on other platforms. Another option here, instead of swap(true), would be fetch_or(true), by the way. I wonder if those compile to the same asm on other platforms, too.


All that being said, realistically, this all seems completely irrelevant from a performance standpoint, so feel free to use any memory ordering you like.

@Voultapher
Owner Author

Voultapher commented Sep 11, 2024

Regarding fetch_or(true, Ordering): that compiles to what is IMO worse code: https://rust.godbolt.org/z/acxTj7EGE. On x86 it generates a loop, if I understand the code correctly.

@steffahn
Contributor

Oh, wow, a loop seems bad, why hadn’t I come across that yet?

Ah, now I see what I did differently(!!): I had only compared swap(false) with fetch_and(false), not swap(true) with fetch_or(true). Apparently, compilation translates all of this into atomic operations on u8, and on u8 there really is a difference between bitwise-or with 1 and a set/swap to 1, because the former only sets the lowest bit; but there is no difference between bitwise-and with 0 and a set/swap to 0.

Given that it looks like the compiler chooses to use the swap instruction to "improve" the fetch_and(false), and, as you noted, the other operation can result in a loop anyway, swap definitely seems preferable over fetch_or(true).

@Voultapher Voultapher force-pushed the add-support-for-mut-ref-dependents branch from 9552c6d to 9517eeb on September 11, 2024 17:03
@Voultapher
Owner Author

Doing some further research, I'm convinced you are right: only one thread could ever read false, even with relaxed ordering semantics.

I think it should be good to be merged now.

@Voultapher Voultapher merged commit 10f545e into main Sep 15, 2024
5 checks passed
@Voultapher Voultapher deleted the add-support-for-mut-ref-dependents branch December 5, 2024 16:19

Successfully merging this pull request may close these issues.

Mutable reference on owner in builder