Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PyClassBorrowChecker thread safe #4544

Merged
merged 22 commits into from
Oct 5, 2024

Conversation

ngoldbaum
Copy link
Contributor

Ref #4265 (comment) and replies from @alex.

I tried doing this with an AtomicUsize but because try_borrow accepts an immutable reference, I couldn't figure out a way to get that to work without keeping the Cell. A mutex seemed like a more natural choice for the existing code structure.

I think in principle we could use a mutex on the GIL-enabled build as well, since the lock is only held very briefly in rust to update the borrow checker state.

Is the new test that only triggers on the free-threaded build OK? I could also write it to test that there isn't an exception on the GIL-enabled build.

@alex
Copy link
Contributor

alex commented Sep 10, 2024

Hmm, not sure I follow why taking an & reference should be a problem -- all of the relevant AtomicUsize methods are also &self?

@ngoldbaum
Copy link
Contributor Author

Hmm, not sure I follow why taking an & reference should be a problem

I realized today that if I make BorrowFlag mutable, then I also need to make increment and decrement take &self instead of self. I was confused by the error the borrow checker was giving me and thought it indicated a fundamental problem. Thanks for pointing out my reasoning was flawed.

src/pycell/impl_.rs Outdated Show resolved Hide resolved
Copy link

codspeed-hq bot commented Sep 11, 2024

CodSpeed Performance Report

Merging #4544 will not alter performance

Comparing ngoldbaum:pyclass-borrow-checker (5550527) with main (cb88478)

Summary

✅ 83 untouched benchmarks

@ngoldbaum
Copy link
Contributor Author

Ping @colesbury - I'd appreciate a code review from you if you have some spare cycles.

src/pycell/impl_.rs Outdated Show resolved Hide resolved
@mejrs
Copy link
Member

mejrs commented Sep 17, 2024

I feel quite strongly that we shouldn't merge this PR. The semantics in #4265 (comment) make much more sense to me.

@davidhewitt
Copy link
Member

davidhewitt commented Sep 18, 2024

@mejrs I'm very much in agreement with you that keeping these "refcell-like" semantics on the freethreaded build is likely to be completely unusable in real-world conditions.

Would you be prepared to accept this PR merging in 0.23 with the understanding that we will follow up in 0.24 with something akin to #4265 (comment), i.e. a more realistic design.

I think this allows us to make PyO3 sound for downstream testing of free-threading without needing to introduce breaking changes (of which there are already enough collected for this release).

@alex
Copy link
Contributor

alex commented Sep 18, 2024

I disagree strongly that these are unusable in real-world situations. I did a review of pyca/cryptography and these semantics would be appropriate for all of our usage of non-frozen pyclasses.

The reason is that while these types require mutable access, there's no particularly coherent behavior for concurrent multi-threaded usage. These types are, for example, iterators and hashers. There's no real world usage for concurrent mutable use.

@mejrs
Copy link
Member

mejrs commented Sep 18, 2024

Would you be prepared to accept this PR merging in 0.23 with the understanding that we will follow up in 0.24 with something akin to #4265 (comment), i.e. a more realistic design.

I'm OK with that.

I disagree strongly that these are unusable in real-world situations.

They may work in your situation, but not in general. Imagine a List-like pyclass with a fn push(&mut self, value) method. Most of the time that is going to work in a multi threaded context, but sometimes it will trip a borrow error and crash the program. It sounds like a footgun that everyone is going to run into at some point.

@ngoldbaum ngoldbaum mentioned this pull request Sep 20, 2024
3 tasks
@davidhewitt
Copy link
Member

I did a review of pyca/cryptography and these semantics would be appropriate for all of our usage of non-frozen pyclasses.

Thanks, that's interesting to see. I guess that makes a lot of sense if most of your types are either frozen or are "algorithm" helpers, then these semantics are workable. I'm still quite worried that the runtime crashes will trip up users, but I guess maybe it's better than silently blocking and degrading throughput. Still, I think that a lot of PyO3 users might not be so careful in designing their types!

👍 thanks all for the agreement, it seems like we at least have a plan for 0.23 and a rough roadmap of where we're going after that.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finally read the code & test - very concise change in the end. I have just a few thoughts, and then once we've discussed those points let's move ahead with a merge. Thanks!

pytests/src/pyclasses.rs Outdated Show resolved Hide resolved
pytests/src/pyclasses.rs Outdated Show resolved Hide resolved
src/pycell/impl_.rs Outdated Show resolved Hide resolved
src/pycell/impl_.rs Outdated Show resolved Hide resolved
src/pycell/impl_.rs Outdated Show resolved Hide resolved
src/pycell/impl_.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've reviewed the replacement without SeqCst and have a different opinion on what the correct implementation is. 🙈

src/pycell/impl_.rs Outdated Show resolved Hide resolved
src/pycell/impl_.rs Outdated Show resolved Hide resolved
@ngoldbaum
Copy link
Contributor Author

Agreed on the reviews, thanks! I missed the SeqCst use in the mutable borrows, I was laser focused on the increment implementation.

@davidhewitt
Copy link
Member

davidhewitt commented Sep 26, 2024

I think still possible changes warranted for the increment case, see #4544 (comment). That proposal will probably also fix the MSRV build.

Also, now that we're venturing more out of my comfort zone with the less strict memory orderings, it would be great to come up with some tests which give us confidence the implementation is correct.

I came up with this one, which fails on main and passes on this branch. Annoyingly it still passes (on my Mac) even if all the atomic orderings are Relaxed, which makes me want to come up with some more tests...

#[test]
fn test_thread_safety() {
    #[pyclass]
    struct MyClass {
        x: u64,
    }

    Python::with_gil(|py| {
        let inst = Py::new(py, MyClass { x: 0 }).unwrap();

        let total_modifications = py.allow_threads(|| {
            std::thread::scope(|s| {
                // Spawn a bunch of threads all racing to write to
                // the same instance of `MyClass`.
                let threads = (0..10)
                    .map(|_| {
                        s.spawn(|| {
                            Python::with_gil(|py| {
                                // Each thread records its own view of how many writes it made
                                let mut local_modifications = 0;
                                for _ in 0..100 {
                                    if let Ok(mut i) = inst.try_borrow_mut(py) {
                                        i.x += 1;
                                        local_modifications += 1;
                                    }
                                }
                                local_modifications
                            })
                        })
                    })
                    .collect::<Vec<_>>();

                // Sum up the total number of writes made by all threads
                threads.into_iter().map(|t| t.join().unwrap()).sum::<u64>()
            })
        });

        // If the implementation is free of data races, the total number of writes
        // should match the final value of `x`.
        assert_eq!(total_modifications, inst.borrow(py).x);
    });
}

// On success the read is synchronized to ensure other
// threads don't get a reference before this thread checks
// that it can get one
Ordering::Acquire,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be AcqRel?

The documentation for compare_exchange states:

Using Acquire as success ordering makes the store part of this operation Relaxed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally convinced it's necessary to establish a happens-before relationship with other threads when they observe the successfully stored flag (as they won't be allowed to read the accompanying data), but I think this also cannot hurt.

@davidhewitt
Copy link
Member

Ok, here's yet another test, which fails if the atomic orderings are all relaxed:

#[test]
fn test_thread_safety_2() {
    struct SyncUnsafeCell<T>(UnsafeCell<T>);
    unsafe impl<T> Sync for SyncUnsafeCell<T> {}

    impl<T> SyncUnsafeCell<T> {
        fn get(&self) -> *mut T {
            self.0.get()
        }
    }

    let data = SyncUnsafeCell(UnsafeCell::new(0));
    let data2 = SyncUnsafeCell(UnsafeCell::new(0));
    let borrow_checker = BorrowChecker(BorrowFlag(AtomicUsize::new(BorrowFlag::UNUSED)));

    std::thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..1_000_000 {
                if borrow_checker.try_borrow_mut().is_ok() {
                    // thread 1 writes to both values during the mutable borrow
                    unsafe { *data.get() += 1 };
                    unsafe { *data2.get() += 1 };
                    borrow_checker.release_borrow_mut();
                }
            }
        });

        s.spawn(|| {
            for _ in 0..1_000_000 {
                if borrow_checker.try_borrow().is_ok() {
                    // if the borrow checker is working correctly, it should be impossible
                    // for thread 2 to observe a difference in the two values
                    assert_eq!(unsafe { *data.get() }, unsafe { *data2.get() });
                    borrow_checker.release_borrow();
                }
            }
        });
    });
}

@ngoldbaum
Copy link
Contributor Author

Added that test, I can also reproduce the failure if I make the orderings relaxed.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work here, I'm now reasonably convinced that what we have here is correct. We might yet dream up new tests later, though I don't think we need to wait for those to merge.

@davidhewitt davidhewitt added this pull request to the merge queue Sep 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 27, 2024
@davidhewitt
Copy link
Member

Ah, we probably need to gate the new tests with #[cfg(not(target_arch = "wasm32"))]. We don't have threading support on that target at the moment in tests.

@davidhewitt davidhewitt added this pull request to the merge queue Oct 5, 2024
Merged via the queue into PyO3:main with commit 8288fb9 Oct 5, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants