Race condition when working with a disconnected channel #838

Closed
alygin opened this issue May 22, 2022 · 9 comments
@alygin
Contributor

alygin commented May 22, 2022

At the moment, the following piece of code contains a race condition if the spawned thread drops the sender before the receiver blocks on recv() (the thread sanitizer catches it):

use crossbeam_channel::bounded;
use std::thread;

fn race() {
    static mut V: u32 = 0;
    let (s, r) = bounded::<i32>(10);
    let t = thread::spawn(move || {
        unsafe { V = 1 }; // (A)
        drop(s);
    });
    let _ = r.recv().unwrap_err();
    unsafe { V = 2 } // (B)
    t.join().unwrap();
}

The race is only present when using the Array or List flavor; Zero works fine here. It looks like the reason is that relaxed loads like this:

atomic::fence(Ordering::SeqCst);
let tail = self.tail.load(Ordering::Relaxed);

provide no acquire semantics, so the main thread might not see the effect of (A).

The same problem occurs when sending to a disconnected channel.

In earlier versions this was a SeqCst load, but it was later optimized and lost its synchronization property, so (A) is not guaranteed to happen-before (B).

To me it looks like a bug, because channels are usually treated as synchronization objects: std::sync::mpsc is presented this way in the Rust docs (though mpsc::channel has the same flaw), and Go guarantees such synchronization (here's the same test for Go channels; it passes thread sanitizer checks).

But maybe this was a conscious choice to sacrifice synchronization for performance in such cases?

@alygin alygin changed the title Race condition when working with disconnected channels Race condition when working with a disconnected channel May 22, 2022
@alygin
Contributor Author

alygin commented May 22, 2022

I've ported some tests from Go; both Miri and TSan detect the races.

@kprotty

kprotty commented Jun 4, 2022

In earlier versions this was a SeqCst load, but then it was optimized and lost its synchronization property, hence (A) is not guaranteed to happen-before (B).

As long as there's a Release operation on tail in the drop after (A) (I assume here from disconnect), it should be guaranteed to happen-before (B) due to Atomic-fence synchronization.

This seems more like an issue with TSAN not detecting fences properly; something which even the Rust stdlib specializes for.

@alygin
Contributor Author

alygin commented Jun 4, 2022

@kprotty, thanks!

@alygin
Contributor Author

alygin commented Jun 4, 2022

BTW, it's not only TSan. Miri also detects a data race here.

@alygin
Contributor Author

alygin commented Jun 5, 2022

@kprotty, following your comment in the related miri issue:

After some thought, the crossbeam code is fine to race as drop() isn't necessarily a synchronization point (which the test case was trying to use as such).

I agree that the drop() is not expected to be a sync point per se. But the test is not about dropping, it's about disconnecting, though technically they are the same in this case.

My initial concern was that on disconnection, the receiving side wakes up but isn't guaranteed to see the effects of memory changes made by the sending side before disconnecting. This may be unexpected behavior. For instance, channels in Go provide such synchronization on purpose.

The question is: should crossbeam channels guarantee synchronization on disconnection or not? For the Array flavor it can be achieved easily, without performance degradation (just by swapping the fence and the load, or even by moving the fence into the return true branch). I'm not sure about the List flavor; I haven't looked into it yet. Zero is already fine.

@taiki-e, @kprotty, what do you think?

@kprotty

kprotty commented Jun 5, 2022

IMO, a channel should only ensure synchronization for the data sent and retrieved through the channel's API. Relying on the channel to create happens-before edges with data external to it (disconnect included, as it's a state observation not a data transfer) feels like unspecified behavior and is better guaranteed through explicit/separate synchronization.

@RalfJung
Contributor

RalfJung commented Jun 27, 2022

Intuitively I would expect that a disconnect is a signal I send through the channel, and everyone who receives the signal gets a happens-after from me sending the signal, as with usual message passing.

There would have to be good perf reasons for not having this kind of message-passing semantics.

This seems more like an issue with google/sanitizers#1415; something which even the Rust stdlib specializes for.

Miri supports fences just fine, so that's not it.

@alygin
Contributor Author

alygin commented Aug 1, 2022

@RalfJung, actually, channels work as you described. But there's a more subtle case:

What if the receiver gets to the recv() call when the channel has already been closed? The receiver won't wait, it won't receive any message, it'll just continue doing its work. In the example I provided, it'll happen if the spawned thread finishes before the main thread calls recv().

Should we guarantee that the main thread sees the effect of (A)? I think we should, because that makes reasoning about possible execution paths easier and removes the data race. It can be implemented without a performance penalty: at least for the Array flavor, it can be done by fixing the flaw in the current synchronization approach (currently the relaxed load is sequenced after the fence; it needs to be the other way around).

@RalfJung
Contributor

RalfJung commented Aug 1, 2022

it won't receive any message

It implicitly receives the message that the channel has been closed, doesn't it? It doesn't matter whether it actually had to wait or not; the recv has to "see" that the channel was closed by doing a load, and I would expect that to sync with the thread that did the closing.

Should we guarantee that the main thread sees the effect of (A)?

In my opinion, definitely yes. We can only get to B if A already happened, so the happens-before edge should be established.
