Race condition when working with a disconnected channel #838

Closed
alygin opened this issue May 22, 2022 · 9 comments
@alygin
Contributor

alygin commented May 22, 2022

At the moment, the following piece of code contains a race condition if the spawned thread drops the sender before the receiver blocks on recv() (the thread sanitizer catches it):

use crossbeam_channel::bounded;
use std::thread;

fn race() {
    static mut V: u32 = 0;
    let (s, r) = bounded::<i32>(10);
    let t = thread::spawn(move || {
        unsafe { V = 1 }; // (A)
        drop(s);
    });
    let _ = r.recv().unwrap_err();
    unsafe { V = 2 } // (B)
    t.join().unwrap();
}

The race is only present when using the Array or List flavor; Zero works fine here. It looks like the reason is that relaxed loads like this:

atomic::fence(Ordering::SeqCst);
let tail = self.tail.load(Ordering::Relaxed);

provide no acquire semantics, so the main thread might not see the effect of (A).

The same problem occurs when sending to a disconnected channel.

In earlier versions this was a SeqCst load, but it was later optimized and lost its synchronization property, so (A) is not guaranteed to happen-before (B).

To me it looks like a bug, because channels are usually treated as synchronization objects: std::sync::mpsc is presented this way in the Rust docs (though mpsc::channel has the same flaw), and Go guarantees such synchronization (here's the same test for Go channels; it passes thread sanitizer checks).

But maybe this was a conscious choice to sacrifice synchronization for performance in such cases?

@alygin alygin changed the title Race condition when working with disconnected channels Race condition when working with a disconnected channel May 22, 2022
@alygin
Contributor Author

alygin commented May 22, 2022

I've ported some tests from Go; both Miri and TSan detect the races.

@kprotty

kprotty commented Jun 4, 2022

In earlier versions this was a SeqCst load, but then it was optimized and lost its synchronization property, hence (A) is not guaranteed to happen-before (B).

As long as there's a Release operation on tail in the drop after (A) (I assume here from disconnect), it should be guaranteed to happen-before (B) due to Atomic-fence synchronization.

This seems more like an issue with TSAN not detecting fences properly; something which even the Rust stdlib specializes for.

@alygin
Contributor Author

alygin commented Jun 4, 2022

@kprotty, thanks!

@alygin
Contributor Author

alygin commented Jun 4, 2022

BTW, it's not only TSan. Miri also detects a data race here.

@alygin
Contributor Author

alygin commented Jun 5, 2022

@kprotty, following your comment in the related miri issue:

After some thought, the crossbeam code is fine to race as drop() isn't necessarily a synchronization point (which the test case was trying to use as such).

I agree that the drop() is not expected to be a sync point per se. But the test is not about dropping, it's about disconnecting, though technically they are the same in this case.

My initial concern was that on disconnection, the receiving side wakes up but isn't guaranteed to see the effects of memory changes made by the sending side before disconnecting. This may be unexpected behavior. For instance, channels in Go provide such synchronization on purpose.

The question is: should crossbeam channels guarantee synchronization on disconnection or not? For the Array flavor it can be achieved easily, without performance degradation (just by swapping the fence and the load, or even by moving the fence into the return true branch). I'm not sure about the List flavor; I haven't looked into it yet. Zero is already fine.

@taiki-e, @kprotty, what do you think?

@kprotty

kprotty commented Jun 5, 2022

IMO, a channel should only ensure synchronization for the data sent and retrieved through the channel's API. Relying on the channel to create happens-before edges with data external to it (disconnect included, as it's a state observation not a data transfer) feels like unspecified behavior and is better guaranteed through explicit/separate synchronization.

@RalfJung
Contributor

RalfJung commented Jun 27, 2022

Intuitively I would expect that a disconnect is a signal I send through the channel, and everyone who receives the signal gets a happens-after from me sending the signal, as with usual message passing.

There would have to be good perf reasons for not having this kind of message-passing semantics.

This seems more like an issue with google/sanitizers#1415; something which even the Rust stdlib specializes for.

Miri supports fences just fine, so that's not it.

@alygin
Contributor Author

alygin commented Aug 1, 2022

@RalfJung, actually, channels work as you described. But there's a more subtle case:

What if the receiver gets to the recv() call when the channel has already been closed? The receiver won't wait, it won't receive any message, it'll just continue doing its work. In the example I provided, it'll happen if the spawned thread finishes before the main thread calls recv().

Should we guarantee that the main thread sees the effect of (A)? I think we should, because that makes reasoning about possible execution paths easier and removes the data race. It can be implemented without a performance penalty: at least for the Array flavor, it can be done by fixing the flaw in the current synchronization approach (currently the relaxed load is sequenced after the fence; it needs to be the other way around).

@RalfJung
Contributor

RalfJung commented Aug 1, 2022

it won't receive any message

It implicitly receives the message that the channel has been closed, doesn't it? It doesn't matter whether it actually had to wait or not; the recv has to "see" that the channel was closed by doing a load, and I would expect that to sync with the thread that did the closing.

Should we guarantee that the main thread sees the effect of (A)?

In my opinion, definitely yes. We can only get to B if A already happened, so the happens-before edge should be established.
