
Arc::strong_count memory ordering is a potential footgun #117485

Open
lukas-code opened this issue Nov 1, 2023 · 21 comments
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@lukas-code
Member

In #115546 the memory ordering of {Arc,Weak}::{strong,weak}_count got changed to Relaxed, which can cause some hard-to-debug data races.

I tried this code:

use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread;

struct SendMe<T>(T);
unsafe impl<T> Send for SendMe<T> {}

fn main() {
    let a = Arc::new(UnsafeCell::new(0i32));
    let b = SendMe(a.clone());
    thread::spawn(move || {
        let b = b;
        for _ in 0..100 {
            unsafe { *b.0.get() += 1; }
        }
    });
    while Arc::strong_count(&a) != 1 {}
    // core::sync::atomic::fence(core::sync::atomic::Ordering::Acquire);
    unsafe { *a.get() += 1; }
}

I expected to see this happen: I expected the check strong_count == 1 to be strong enough to avoid a data race in this program. My intuition was that observing a count of 1 implies that no other thread can access the data concurrently.

Instead, this happened: With the fence commented out, the above program has undefined behavior. Miri outputs:

error: Undefined Behavior: Data race detected between (1) Write on thread `<unnamed>` and (2) Read on thread `main` at alloc826+0x10. (2) just happened here
  --> src/main.rs:19:14
   |
19 |     unsafe { *a.get() += 1; }
   |              ^^^^^^^^^^^^^ Data race detected between (1) Write on thread `<unnamed>` and (2) Read on thread `main` at alloc826+0x10. (2) just happened here
   |
help: and (1) occurred earlier here
  --> src/main.rs:14:22
   |
14 |             unsafe { *b.0.get() += 1; }
   |                      ^^^^^^^^^^^^^^^
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE (of the first span):
   = note: inside `main` at src/main.rs:19:14: 19:27

I think we should either put the Acquire ordering back or add a warning to the docs that such code requires a manual fence.

Meta

1.74 = current beta

cc @SUPERCILEX author of #115546

@rustbot label T-libs A-atomic

@lukas-code lukas-code added the C-bug Category: This is a bug. label Nov 1, 2023
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-atomic Area: Atomics, barriers, and sync primitives T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 1, 2023
@SUPERCILEX
Contributor

This can definitely be added to the docs, but the change was intentional. The argument is that you shouldn't be synchronizing on the strong_count: you can't write a correct if strong == 1 { finalize } else { drop }, since two threads could both end up in the drop branch, which means you need busy waiting to synchronize, I think. That leaves the cases where you don't need Acquire semantics (under the assumption that busy waiting is bad). If it turns out many people are doing busy waiting, maybe we'd want to revert?


I know your code is probably convoluted for an example, but I'd argue you should use unsafe { *Arc::into_inner(a).unwrap().get() += 1; }. Or even better, if you don't need to do the last action on the starting thread, use into_inner on both threads and let whoever comes last perform the final action.

@lukas-code
Member Author

lukas-code commented Nov 2, 2023

My use case was something like if strong == 1 { unsynchronized access } else { synchronized access } as a performance optimization, where an acquire load would be sufficient synchronization. I realize now that get_mut can be used in this situation; however, that method uses three atomic accesses, whereas only one would be required if it is known that no Weak references exist.
Essentially, I'm trying to implement a get_mut_assume_no_weak.

Regarding busy waiting, the code would still be wrong even if we used some kind of synchronized waiting inside the loop (example), since the final relaxed load and the non-atomic access after it can be reordered. To fix the code, we need a barrier between the final Arc::strong_count and the non-atomic access.

@SUPERCILEX
Contributor

Essentially, I'm trying to implement a get_mut_assume_no_weak.

Hmmm, this makes me wonder if there should be an Arc without Weak support. We'd need to check which other methods have their performance hindered by weak references.


Overall, I think the use case makes sense, but having the counts be relaxed is more flexible. I'd maybe argue that using a fence is clearer code, too. My main concern would be performance, but as far as I understand, a good microarchitecture won't have any trouble with an acquire fence.

@saethlin saethlin added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. C-bug Category: This is a bug. labels Nov 18, 2023
@alekspickle

alekspickle commented Mar 18, 2024

This change has actually broken one of our older services that was getting a long-overdue update and was relying on strong_count. I spent days trying to figure out the reason, refusing to believe Arc was responsible.
With Ordering::Relaxed the behavior became incredibly volatile.
I'll try to create an MVE, but I'm not sure it'll be possible, since I was only able to catch this on a loaded prod instance.

@mgeier
Contributor

mgeier commented Mar 31, 2024

Just to add another data point: the change to Relaxed broke a function in my wait-free ring buffer crate rtrb: mgeier/rtrb#114.

I'm not using that function myself, and I don't know if anyone is, so it might not be a problem at all.

I was planning to replace Arc with some hand-written alternative anyway, so now I have a good reason to prioritize that.

@SUPERCILEX
Contributor

Would docs suggesting an acquire fence if the count is equal to 1 help? (Assuming unsynchronized reads at that point.)

The is_abandoned() function seems like a good example of where this change can lead to better performance: you only need a fence once the other thread is dead, which happens once, as opposed to paying for synchronization on every call while it's alive.

@mgeier
Contributor

mgeier commented Mar 31, 2024

Would docs suggesting an aquire fence if the count is equal to 1 help?

I'm not very familiar with the use of fences, so I'm not sure ...

Can you please clarify what you mean?

In my case, the count is only ever 2 or 1. If it's 1, it means that the other thread has been abandoned.
If I'm getting the value 1 in an unsynchronized way, it has in fact already been synchronized, otherwise I wouldn't have gotten 1.
If I'm getting the value 2 in an unsynchronized way, it could mean anything.

I tried putting an Acquire fence into my implementation of is_abandoned(), but ThreadSanitizer still reports a data race in my test (https://github.com/mgeier/rtrb/blob/2686a28351db711ebb43c5d0fab671d9ed47a5a7/tests/lib.rs#L141-L156). But maybe I'm using it wrong.

Admittedly, I'm using an Arc for something that can manually be implemented in a simpler way, which I would like to do in the long term, anyway.

@mgeier
Contributor

mgeier commented Mar 31, 2024

I just said:

ThreadSanitizer still reports a data race

But then I stumbled upon some notes I made earlier (mgeier/rtrb#75 (comment)) which are talking about the ThreadSanitizer creating false positives when using fences, see also #65097.

But this would make it hard for me to know if I'm using the fence correctly ... and is there a fallback implementation for when using ThreadSanitizer?

@SUPERCILEX
Contributor

I trust miri:

diff --git a/tests/lib.rs b/tests/lib.rs
index 9afd61d..7abf893 100644
--- a/tests/lib.rs
+++ b/tests/lib.rs
@@ -146,7 +146,9 @@ fn no_race_with_is_abandoned() {
         unsafe { V = 10 };
         drop(p);
     });
+    while !c.is_abandoned() {}
     if c.is_abandoned() {
+        std::sync::atomic::fence(std::sync::atomic::Ordering::Acquire);
         unsafe { V = 20 };
     } else {
         // This is not synchronized, both Miri and ThreadSanitizer should detect a data race:

If you want, you can move the fence inside is_abandoned, but I still feel it's the user's responsibility to think about the memory model when writing stuff like this rather than hope it's taken care of.

Also, the code above is still "wrong" in that it relies on this line from Arc's Drop implementation:

if self.inner().strong.fetch_sub(1, Release) != 1 {

Though I find it unlikely that it would be changed, so it's probably nothing to worry about.

To see this, move the drop before the unsafe block and you'll get UB again (which can't be fixed with fences, because there's no longer a condition variable to spin on to guarantee the fence gets executed).

@mgeier
Contributor

mgeier commented Apr 1, 2024

Thanks for the clarification!

which can't be fixed with fences because now there's no condition variable to spin on and guarantee the fence gets executed

That's good to know. However, I would be interested in not spinning. I would also like to generally be able to put something into the else clause in my test.

I trust miri

But miri didn't detect the data race in the first place!
It was only detected (quite unreliably) by ThreadSanitizer.

I still feel like it's the user's responsibility to think about the memory model when writing stuff like this rather than hope it's taken care of.

In case of the rtrb crate I don't think I agree. All functions like push() and pop() synchronize between threads. I'm not explicitly stating that in the docs, but I think I would expect it implicitly.
And I don't see why is_abandoned() should behave differently.

But anyway, that's something I can implement without Arc at a later time.

@SUPERCILEX
Contributor

However, I would be interested in not spinning.

The fence will work fine without spinning, but you'll never get true for the if statement using miri without it. Try my patch with miri and it will fail if you comment out the fence.

In case of the rtrb crate I don't think I agree. All functions like push() and pop() synchronize between threads. I'm not explicitly stating that in the docs, but I think I would expect it implicitly.

For push and pop, absolutely. But agree to disagree on the flag. Now of course if you document the flag as explicitly providing some synchronization, then sure.

@mgeier
Contributor

mgeier commented May 2, 2024

The fence will work fine without spinning,

That's good to know!

I'm trusting you, but there is currently no tool that would allow checking that, right?

Miri seems to not detect the lack of the fence, and ThreadSanitizer seems to produce false positives when it is there.

but you'll never get true for the if statement using miri without it.

OK, but is this a shortcoming of Miri (which might be fixed some day) or is this intentional?

Try my patch with miri and it will fail if you comment out the fence.

Yes, thanks, that behaves as expected, but I'd be interested in testing the non-spinning use case.

@SUPERCILEX
Contributor

I'm trusting you

You shouldn't. 😉

there is currently no tool that would allow to check that, right?

Don't think so, no. But you can replace the dependency spinning with this:

    for _ in 0..100 {
        std::hint::black_box(());
    }

and use -Zmiri-preemption-rate=1 to get the race condition to trigger. This is pretty finicky, because if Miri is working correctly it'll be constantly switching back and forth between the drop code in the other thread and this loop. 100 iterations seems to be enough switching to finish running the other thread's code. I also threw a panic in the else case to check that is_abandoned actually ran.

I think this is a pretty reasonable model of other people's code: I know there's a for loop in there, which suggests spinning, but we're not touching rtrb code and it's a constant-time loop.

OK, but is this a shortcoming of Miri (which might be fixed some day) or is this intentional?

Yeah, Miri (unlike tools like loom) is an interpreter, so it only runs through a single execution path.

I'd be interested in testing the non-spinning use case.

I filed rust-lang/miri#3538. We should be able to get rid of the loop and have a single std::thread::yield_now() work.

@SUPERCILEX
Contributor

Ok we already have the solution from the other issue: slap a yield_now in there and run with MIRIFLAGS='-Zmiri-backtrace=full -Zmiri-preemption-rate=0 -Zmiri-seed=1' cargo miri t -- no_race_with_is_abandoned to trigger UB. (And of course put the fence in to fix it.)

@RalfJung
Member

RalfJung commented May 3, 2024

@lukas-code did Miri originally find this issue? We're always looking for things to add to the "issues found by Miri" list :)

(And of course put the fence in to fix it.)

Note that you should test with a bunch of seeds to be sure of the fix, since evidently not every seed reproduces the issue.

@lukas-code
Member Author

@lukas-code did Miri originally find this issue? We're always looking for things to add to the "issues found by Miri" list :)

Yes, it was found by Miri! Unfortunately I don't have anything good to put on the "issues found by Miri" list. The code originally came from a university assignment where we had to implement an SPSC queue, and after finding out about this problem, I rewrote my code entirely to not rely on strong_count at all. If you're curious, I've uploaded it here (TW: bad code).

Interestingly, mgeier/rtrb#114 looks like exactly the same issue I had, but in a real SPSC: unsynchronized access to the shared data by the reader after the writer has been dropped.

@mgeier
Contributor

mgeier commented May 25, 2024

Ok we already have the solution from the other issue: slap a yield_now in there and run with MIRIFLAGS='-Zmiri-backtrace=full -Zmiri-preemption-rate=0 -Zmiri-seed=1' cargo miri t -- no_race_with_is_abandoned to trigger UB.

Thanks for all your help @SUPERCILEX!

However, when I try it locally on my computer, Miri doesn't detect the data race with -Zmiri-seed=1, but it does detect it with -Zmiri-seed=0.

I tried all values from 0 to 255 as suggested in the Miri docs, and those values do detect the data race:

0 2 3 5 6 7 8 9 12 19 20 21 24 26 28 29 30 32 35 39 41 44 47 51 53 58 59 60 61 63 66 67 68 69 72 73 74 76 77 79 80 81 82 84 85 90 91 92 94 95 99 101 104 105 106 113 117 120 121 123 125 127 130 131 138 141 144 147 149 150 151 152 153 154 155 158 159 162 163 164 169 170 171 173 174 176 178 182 183 185 191 192 194 196 199 200 201 202 205 206 207 214 217 219 220 221 222 224 226 228 231 232 237 239 241 245 247 254 255

So what should I do in my tests?
Should I test a certain range of seeds? If yes, which?

And should I use -Zmiri-preemption-rate=0 for all my tests or just the one in question?
Or should I run all my tests both with and without -Zmiri-preemption-rate=0?

@RalfJung
Member

RalfJung commented May 26, 2024

When concurrency is involved, bugs can often only be found probabilistically, and that applies to Miri as well. There's no general answer for which flags have the highest chance of finding any particular bug, it depends on your code. When there's a 50% chance of finding a bug, around half the seeds will hit it; which exact seeds can find any particular bug changes with each nightly version and even when you update completely unrelated dependencies.

One common thing to do is run the tests in a loop inside the code. Ideally you should use new atomic variables for each loop iteration (i.e., not just the same static over and over again). That's very similar to running the test with multiple seeds.

One day cargo miri test may have a flag to run the test multiple times (Cc rust-lang/miri#3546), but that would run all tests multiple times which can be quite slow.

@mgeier
Contributor

mgeier commented May 26, 2024

Thanks @RalfJung for the explanations, that's very helpful!

One common thing to do is run the tests in a loop inside the code. [...] That's very similar to running the test with multiple seeds.

I wasn't aware of that!

In case anyone is interested, I have updated my tests according to the recommendations: mgeier/rtrb#125

@RalfJung
Member

RalfJung commented May 26, 2024

And should I use -Zmiri-preemption-rate=0 for all my tests or just the one in question?

I would in general recommend not setting this flag for concurrency tests -- it makes tests more reproducible, but it also makes some issues entirely impossible to find. So for some bugs the probability they can be found goes up without preemption, but for other bugs the probability then goes to 0.

@briansmith
Contributor

I think this should be closed in favor of #126239. We should document what synchronization guarantees each construct has, but it's unreasonable to put warnings in the documentation of every construct where users might have assumed something that isn't guaranteed (probably by peeking into the implementation and seeing Acquire/Release/SeqCst/etc. orderings being used).
