
Fix fence on non-x86 arch and miri #16

Merged · 1 commit merged into master on Jul 17, 2022
Conversation

@taiki-e (Collaborator) commented on Jul 17, 2022

The problem seems to be that the original author of this code confused a fence in the x86 hardware memory model with an atomic fence in the C++ memory model. On x86, `lock cmpxchg; mov` (load from memory) is fine; see https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. Under the C++ memory model, and on many architectures, a fence for a load must come *after* the load: `load; fence`.

Fixes bevyengine/bevy#5164
FYI @cbeuw
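To illustrate the `load; fence` ordering the description argues for, here is a minimal hypothetical sketch (not this crate's code, and `fence_after_load_demo` is an invented name): a relaxed load followed by an acquire fence synchronizes with a release store, which is exactly the shape a fence placed *before* the load cannot provide.

```rust
use std::hint::spin_loop;
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical demo: a relaxed load followed by an acquire fence
// synchronizes with the producer's release store, so the relaxed read
// of `data` below is guaranteed to see 42.
fn fence_after_load_demo() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);
        r.store(true, Ordering::Release); // publish
    });

    // `load; fence` -- the order the description argues for.
    while !ready.load(Ordering::Relaxed) {
        spin_loop();
    }
    fence(Ordering::Acquire);
    let v = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    v
}

fn main() {
    assert_eq!(fence_after_load_demo(), 42);
}
```

This is the standard fence-based message-passing pattern; Miri accepts it, whereas it flags the `fence; load` order because no synchronization edge is formed.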

@taiki-e (Collaborator, Author) commented on Jul 17, 2022

At least crossbeam and event-listener have the same issue, but fixing them is probably more complex.

@taiki-e taiki-e merged commit 54df36a into master Jul 17, 2022
@taiki-e taiki-e deleted the fence branch July 17, 2022 13:33
@sbarral commented on Jul 19, 2022

I feel a bit uncomfortable with this commit.

Admittedly, I don't know exactly what the role of the fence is here. It does not exist in Dmitry Vyukov's original implementation of the queue, so I guess it was added as part of the modifications that make this queue linearisable (unlike the original queue).

That being said, if the cross-platform solution is indeed to place the load before the fence (this, I do not know), then I am pretty sure that the Intel specialization that uses a `lock` operation instead of an `mfence` should also keep the load before the fence.

I did look at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html but could not see where it states that lock + mov (in this order) is equivalent to mov + mfence. In fact, the latest GCC does use the lock optimization and definitely preserves the order, i.e. mov + lock (see this godbolt: https://godbolt.org/z/o3rYdTvYv).
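For reference, the two shapes under discussion can be sketched in plain Rust (a simplified sketch based on the diff in this PR, not the crate's exact code; the function names are invented):

```rust
use std::sync::atomic::{self, AtomicUsize, Ordering};

// Portable version: an actual SeqCst fence in the Rust memory model.
fn full_fence_portable() {
    atomic::fence(Ordering::SeqCst);
}

// The x86 specialization being debated: a dummy SeqCst RMW on a fresh
// location. On x86 hardware this compiles to `lock cmpxchg`, but the
// abstract machine does not treat an RMW on an unrelated atomic as
// equivalent to a fence.
fn full_fence_x86_style() {
    let a = AtomicUsize::new(0);
    let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
}

fn main() {
    full_fence_portable();
    full_fence_x86_style();
}
```

The disagreement is not about whether both sequences are strong enough on x86 silicon, but about whether the RMW version means the same thing to the compiler and to Miri.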

taiki-e added a commit that referenced this pull request Jul 20, 2022
Review comment on this hunk:

```rust
@@ -461,7 +464,11 @@ fn full_fence() {
    // x86 platforms is going to optimize this away.
    let a = AtomicUsize::new(0);
    let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
    // On x86, `lock cmpxchg; mov` is fine. See also https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
    load_op()
```
@RalfJung commented on Jul 26, 2022

FWIW, this is still Rust code -- so if Miri complains when running this branch of the code (which I suspect it will, since a SC RMW before a load cannot replace a fence after a load), then this code is still wrong.

When you write Rust code, the hardware memory model is all but irrelevant for program correctness. Only the Rust memory model counts.

EDIT: Oh I see this got reverted in #18.

Review comment on this hunk:

```rust
@@ -461,7 +464,11 @@ fn full_fence() {
    // x86 platforms is going to optimize this away.
```

@RalfJung commented on Jul 26, 2022

The fact that you are hoping that "sane" compilers for particular targets are going to treat the memory model differently is a big red flag. The memory model is target-independent, and a whole bunch of optimizations run on this code (including its use of atomics) before any target-specific concerns are applied.

Inline assembly is the only correct choice here.

EDIT: Oh I see this got reverted in #18.
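A hedged sketch of the inline-assembly route suggested above (an assumption about its shape, not the fix that actually landed in this crate; `full_fence_via_asm` is an invented name): emit the hardware barrier instruction directly, so no Rust-level reasoning about atomics applies to it.

```rust
// Illustration only: on x86_64, emit the hardware fence via inline asm.
// `mfence` is a full hardware barrier, and `asm!` additionally acts as a
// compiler barrier for memory by default.
#[cfg(target_arch = "x86_64")]
fn full_fence_via_asm() {
    unsafe { core::arch::asm!("mfence", options(nostack, preserves_flags)) };
}

// Sketch fallback for other targets: a plain SeqCst fence.
#[cfg(not(target_arch = "x86_64"))]
fn full_fence_via_asm() {
    std::sync::atomic::fence(std::sync::atomic::Ordering::SeqCst);
}

fn main() {
    full_fence_via_asm();
}
```

Note that Miri cannot execute inline assembly at all, so an asm-based fast path would still need a `cfg(miri)` branch using a real `fence`.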

@RalfJung commented, quoting @sbarral:

> That being said, if the cross-platform solution is indeed to place the load before the fence (this, I do not know), then I am pretty sure that the Intel specialization that uses a `lock` operation instead of an `mfence` should also keep the load before the fence.

I would usually expect that to be the case -- a relaxed load followed by an acquire-or-stronger fence can induce a synchronization edge. But I don't know the context for this particular code.

Does something break, or perf go down badly, if the fence is moved after the load?

Successfully merging this pull request may close these issues.

world::World doctests sometimes hangs with Miri