Rewrite atomics section #378

Open

wants to merge 35 commits into base: master

Changes from 7 commits

Commits (35 total)
0347b01
Write “Multithreaded Execution” and add simplified atomic spec
SabrinaJewson Aug 4, 2022
42f46d2
Fix one broken link
SabrinaJewson Aug 4, 2022
46f31ae
Replace accidental rs code blocks with rust
SabrinaJewson Aug 4, 2022
103a733
Replace reads with explicit `println!`s
SabrinaJewson Aug 4, 2022
a26eab4
Write the “Relaxed” section
SabrinaJewson Aug 5, 2022
d01fb66
Remove specification chapter
SabrinaJewson Aug 5, 2022
715e67f
Write about `Acquire` and `Release`
SabrinaJewson Aug 14, 2022
afe0ee2
Write the `SeqCst` section
SabrinaJewson Aug 28, 2022
c1129e3
“happens before” → “happens-before”
SabrinaJewson Aug 28, 2022
b896399
Introduce synchronizes-with terminology
SabrinaJewson Aug 28, 2022
59fde6f
Use “coherence” terminology from the start
SabrinaJewson Aug 28, 2022
6dc3d54
Remove old sections and introduce “AM” in intro
SabrinaJewson Aug 28, 2022
29707ee
“isomorphic” → “functionally equivalent”
SabrinaJewson Aug 28, 2022
52d5d13
Define the term “race condition”
SabrinaJewson Aug 28, 2022
40b06fe
Add note about duplication of `1` in M.O.
SabrinaJewson Aug 28, 2022
390754b
Explain the ABA problem
SabrinaJewson Aug 28, 2022
493c671
Dispel the myth that RMWs “see the latest value”
SabrinaJewson Aug 28, 2022
dc6a942
Explain the C++20 release sequence changes
SabrinaJewson Aug 28, 2022
b3c2e62
Explain the Abstract Machine
SabrinaJewson Aug 28, 2022
a9eb1f6
Improve the explanations of coherence
SabrinaJewson Aug 28, 2022
8068390
Show the final correct execution in mutex example
SabrinaJewson Aug 28, 2022
3c76e35
Add a more formal explanation of happens-before
SabrinaJewson Aug 28, 2022
d4f8f47
Write about acquire and release fences
SabrinaJewson Aug 29, 2022
5e27ed5
Improve the `SeqCst` explanation
SabrinaJewson Sep 4, 2022
805070e
Write about `SeqCst` fences
SabrinaJewson Sep 4, 2022
c19184a
Fix Unicode art incorrectly interpreted as Rust code
SabrinaJewson Sep 4, 2022
2384caa
Define “unsequenced” early on
SabrinaJewson Sep 4, 2022
d9dabf4
Note that a release fence followed by multiple stores is not necessar…
SabrinaJewson Sep 4, 2022
09c428e
Remove the signals section for now
SabrinaJewson Oct 1, 2022
ff32f70
Fix CI
SabrinaJewson Nov 4, 2022
af524a5
Fix typos
SabrinaJewson Jan 12, 2024
f3277bf
Mention that Rust atomics correspond to `atomic_ref`
SabrinaJewson Jan 12, 2024
6d16ea5
Explain the terms “strongly/weakly-ordered hardware”
SabrinaJewson Jan 12, 2024
19b059a
Merge branch 'master' into atomics
SabrinaJewson Jan 12, 2024
b139a3c
Simplify SeqCst demonstration, and remove incorrect claim
SabrinaJewson Mar 10, 2024
8 changes: 7 additions & 1 deletion src/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -41,7 +41,13 @@
* [Concurrency](concurrency.md)
* [Races](races.md)
* [Send and Sync](send-and-sync.md)
* [Atomics](atomics.md)
* [Atomics](./atomics/atomics.md)
* [Multithreaded Execution](./atomics/multithread.md)
* [Relaxed](./atomics/relaxed.md)
* [Acquire and Release](./atomics/acquire-release.md)
* [SeqCst](./atomics/seqcst.md)
* [Fences](./atomics/fences.md)
* [Signals](./atomics/signals.md)
* [Implementing Vec](./vec/vec.md)
* [Layout](./vec/vec-layout.md)
* [Allocating](./vec/vec-alloc.md)
2 changes: 1 addition & 1 deletion src/arc-mutex/arc-clone.md
@@ -28,7 +28,7 @@ happens-before relationship but is atomic. When `Drop`ping the Arc, however,
we'll need to atomically synchronize when decrementing the reference count. This
is described more in [the section on the `Drop` implementation for
`Arc`](arc-drop.md). For more information on atomic relationships and Relaxed
ordering, see [the section on atomics](../atomics.md).
ordering, see [the section on atomics](../atomics/atomics.md).

Thus, the code becomes this:

334 changes: 334 additions & 0 deletions src/atomics/acquire-release.md

Large diffs are not rendered by default.

26 changes: 20 additions & 6 deletions src/atomics.md → src/atomics/atomics.md
@@ -17,12 +17,24 @@ details, you should check out the [C++ specification][C++-model].
Still, we'll try to cover the basics and some of the problems Rust developers
face.

The C++ memory model is fundamentally about trying to bridge the gap between the
semantics we want, the optimizations compilers want, and the inconsistent chaos
our hardware wants. *We* would like to just write programs and have them do
exactly what we said but, you know, fast. Wouldn't that be great?
## Motivation

## Compiler Reordering
The C++ memory model is very large and confusing with lots of seemingly
arbitrary design decisions. To understand the motivation behind this, it can
help to look at what got us in this situation in the first place. There are
three main factors at play here:

1. Users of the language, who want fast, cross-platform code;
2. compilers, who want to optimize code to make it fast;
3. and the hardware, which is ready to unleash a wrath of inconsistent chaos on
your program at a moment's notice.

The C++ memory model is fundamentally about trying to bridge the gap between
these three, allowing users to write code for a logical and consistent abstract
machine while the compiler and hardware deal with the madness underneath that
makes it run fast.

### Compiler Reordering

Compilers fundamentally want to be able to do all sorts of complicated
transformations to reduce data dependencies and eliminate dead code. In
@@ -53,7 +65,7 @@ able to make these kinds of optimizations, because they can seriously improve
performance. On the other hand, we'd also like to be able to depend on our
program *doing the thing we said*.

## Hardware Reordering
### Hardware Reordering

On the other hand, even if the compiler totally understood what we wanted and
respected our wishes, our hardware might instead get us in trouble. Trouble
@@ -106,6 +118,8 @@ programming:
incorrect. If possible, concurrent algorithms should be tested on
weakly-ordered hardware.

---

## Data Accesses

The C++ memory model attempts to bridge the gap by allowing us to talk about the
1 change: 1 addition & 0 deletions src/atomics/fences.md
@@ -0,0 +1 @@
# Fences
220 changes: 220 additions & 0 deletions src/atomics/multithread.md
@@ -0,0 +1,220 @@
# Multithreaded Execution

The first important thing to understand about C++20 atomics is that **the
abstract machine has no concept of time**. You might expect there to be a single
global ordering of events across the program where each happens at the same time
or one after the other, but under the abstract model no such ordering exists;
instead, a possible execution of the program must be treated as a single event
that happens instantaneously — there is never any such thing as “now”, or a
“latest value”, and using that terminology will only lead you to more confusion.
(Of course, in reality there does exist a concept of time, but you must keep in
mind that you’re not programming for the hardware, you’re programming for the
AM.)

However, while no global ordering of operations exists _between_ threads, there
does exist a single total ordering _within_ each thread, which is known as its
_sequence_. For example, given this simple Rust program:

```rust
println!("A");
println!("B");
```

its sequence during one possible execution can be visualized like so:

```text
╭───────────────╮
│ println!("A") │
╰───────╥───────╯
╭───────⇓───────╮
│ println!("B") │
╰───────────────╯
```

That double arrow in between the two boxes (`⇓`) represents that the second
statement is _sequenced after_ the first (and similarly the first statement is
_sequenced before_ the second). This is the strongest kind of ordering guarantee
between any two operations, and only comes about when those two operations
happen one after the other and on the same thread.

If we add a second thread to the mix:

```rust
// Thread 1:
println!("A");
println!("B");
// Thread 2:
eprintln!("01");
eprintln!("02");
```

it will simply coexist in parallel, with each thread getting its own independent
sequence:

```text
Thread 1 Thread 2
╭───────────────╮ ╭─────────────────╮
│ println!("A") │ │ eprintln!("01") │
╰───────╥───────╯ ╰────────╥────────╯
╭───────⇓───────╮ ╭────────⇓────────╮
│ println!("B") │ │ eprintln!("02") │
╰───────────────╯ ╰─────────────────╯
```

Note that this is **not** a representation of multiple things that _could_
happen at runtime — instead, this diagram describes exactly what _did_ happen
when the program ran once. This distinction is key, because it highlights that
even the lowest-level representation of a program’s execution does not have
a global ordering between threads; those two disconnected chains are all there
is.
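The two-thread listing above is schematic, with comments standing in for real threads. As a runnable sketch of the same idea (our own harness, not from the text), each thread can record its events so the per-thread sequences can be checked; nothing, however, relates the two sequences to each other:

```rust
use std::thread;

// Sketch: each closure is a real thread, and each thread records its own
// events. Within one thread the order is fixed by sequencing; no ordering
// exists *between* the two threads.
fn run() -> (Vec<&'static str>, Vec<&'static str>) {
    let mut log1 = Vec::new();
    let mut log2 = Vec::new();
    thread::scope(|s| {
        s.spawn(|| {
            log1.push("A");
            log1.push("B");
        });
        s.spawn(|| {
            log2.push("01");
            log2.push("02");
        });
    });
    (log1, log2)
}

fn main() {
    let (log1, log2) = run();
    // Each thread's own sequence always holds…
    assert_eq!(log1, ["A", "B"]);
    assert_eq!(log2, ["01", "02"]);
    // …but no interleaving of the two is ever observable here.
}
```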

Now let’s make things more interesting by introducing some shared data, and have
both threads read it.

```rust
// Initial state
let data = 0;
// Thread 1:
println!("{data}");
// Thread 2:
eprintln!("{data}");
```

Each memory location, similarly to threads, can be shown as another column on
our diagram, but holding values instead of instructions, and each access (read
or write) manifests as a line from the instruction that performed the access to
the associated value in the column. So this code can produce (and is in fact
guaranteed to produce) the following execution:

```text
Thread 1 data Thread 2
╭──────╮ ┌────┐ ╭──────╮
│ data ├╌╌╌╌┤ 0 ├╌╌╌╌┤ data │
╰──────╯ └────┘ ╰──────╯
```

That is, both threads read the same value of `0` from `data`, with no relative
ordering between them. This is the simple case, for when the data doesn’t ever
change — but that’s no fun, so let’s add some mutability in the mix (we’ll also
return to a single thread, just to keep things simple).
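As an aside, the two-reader program above can be written out as real Rust using `std::thread::scope` (an assumed harness, with `assert_eq!` standing in for the prints); since nothing ever writes to `data`, both reads are guaranteed to observe `0`:

```rust
use std::thread;

fn main() {
    let data = 0;
    // Both scoped threads borrow `data`; with no writes anywhere, every
    // read must see the initial value.
    thread::scope(|s| {
        s.spawn(|| assert_eq!(data, 0)); // Thread 1's read
        s.spawn(|| assert_eq!(data, 0)); // Thread 2's read
    });
}
```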

Consider this code, for which we're going to attempt to draw a diagram like the
ones above:

```rust
let mut data = 0;
data = 1;
println!("{data}");
data = 2;
```

Working out executions of code like this is rather like solving a Sudoku puzzle:
you must first lay out all the facts that you know, and then fill in the blanks
with logical reasoning. The initial information we’ve been given is both the
initial value of `data` and the sequential order of Thread 1; we also know that
over its lifetime, `data` takes on a total of three different values that were
caused by two different non-atomic writes. This allows us to start drawing out
some boxes:

```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌? │ 0 │
╰───╥───╯ ?╌┼╌╌╌╌┤
╭───⇓───╮ ?╌┼╌╌╌╌┤
│ data ├╌? │ ? │
╰───╥───╯ ?╌┼╌╌╌╌┤
╭───⇓───╮ ?╌┼╌╌╌╌┤
│ = 2 ├╌? │ ? │
╰───────╯ └────┘
```

Note the use of dashed padding in between the values of `data`’s column. Those
spaces won’t ever contain a value, but they’re used to represent an
unsynchronized (non-atomic) write — it is garbage data and attempting to read it
would result in a data race.

To solve this puzzle, we first need to bring in a new rule that governs all
memory accesses to a particular location:
> From the point at which the access occurs, find every other point that can be
> reached by following the reverse direction of arrows, then for each one of
> those, take a single step across every line that connects to the relevant
> memory location. **It is not allowed for the access to read or write any value
> that appears above any one of these points**.

In our case, there are two potential executions: one, where the first write
corresponds to the first value in `data`, and two, where the first write
corresponds to the second value in `data`. Considering the second case for a
moment, it would also force the second write to correspond to the first
value in `data`. Therefore its diagram would look something like this:

```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ┊ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ┊ ├╌╌┼╌╌╌╌┤
│ data ├╌?┊ ┊ │ 2 │
╰───╥───╯ ├╌┼╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌┼╌╌┼╌╌╌╌┤
│ = 2 ├╌╌╌╌┘ │ 1 │
╰───────╯ └────┘
```

However, that second line breaks the rule we just established! Following up the
arrows from the third operation in Thread 1, we reach the first operation, and
from there we can take a single step to reach the space in between the `2` and
the `1`, which excludes this access from writing any value above that point.

So evidently, this execution is no good. We can therefore conclude that the only
possible execution of this program is the other one, in which the `1` appears
above the `2`:

```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ├╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌╌┼╌╌╌╌┤
│ data ├╌? │ 1 │
╰───╥───╯ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ├╌╌┼╌╌╌╌┤
│ = 2 ├╌╌┘ │ 2 │
╰───────╯ └────┘
```

Now to sort out the read operation in the middle. We can use the same rule as
before to trace up to the first write and rule out reading either the `0`
value or the garbage that exists between it and `1`, but how do we choose
between the `1` and the `2`? Well, as it turns out there is a complement to the
rule we already defined which gives us the exact answer we need:

> From the point at which the access occurs, find every other point that can be
> reached by following the _forward_ direction of arrows, then for each one of
> those, take a single step across every line that connects to the relevant
> memory location. **It is not allowed for the access to read or write any value
> that appears below any one of these points**.

Using this rule, we can follow the arrow downwards and then across and finally
rule out `2` as well as the garbage before it. This leaves us with exactly _one_
value that the read operation can return, and exactly one possible execution
guaranteed by the Abstract Machine:

```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ├╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌╌┼╌╌╌╌┤
│ data ├╌╌╌╌╌┤ 1 │
╰───╥───╯ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ├╌╌┼╌╌╌╌┤
│ = 2 ├╌╌┘ │ 2 │
╰───────╯ └────┘
```

You might be thinking that all of this has been the longest, most convoluted
explanation ever of the most basic intuitive semantics of programming — and
you’d be absolutely right. But it’s essential to grasp these fundamentals,
because once you have this model in mind, the extension into multiple threads
and the complicated semantics of real atomics becomes completely natural.