
Add memory barriers on lock/unlock of SpinLock #13050

Merged
straight-shoota merged 1 commit into crystal-lang:master from bcardiff:fix/spin-lock-aarch64 on Feb 8, 2023

Conversation

@bcardiff (Member) commented Feb 7, 2023

Fixes #13010. Credit goes to @ggiraldez.

The examples provided in #13010 (comment) and #13010 (comment) work on aarch64-apple-darwin without problems so far.

Something I was thinking is that, since we now emit explicit memory barriers, maybe we could use a more relaxed ordering for the #swap and #get calls in the SpinLock, instead of relying on Atomic, which enforces sequentially_consistent. But that can come later if needed; I am not sure it would lead to any performance improvement at all.
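
To make the idea concrete, here is a minimal sketch of that direction (not the stdlib implementation, and not part of this PR): the atomic exchange uses a relaxed ordering and correctness relies on the explicit fences. It assumes `Atomic#swap` and `Atomic#set` accept an ordering argument such as `:relaxed`, which may not be available in the Crystal version this PR targets.

```crystal
# Sketch only: relaxed RMW/store plus explicit fences instead of
# sequentially consistent atomics. The :relaxed ordering argument is an
# assumption about the Atomic API, not something this PR introduces.
class RelaxedSpinLock
  def initialize
    @m = Atomic(Int32).new(0)
  end

  def lock
    # Relaxed exchange: the ordering guarantee comes from the fence below.
    while @m.swap(1, :relaxed) == 1
      # busy-wait until the current holder releases the lock
    end
    # Acquire barrier: the critical section cannot be reordered before the lock.
    ::Atomic::Ops.fence :sequentially_consistent, false
  end

  def unlock
    # Release barrier: writes from the critical section become visible before the store below.
    ::Atomic::Ops.fence :sequentially_consistent, false
    @m.set 0, :relaxed
  end

  def sync
    lock
    begin
      yield
    ensure
      unlock
    end
  end
end
```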

@Blacksmoke16 added the kind:bug, topic:stdlib:concurrency, topic:multithreading, and platform:aarch64 labels on Feb 7, 2023
@straight-shoota added this to the 1.8.0 milestone on Feb 7, 2023
@bcardiff (Member, Author) commented Feb 7, 2023

Until a release includes this fix, it can be applied as a monkey-patch:

class Crystal::SpinLock
  def lock
    previous_def
    # barrier after acquiring the lock: the critical section cannot be reordered before it
    ::Atomic::Ops.fence :sequentially_consistent, false
  end

  def unlock
    # barrier before releasing the lock: prior writes become visible before the unlock
    ::Atomic::Ops.fence :sequentially_consistent, false
    previous_def
  end
end
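
As a quick sanity check, a hypothetical smoke test like the one below (program, names, and counts are made up for illustration) can exercise the patched lock: build it with `--release -Dpreview_mt`, run it with `CRYSTAL_WORKERS` set, and the final count should always match the expected total once the barriers are in place.

```crystal
# Hypothetical smoke test for the monkey-patched Crystal::SpinLock.
lock = Crystal::SpinLock.new
counter = 0
done = Channel(Nil).new
fibers = 8
per_fiber = 100_000

fibers.times do
  spawn do
    per_fiber.times do
      lock.lock
      counter += 1 # the lock must make this read-modify-write safe across threads
      lock.unlock
    end
    done.send nil
  end
end

fibers.times { done.receive }
puts counter # expected to always print 800000 with the barriers in place
```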

@straight-shoota merged commit 2d49b07 into crystal-lang:master on Feb 8, 2023
jgaskins added a commit to jgaskins/crystal that referenced this pull request on Jan 30, 2024:
This solution is the same as the one used in crystal-lang#13050.

The following code is expected to output `1000000` preceded by the time
it took to perform it:

```
mutex = Mutex.new
numbers = Array(Int32).new(initial_capacity: 1_000_000)
done = Channel(Nil).new
concurrency = 20
iterations = 1_000_000 // concurrency
concurrency.times do
  spawn do
    iterations.times { mutex.synchronize { numbers << 0 } }
  ensure
    done.send nil
  end
end

start = Time.monotonic
concurrency.times { done.receive }
print Time.monotonic - start
print ' '
sleep 100.milliseconds # Wait just a bit longer to be sure the discrepancy isn't due to a *different* race condition
pp numbers.size
```

Before this commit, on an Apple M1 CPU, the array size would be anywhere
from 880k-970k, but I never observed it reach 1M. Here is a sample:

```
$ repeat 20 (CRYSTAL_WORKERS=10 ./mutex_check)
00:00:00.119271625 881352
00:00:00.111249083 936709
00:00:00.102355208 946428
00:00:00.116415166 926724
00:00:00.127152583 899899
00:00:00.097160792 964577
00:00:00.120564958 930859
00:00:00.122803000 917583
00:00:00.093986834 954112
00:00:00.079212333 967772
00:00:00.093168208 953491
00:00:00.102553834 962147
00:00:00.091601625 967304
00:00:00.108157208 954855
00:00:00.080879666 944870
00:00:00.114638042 930429
00:00:00.093617083 956496
00:00:00.112108959 940205
00:00:00.092837875 944993
00:00:00.097882625 916220
```

This indicates that some fibers were acquiring the mutex while another fiber
still held it, so concurrent `numbers << 0` appends raced and some writes were
lost. With this commit, using the exact same parameters (built with
`--release -Dpreview_mt` and run with `CRYSTAL_WORKERS=10` to spread the work
across all 10 cores), these are the results I'm seeing:

```
00:00:00.078898166 1000000
00:00:00.072308084 1000000
00:00:00.047157000 1000000
00:00:00.088043834 1000000
00:00:00.060784625 1000000
00:00:00.067710250 1000000
00:00:00.081070750 1000000
00:00:00.065572208 1000000
00:00:00.065006958 1000000
00:00:00.061041541 1000000
00:00:00.059648291 1000000
00:00:00.078100125 1000000
00:00:00.050676250 1000000
00:00:00.049395875 1000000
00:00:00.069352334 1000000
00:00:00.063897833 1000000
00:00:00.067534333 1000000
00:00:00.070290833 1000000
00:00:00.067361500 1000000
00:00:00.078021833 1000000
```

Note that it's not only correct, but also significantly faster.

Labels

kind:bug, platform:aarch64, topic:multithreading, topic:stdlib:concurrency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug in atomic operations on aarch64 with multi-threading

4 participants