Add memory barriers on lock/unlock of SpinLock#13050
Merged
straight-shoota merged 1 commit intocrystal-lang:masterfrom Feb 8, 2023
Merged
Add memory barriers on lock/unlock of SpinLock#13050straight-shoota merged 1 commit intocrystal-lang:masterfrom
straight-shoota merged 1 commit intocrystal-lang:masterfrom
Conversation
straight-shoota
approved these changes
Feb 7, 2023
Member
Author
|
Until released, this fix can be applied as a monkey-patch as: class Crystal::SpinLock
def lock
previous_def
::Atomic::Ops.fence :sequentially_consistent, false
end
def unlock
::Atomic::Ops.fence :sequentially_consistent, false
previous_def
end
end |
ggiraldez
approved these changes
Feb 7, 2023
jgaskins
added a commit
to jgaskins/crystal
that referenced
this pull request
Jan 30, 2024
This solution is the same as the one used in crystal-lang#13050. The following code is expected to output `1000000` preceded by the time it took to perform it: ``` mutex = Mutex.new numbers = Array(Int32).new(initial_capacity: 1_000_000) done = Channel(Nil).new concurrency = 20 iterations = 1_000_000 // concurrency concurrency.times do spawn do iterations.times { mutex.synchronize { numbers << 0 } } ensure done.send nil end end start = Time.monotonic concurrency.times { done.receive } print Time.monotonic - start print ' ' sleep 100.milliseconds # Wait just a bit longer to be sure the discrepancy isn't due to a *different* race condition pp numbers.size ``` Before this commit, on an Apple M1 CPU, the array size would be anywhere from 880k-970k, but I never observed it reach 1M. Here is a sample: ``` $ repeat 20 (CRYSTAL_WORKERS=10 ./mutex_check) 00:00:00.119271625 881352 00:00:00.111249083 936709 00:00:00.102355208 946428 00:00:00.116415166 926724 00:00:00.127152583 899899 00:00:00.097160792 964577 00:00:00.120564958 930859 00:00:00.122803000 917583 00:00:00.093986834 954112 00:00:00.079212333 967772 00:00:00.093168208 953491 00:00:00.102553834 962147 00:00:00.091601625 967304 00:00:00.108157208 954855 00:00:00.080879666 944870 00:00:00.114638042 930429 00:00:00.093617083 956496 00:00:00.112108959 940205 00:00:00.092837875 944993 00:00:00.097882625 916220 ``` This indicates that some of the mutex locks were getting through when they should not have been. With this commit, using the exact same parameters (built with `--release -Dpreview_mt` and run with `CRYSTAL_WORKERS=10` to spread out across all 10 cores) these are the results I'm seeing: ``` 00:00:00.078898166 1000000 00:00:00.072308084 1000000 00:00:00.047157000 1000000 00:00:00.088043834 1000000 00:00:00.060784625 1000000 00:00:00.067710250 1000000 00:00:00.081070750 1000000 00:00:00.065572208 1000000 00:00:00.065006958 1000000 00:00:00.061041541 1000000 00:00:00.059648291 1000000 00:00:00.078100125 1000000 00:00:00.050676250 1000000 00:00:00.049395875 1000000 00:00:00.069352334 1000000 00:00:00.063897833 1000000 00:00:00.067534333 1000000 00:00:00.070290833 1000000 00:00:00.067361500 1000000 00:00:00.078021833 1000000 ``` Note that it's not only correct, but also significantly faster.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #13010. Credits goes to @ggiraldez
Examples provided in #13010 (comment) and #13010 (comment) works in aarch64-apple-darwin without problem so far.
Something I was thinking is that maybe we can use a more relaxed ordering since we are now emitting explicit memory barriers. So not using Atomic which enforces
sequentially_consistentin the#swap,#getused in the SpinLock. But that can come later if needed. I am not sure if that would lead to performance improvement at all.