sync/atomic: Pointer 4 times performance degradation based on layout of structs #67764
More info on it here.
I get different behavior on my machine.
My guess is this has a lot to do with how allocations land within cache lines and whether any false sharing ensues. It might be instructive to make sure that the difference in speed goes away with a changed layout.
I had a less severe regression. Adding a cache-line pad fixed it:

```diff
 type Mutex[T any] struct {
 	mut sync.RWMutex
+	_   [1]byte
 	val atomic.Pointer[T]
 }
```
You might need a different size of array based on your CPU model, or on how the winds are blowing. A micro-benchmark like this is extremely narrow and doesn't accurately capture the complete cost of the extra padding. I don't think we want to fix anything here: adding extra padding is extremely runtime-edge-case-specific, and it makes structs bigger (slower to allocate, more memory used). I think it's fine for users to fix alignment issues themselves; see the discussion about using `structs.Align`.
@randall77, @Jorropo, thanks a lot for looking at it. @Jorropo, could you please point me to the discussion about using `structs.Align`?
I don't have links off the top of my head; it was in Slack or Discord. The idea is something like:

```go
type Mutex[T any] struct {
	mut sync.RWMutex
	_   structs.Align[32]
	val atomic.Pointer[T]
}
```

although the question of adding such a type is still open. In the meanwhile you can use a fixed-size padding array.
In triage, we don't think there's anything for us to do here on the Go side. Closing optimistically; feel free to comment if you disagree.
@Jorropo, @mknyszek, @randall77, one last question, just to make sure that the issue is understood in full.
Anything that might change the layout of data in memory can affect the performance. Adding a field or removing a field to any object that is allocated could in theory trigger slowness (or fastness).

That said, I don't think we have proof that it is the cause. It just seems likely. If it is the case, there's nothing that Go can do about it: that's a property of the hardware. Careful alignment to avoid false sharing is probably the only solution. If it isn't the case, we'd need to see some evidence, probably something showing where the time is going using the chip performance counters (cache misses, evictions, that sort of thing).
@randall77, thanks for the clarification. Can I ask you not to close this issue for at least two weeks? I will try to get more data; if it is a false-sharing issue, this data can contribute to a solution.
The issue is closed, but it isn't deleted. Feel free to reopen if you have more data that points to an actionable change Go can make.
Go version

go version go1.22.3 linux/amd64

Output of `go env` in your module/workspace:

What did you do?
Consider the following snippet. For some weird reason this code gives 5x performance degradation in a parallel `Mutex.Read` workload compared to the following code, or even to this code:
In short, somehow the existence of `[]string` and `i int` fields in a `State`, in conjunction with having a `sync.RWMutex` on `Mutex`, gives 5x performance degradation in a parallel `atomic.Pointer.Load` workload. If you add a variable to either `State` or `Mutex`, or replace `sync.RWMutex` with `sync.Mutex`, this performance degradation vanishes. This test produced consistent results:
What did you see happen?

Performance of `atomic.Pointer.Load()` degrades.

What did you expect to see?

No performance degradation.