<locale>: Double-checked locking for locale::classic #3048
DCL via CAS isn't a good idea. On x86 and x64, CAS is an expensive operation. It's still a win over a mutex, since a mutex would yield two such operations, but you still need a plain read on the fast path if you want the real DCL gain.
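A minimal sketch of the shape this comment argues for: double-checked locking whose fast path is a single acquire load instead of a CAS. It uses portable `std::atomic` and `std::mutex` rather than the WinAPI/intrinsics in the PR, and `Impl`, `g_instance`, and `get_instance` are hypothetical names, not STL internals.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the classic locale's implementation object.
struct Impl {
    int value = 42;
};

std::atomic<Impl*> g_instance{nullptr};
std::mutex g_instance_mutex;

// Double-checked locking: the common path is one acquire load (a plain MOV
// on x86/x64), with no CAS and no lock. The mutex is taken only while the
// object has not been published yet.
Impl* get_instance() {
    Impl* p = g_instance.load(std::memory_order_acquire); // first check, lock-free
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_instance_mutex);
        p = g_instance.load(std::memory_order_relaxed); // second check; the mutex orders it
        if (p == nullptr) {
            p = new Impl{}; // never freed, like a process-lifetime singleton
            g_instance.store(p, std::memory_order_release); // publish after construction
        }
    }
    return p;
}
```

Once the pointer is published, every later call costs one load and one branch; the release store pairs with the acquire load so readers see a fully constructed object.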
I don't see a justification for using WinAPI / intrinsics. If you can change the locking technique, you may go with … If you need to preserve the variable, you can still use …
On the original issue, @CaseyCarter noted (via @StephanTLavavej):
Yes, I'm not a fan of the CAS. I think Lines 155 to 167 in fef8191
Lines 1036 to 1045 in fef8191
Lines 97 to 107 in fef8191
For x86/x64 -- yes, absolutely.
My understanding is that we should primarily optimize for x86 / x64 without much divergence between these and ARM, so we should go with a barrier plus a load.
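For the "barrier plus a load" shape, one possible portable sketch is a relaxed load followed by an acquire fence that runs only when the pointer is already published, paired with a release fence before the publishing store. The names are hypothetical. On x86/x64 an acquire fence should compile to no instruction (it only constrains the compiler), while on ARM it typically becomes a `dmb` barrier.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical singleton payload.
struct Facet {
    int id = 7;
};

std::atomic<Facet*> g_facet{nullptr};
std::mutex g_facet_mutex;

Facet* get_facet() {
    // Fast path: a relaxed (plain) load, then an acquire fence only if the
    // pointer was already published. The fence pairs with the release fence
    // below, so the object's construction is visible to this thread.
    Facet* p = g_facet.load(std::memory_order_relaxed);
    if (p != nullptr) {
        std::atomic_thread_fence(std::memory_order_acquire);
        return p;
    }

    // Slow path: serialize initialization; the mutex orders the re-check.
    std::lock_guard<std::mutex> lock(g_facet_mutex);
    p = g_facet.load(std::memory_order_relaxed);
    if (p == nullptr) {
        p = new Facet{};
        std::atomic_thread_fence(std::memory_order_release); // construction happens-before publication
        g_facet.store(p, std::memory_order_relaxed);
    }
    return p;
}
```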
I see, but is there a problem with …
Thanks! We're very interested in reviewing this performance enhancement, but because locales are by far the scariest area of the STL, we'd like to target merging this at the beginning of the VS 2022 17.5 Preview 1 cycle (so we have time to react to any bug reports) instead of at the end of the 17.4 Preview 3 cycle. Fortunately, this won't require waiting long: the date when changes start flowing into 17.5 Preview 1 is 2022-09-02 (i.e., Friday of next week).
Not necessarily, but (a) some rough tests show it's about 2x slower on my machine (3.85 ns vs. 1.70 ns when copy-pasting the atomic load/store code; the always-locking code is 16.7 ns), and (b) there's some business in …
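A rough way to reproduce the kind of comparison quoted here (DCL fast path vs. taking the lock on every call), assuming a single uncontended thread. This is a hypothetical harness, not the one used for the quoted numbers, and results will vary with machine and compiler settings.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>

struct Obj { int v = 1; }; // hypothetical payload

std::atomic<Obj*> g_obj{nullptr};
std::mutex g_mtx;

// DCL: acquire load on the fast path, mutex only for first-time construction.
Obj* get_dcl() {
    Obj* p = g_obj.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_mtx);
        p = g_obj.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Obj{};
            g_obj.store(p, std::memory_order_release);
        }
    }
    return p;
}

// Baseline: take the mutex on every call.
Obj* get_locked() {
    std::lock_guard<std::mutex> lock(g_mtx);
    Obj* p = g_obj.load(std::memory_order_relaxed);
    if (p == nullptr) {
        p = new Obj{};
        g_obj.store(p, std::memory_order_release);
    }
    return p;
}

// Average nanoseconds per call over `iterations` calls on one thread.
template <class Getter>
double ns_per_call(Getter get, int iterations) {
    long long sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sum += get()->v; // use the result so the call is not discarded
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile long long sink = sum; // keep `sum` observable
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

void report() {
    const int n = 10'000'000;
    std::printf("DCL:         %.2f ns/call\n", ns_per_call(get_dcl, n));
    std::printf("always lock: %.2f ns/call\n", ns_per_call(get_locked, n));
}
```

Contended, multi-threaded measurements would widen the gap further, since the always-locking version serializes every caller.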
Well, that's expected. I assumed it would be uniform across x86/ARM, and faster than …
This appears correct to me after looking at it for about an hour.
Yes, ideally we would use LL/SC on ARM, but that can wait until we actually have LL/SC in MSVC and implement it for atomics.
Given the issue with xatomic not being core, I'm OK with taking this and doing any possible unification with atomic as a follow-up.
This is the third copy of this code we have, though (there's another copy in vcruntime), which is a little annoying.
FYI @barcharcraz, we've made changes since you approved.
Thanks @MattStephanson for this major improvement in a notoriously problematic area! 🎉
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
🎉 😻 ⏱️ 🚀
Closes #3030.
The original issue suggests double-checked locking of `global_locale` in `locale::_Init`, but I don't think that would work without significantly more work, if it's possible at all. The reason is that the global locale can be modified while, at the same time, a default-constructed `locale` could be trying to load the global locale and increment its refcount. This is the kind of lock-free "juggling with knives" that I don't even want to try. The classic locale, on the other hand, is a typical singleton and more amenable to DCL.

ABI alert: the `InterlockedMeowPointer` functions require their arguments to be 32/64-bit aligned, hence the added `alignas`. Is that an ABI issue?