Consider implementing ARM64 __load_acquire/__stlr intrinsics #62103

Closed
StephanTLavavej opened this issue Apr 12, 2023 · 3 comments · Fixed by microsoft/STL#4870

Comments

@StephanTLavavej
Member

As of VS 2022 17.6 Preview 3, MSVC supports the following ARM64 intrinsics used by its STL:

unsigned __int8  __load_acquire8 (const volatile unsigned __int8  * _Target);
unsigned __int16 __load_acquire16(const volatile unsigned __int16 * _Target);
unsigned __int32 __load_acquire32(const volatile unsigned __int32 * _Target);
unsigned __int64 __load_acquire64(const volatile unsigned __int64 * _Target);

void __stlr8 (volatile unsigned __int8  * _Target, unsigned __int8  _Value);
void __stlr16(volatile unsigned __int16 * _Target, unsigned __int16 _Value);
void __stlr32(volatile unsigned __int32 * _Target, unsigned __int32 _Value);
void __stlr64(volatile unsigned __int64 * _Target, unsigned __int64 _Value);

As I understand it, the __load_acquire intrinsics emit either the ldar or the ldapr instruction (according to criteria that are beyond my cat-sized brain 🐱 🧠), while the __stlr intrinsics emit the stlr instruction. These are significantly more efficient than what was previously possible.

Currently, MSVC's STL is using its classic (slower) codepaths for Clang/LLVM ARM64. It would be nice if Clang added support for the new faster intrinsics.
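
For readers unfamiliar with the ordering these instructions provide, here is a minimal portable sketch of the load-acquire/store-release pairing, written with std::atomic rather than the MSVC intrinsics. The `Mailbox` type and the `publish`/`try_consume` helpers are hypothetical names invented for this illustration; they are not part of MSVC's STL.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical type, used only for illustration.
struct Mailbox {
    std::uint32_t payload = 0;            // plain (non-atomic) data
    std::atomic<std::uint32_t> ready{0};  // flag that guards the payload
};

// Writer side: write the payload, then publish it with a store-release.
// The release store keeps the payload write from being reordered after it.
inline void publish(Mailbox& box, std::uint32_t value) {
    box.payload = value;
    box.ready.store(1, std::memory_order_release);
}

// Reader side: load-acquire the flag. If the load observes 1, the payload
// written before the matching release store is guaranteed to be visible.
inline bool try_consume(const Mailbox& box, std::uint32_t& out) {
    if (box.ready.load(std::memory_order_acquire) == 1) {
        out = box.payload;
        return true;
    }
    return false;
}
```

In a real program `publish` and `try_consume` would run on different threads; on AArch64, compilers typically lower the release store to stlr and the acquire load to ldar (or ldapr where the target allows it).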

@llvmbot
Member

llvmbot commented Apr 12, 2023

@llvm/issue-subscribers-backend-aarch64

@efriedma-quic
Collaborator

If possible, I'd strongly prefer if you could change the MS STL to use __atomic_load_n/__atomic_store_n. It's not clear to me what the semantics of the target-specific intrinsics are supposed to be, and LLVM optimizations already know how to optimize the existing atomic intrinsics.

(ldapr is part of armv8.3, so I assume MSVC won't generate it unless you pass flags that indicate the target supports it.)
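
To make the suggestion concrete, here is a rough sketch of 32-bit load-acquire and store-release helpers built on the GCC/Clang builtins. The helper names `load_acquire_u32` and `store_release_u32` are hypothetical and chosen for this example; only `__atomic_load_n`, `__atomic_store_n`, and the `__ATOMIC_*` constants are real builtins. This is not a copy of what MSVC's STL does.

```cpp
#include <cstdint>

#if !defined(__GNUC__) && !defined(__clang__)
#error "This sketch assumes a compiler that provides the GCC-style __atomic builtins."
#endif

// Hypothetical helper: atomically load *target with acquire ordering.
inline std::uint32_t load_acquire_u32(const volatile std::uint32_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}

// Hypothetical helper: atomically store value to *target with release ordering.
inline void store_release_u32(volatile std::uint32_t* target, std::uint32_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}
```

Because the memory order is passed to the compiler explicitly, the optimizer can reason about these operations the same way it does for std::atomic, which is the advantage described above over target-specific intrinsics.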

compnerd added a commit to compnerd/apple-swift that referenced this issue Jun 22, 2023
MSVC 17.6p3 introduced new ARM64 intrinsics for atomic
(load-acquire/store-release) operations.  Since clang does not support
this yet, force the fallback path to temporarily unblock the build while
we implement support for the `__stlr[8|16|32|64]` intrinsics in clang.

See: llvm/llvm-project#62103
@StephanTLavavej
Member Author

Closed as MSVC's STL now uses Clang's __atomic_load_n/__atomic_store_n.
