Consider implementing ARM64 `__load_acquire`/`__stlr` intrinsics #62103

StephanTLavavej · 2023-04-12T18:43:07Z

As of VS 2022 17.6 Preview 3, MSVC supports the following ARM64 intrinsics used by its STL:

unsigned __int8  __load_acquire8 (const volatile unsigned __int8  * _Target);
unsigned __int16 __load_acquire16(const volatile unsigned __int16 * _Target);
unsigned __int32 __load_acquire32(const volatile unsigned __int32 * _Target);
unsigned __int64 __load_acquire64(const volatile unsigned __int64 * _Target);

void __stlr8 (volatile unsigned __int8  * _Target, unsigned __int8  _Value);
void __stlr16(volatile unsigned __int16 * _Target, unsigned __int16 _Value);
void __stlr32(volatile unsigned __int32 * _Target, unsigned __int32 _Value);
void __stlr64(volatile unsigned __int64 * _Target, unsigned __int64 _Value);

According to my understanding, the `__load_acquire` intrinsics emit either the `ldar` or `ldapr` instruction (according to criteria that are beyond my cat-sized brain 🐱 🧠), while the `__stlr` intrinsics emit the `stlr` instruction. These are significantly more efficient than what was previously possible.

Currently, MSVC's STL is using its classic (slower) codepaths for Clang/LLVM ARM64. It would be nice if Clang added support for the new faster intrinsics.
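For readers unfamiliar with the pattern these intrinsics serve, here is a minimal sketch of load-acquire/store-release written in portable C++ with `std::atomic`. The `Mailbox`, `publish`, and `try_consume` names are hypothetical and are not part of MSVC's STL. On ARM64, compilers typically lower acquire loads and release stores like these to `ldar` (or `ldapr`) and `stlr`, but the exact codegen depends on the compiler and target flags.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-slot mailbox; the names are illustrative only.
struct Mailbox {
    int payload = 0;                 // plain (non-atomic) data
    std::atomic<bool> ready{false};  // flag guarding the payload
};

// Writer: fill in the payload, then publish it with a release store.
inline void publish(Mailbox& box, int value) {
    // Precondition for this sketch: the slot is used only once.
    assert(!box.ready.load(std::memory_order_relaxed));
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader: acquire-load the flag. If it observes true, the payload written
// before the matching release store is guaranteed to be visible as well.
inline bool try_consume(const Mailbox& box, int& out) {
    if (!box.ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = box.payload;
    return true;
}
```

In real use, `publish` and `try_consume` would run on different threads; the acquire/release pairing is what makes the plain `payload` read safe.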

llvmbot · 2023-04-12T18:46:24Z

@llvm/issue-subscribers-backend-aarch64

efriedma-quic · 2023-04-12T20:14:25Z

If possible, I'd strongly prefer if you could change the MS STL to use `__atomic_load_n`/`__atomic_store_n`. It's not clear to me what the semantics of the target-specific intrinsics are supposed to be, and LLVM optimizations already know how to optimize the existing atomic intrinsics.

(`ldapr` is part of armv8.3, so I assume MSVC won't generate it unless you pass flags that indicate the target supports it.)
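As a rough illustration of the suggestion above, the following sketch wraps the GCC/Clang builtins `__atomic_load_n` and `__atomic_store_n` in helpers shaped like the MSVC intrinsics. The helper names are hypothetical, and mapping them to `__ATOMIC_ACQUIRE`/`__ATOMIC_RELEASE` is an assumption: as noted, the precise semantics of the MSVC intrinsics are not spelled out in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not MSVC's or Clang's API). They assume the MSVC
// intrinsics mean "acquire load" and "release store" respectively.
inline std::uint32_t load_acquire_u32(const volatile std::uint32_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}

inline std::uint64_t load_acquire_u64(const volatile std::uint64_t* target) {
    return __atomic_load_n(target, __ATOMIC_ACQUIRE);
}

inline void store_release_u32(volatile std::uint32_t* target, std::uint32_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}

inline void store_release_u64(volatile std::uint64_t* target, std::uint64_t value) {
    __atomic_store_n(target, value, __ATOMIC_RELEASE);
}

// Single-threaded sanity check: a released value can be read back.
inline bool round_trip_demo() {
    std::uint32_t slot = 0;
    store_release_u32(&slot, 123u);
    const std::uint32_t seen = load_acquire_u32(&slot);
    assert(seen == 123u);
    return seen == 123u;
}
```

Because these are generic builtins rather than target-specific intrinsics, the compiler is free to pick the instruction (for example `ldar` versus `ldapr`) based on the target features it was told about.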

MSVC 17.6p3 introduced new ARM64 intrinsics for atomic (load-acquire/store-release) operations. Since clang does not support this yet, force the fallback path to temporarily unblock the build while we implement support for the `__stlr[8|16|32|64]` intrinsics in clang. See: llvm/llvm-project#62103

StephanTLavavej · 2024-07-31T03:51:39Z

Closed as MSVC's STL now uses Clang's `__atomic_load_n`/`__atomic_store_n`.

github-actions bot added the new issue label Apr 12, 2023

EugeneZelenko added backend:AArch64 and removed new issue labels Apr 12, 2023

StephanTLavavej mentioned this issue Apr 12, 2023

Toolset update: VS 2022 17.6 Preview 3 microsoft/STL#3651

Merged

StephanTLavavej mentioned this issue Apr 18, 2023

<atomic>: Consider using __atomic_load_n/__atomic_store_n for Clang microsoft/STL#3659

Open

StephanTLavavej mentioned this issue Feb 19, 2024

Clang/LLVM tracking issue microsoft/STL#4413

Open

efriedma-quic mentioned this issue Jun 26, 2024

Switch clang-cl's atomic reads and writes to match what the latest MSVC generates #96679

Closed

StephanTLavavej mentioned this issue Jul 30, 2024

Improve ARM64 atomics for Clang microsoft/STL#4870

Merged

StephanTLavavej closed this as completed in microsoft/STL#4870 Jul 31, 2024
