"native" instruction set alias for AOT compilers #73246
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
I think the most maintainable way might be to extract the CPU flag detection from the runtime (runtime/src/coreclr/nativeaot/Runtime/startup.cpp, lines 162 to 305 in cdf21f1) into a place that can be shared with the JitInterface native library, then compile that into jitinterface.dll (which ships with ILC) and p/invoke into it. We already have managed definitions of the various flags this returns, because the computed values are bitmasked with compile-time expectations burned into the produced executable to ensure we don't run on machines that lack the expected CPU features.
As a stretch goal, we might try to unify this detection with what's in the CoreCLR VM, but that might be too much extra scope. Extracting something that would be eligible to be placed under src/native/minipal in the repo would be a very good first step towards that.
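For readers unfamiliar with the bitmask check mentioned above, here is a minimal sketch of the idea. The flag names and values are hypothetical, chosen only for illustration; the runtime defines its own flags, and this is not the code in startup.cpp.

```cpp
#include <cstdint>

// Hypothetical feature bits for illustration only; the real runtime
// uses its own flag definitions and values.
enum CpuFeature : std::uint32_t {
    kFeatureSse42 = 1u << 0,
    kFeatureAvx2  = 1u << 1,
    kFeatureLse   = 1u << 2,
};

// True when every feature the executable was compiled to expect
// is present in the set detected on the running machine.
constexpr bool HasRequiredFeatures(std::uint32_t detected, std::uint32_t required) {
    return (detected & required) == required;
}

// Bits that are required but not detected, e.g. for an error message.
constexpr std::uint32_t MissingFeatures(std::uint32_t detected, std::uint32_t required) {
    return required & ~detected;
}
```

A "native" alias would presumably run the same detection at publish time and use the detected set as the required set, so the check passes on the publishing machine and on any machine with at least those features.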
A performance hit was noticed when testing a Native AOT gRPC app on Linux ARM, compared to only a minor perf hit for AOT on Linux Intel. The probable culprit is the EventSource methods that use Interlocked to increment longs.
@JamesNK that makes sense, NativeAOT uses arm64 8.0 as a baseline while atomic instructions require 8.1, so you need to define
For crank, I think we discussed a named instruction set for Azure a while ago (to include the baseline instructions).
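To make the 8.0 vs 8.1 point concrete, here is a C++ analogue of an Interlocked-style increment on a 64-bit counter. `EventCounter` is a hypothetical stand-in, not the EventSource code. When a compiler targets the Armv8.0 baseline, an increment like this is typically emitted as a load-exclusive/store-exclusive retry loop (or a call to a helper that picks at runtime); when the 8.1 atomics (LSE) are enabled, it can become a single atomic add instruction.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counter standing in for an EventSource-style long counter.
class EventCounter {
public:
    // Atomically adds one and returns the new value,
    // mirroring the semantics of Interlocked.Increment.
    std::int64_t Increment() {
        return value_.fetch_add(1, std::memory_order_seq_cst) + 1;
    }

    // Reads the current value.
    std::int64_t Read() const {
        return value_.load(std::memory_order_seq_cst);
    }

private:
    std::atomic<std::int64_t> value_{0};
};
```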
Yes, that fixed it. Before: 239,492 RPS. Also, using Interlocked only when required, as in this gRPC PR (grpc/grpc-dotnet#2052), will improve performance in the benchmark.
@JamesNK If the effect is this large, then maybe the hottest counters should be placed on their own cache lines?
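A rough sketch of what "own cache lines" could look like, written in C++ rather than C# and assuming a 64-byte line (common, but some Arm64 parts use 128 bytes). The type and field names are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLineSize = 64;

// Each counter is aligned to, and therefore padded out to, a full cache line,
// so two hot counters never share a line (avoids false sharing).
struct alignas(kCacheLineSize) PaddedCounter {
    std::atomic<std::int64_t> value{0};
};

// Hypothetical group of hot counters updated from different threads.
struct HotCounters {
    PaddedCounter callsStarted;
    PaddedCounter callsFailed;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "counter should occupy exactly one cache line");
static_assert(alignof(PaddedCounter) == kCacheLineSize,
              "counter should start on a cache-line boundary");
```

This only removes contention between different counters; threads hammering the same counter still contend on its line.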
This allows compiling for the ISA extensions that the currently running CPU supports. Fixes dotnet#73246.
It would match the native architecture of the processor on which publishing happens.
Context: