Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NativeAOT codegen optimization opportunities #64242

Closed
12 of 15 tasks
MichalStrehovsky opened this issue Jan 25, 2022 · 13 comments
Closed
12 of 15 tasks

NativeAOT codegen optimization opportunities #64242

MichalStrehovsky opened this issue Jan 25, 2022 · 13 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Jan 25, 2022

(Note this is milestoned as Future on purpose - none of this is blocking.)

A couple relatively low hanging optimization opportunities in the JIT space.

  • Enable loop alignment
    Loop alignment seems to have some pretty nice benefits. NativeAOT/crossgen2 will already respect requests to align at 32-byte boundaries (Add support in crossgen2 for 32-byte alignment #32602), but RyuJIT never requests that for prejit. RyuJIT seems to also have assumptions that the code buffer to write instructions into is 32-byte aligned. It's a managed byte[] in our AOT compilers and it's hard to align that. RyuJIT should ideally count bytes from the beginning of the buffer.
  • Optimize static field access
    Static fields always go through a helper call. It can be done better. Improve static field access for NativeAOT #63620 is a WIP pull request. I don't know when I'll get back to it.
  • Optimize class constructor checks - NativeAOT: Inline static initialization checks in codegen #80954
    If there's a static constructor, the EE side will ask RyuJIT to run the class constructor check (CORINFO_INITCLASS_USE_HELPER). This is a helper call that checks whether the class constructor already ran and if not, runs it. The helper looks like this:
    // We need to trigger the cctor before returning the base. It is stored at the beginning of the non-GC statics region.
    encoder.EmitLEAQ(encoder.TargetRegister.Arg0, factory.TypeNonGCStaticsSymbol(target), -NonGCStaticsNode.GetClassConstructorContextSize(factory.Target));
    AddrMode initialized = new AddrMode(encoder.TargetRegister.Arg0, null, factory.Target.PointerSize, 0, AddrModeSize.Int32);
    encoder.EmitCMP(ref initialized, 1);
    encoder.EmitRETIfEqual();
    encoder.EmitMOV(encoder.TargetRegister.Arg1, encoder.TargetRegister.Result);
    encoder.EmitJMP(factory.HelperEntrypoint(HelperEntrypoint.EnsureClassConstructorRunAndReturnNonGCStaticBase));
    . We could inline the fast path into RyuJIT codegen (compare a dereferenced IConHandle with 1 and if they're equal, the cctor already executed and we can move on). This would probably require a new JitInteface method that provides info about lookup for the IConHandle and lookup for the helper to call if the IConHandle is not 1. There already is a pull request that does pretty much that: RyuJIT: Inlined "is class statically inited" check #47901. We need to revive it for NativeAOT.
  • Optimize thread static field access [NativeAOT] Inline access to thread statics #79521
    Emit the platform-specific native sequence to access thread statics inline instead of calling a helper.
  • Reading preinitialized data JIT: import static readonly fields holding frozen objects as const handles #76112
    NativeAOT can run static constructors at compile time and provide values of readonly static fields. RyuJIT can already consume that (see code around CORINFO_FLG_FIELD_FINAL), but in doing so it assumes the address returned from getFieldAddress is an actual address that can be dereferenced. It's a handle when precompiling. We would want to introduce a proper JitInterface API for this. The static data preinitializer can return more than just primitive types - we could also make this work for reference types (we can return a handle that points to an object in the frozen data segment). Might be better to look into this after "Optimize static field access" above is done.
  • Devirtualization
    The compiler driver has a pretty good idea of what types will be allocated over the lifetime of the app. It can provide answers to questions such as "what are all the types that could implement this interface". This can be used to drive devirtualization. Some of it doesn't really even have to be guarded (there might only be a single possibility, for example).
  • General optimizations

Cc @EgorBo - might be in your area of interest

category:cq
theme:ready-to-run
skill-level:expert
cost:medium
impact:medium

@MichalStrehovsky MichalStrehovsky added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 25, 2022
@MichalStrehovsky MichalStrehovsky added this to the Future milestone Jan 25, 2022
@ghost
Copy link

ghost commented Jan 25, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

(Note this is milestoned as Future on purpose - none of this is blocking.)

A couple relatively low hanging optimization opportunities in the JIT space.

  • Enable loop alignment
    Loop alignment seems to have some pretty nice benefits. NativeAOT/crossgen2 will already respect requests to align at 32-byte boundaries (Add support in crossgen2 for 32-byte alignment #32602), but RyuJIT never requests that for prejit. RyuJIT seems to also have assumptions that the code buffer to write instructions into is 32-byte aligned. It's a managed byte[] in our AOT compilers and it's hard to align that. RyuJIT should ideally count bytes from the beginning of the buffer.
  • Optimize static field access
    Static fields always go through a helper call. It can be done better. Improve static field access for NativeAOT #63620 is a WIP pull request. I don't know when I'll get back to it.
  • Optimize class constructor checks
    If there's a static constructor, the EE side will ask RyuJIT to run the class constructor check (CORINFO_INITCLASS_USE_HELPER). This is a helper call that checks whether the class constructor already ran and if not, runs it. The helper looks like this:
    // We need to trigger the cctor before returning the base. It is stored at the beginning of the non-GC statics region.
    encoder.EmitLEAQ(encoder.TargetRegister.Arg0, factory.TypeNonGCStaticsSymbol(target), -NonGCStaticsNode.GetClassConstructorContextSize(factory.Target));
    AddrMode initialized = new AddrMode(encoder.TargetRegister.Arg0, null, factory.Target.PointerSize, 0, AddrModeSize.Int32);
    encoder.EmitCMP(ref initialized, 1);
    encoder.EmitRETIfEqual();
    encoder.EmitMOV(encoder.TargetRegister.Arg1, encoder.TargetRegister.Result);
    encoder.EmitJMP(factory.HelperEntrypoint(HelperEntrypoint.EnsureClassConstructorRunAndReturnNonGCStaticBase));
    . We could inline the fast path into RyuJIT codegen (compare a dereferenced IConHandle with 1 and if they're equal, the cctor already executed and we can move on). This would probably require a new JitInteface method that provides info about lookup for the IConHandle and lookup for the helper to call if the IConHandle is not 1.
  • Reading preinitialized data
    NativeAOT can run static constructors at compile time and provide values of readonly static fields. RyuJIT can already consume that (see code around CORINFO_FLG_FIELD_FINAL), but in doing so it assumes the address returned from getFieldAddress is an actual address that can be dereferenced. It's a handle when precompiling. We would want to introduce a proper JitInterface API for this. The static data preinitializer can return more than just primitive types - we could also make this work for reference types (we can return a handle that points to an object in the frozen data segment). Might be better to look into this after "Optimize static field access" above is done.

Cc @EgorBo - might be in your area of interest

Author: MichalStrehovsky
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: Future

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 25, 2022
@omariom
Copy link
Contributor

omariom commented Jan 25, 2022

NativeAOT can run static constructors at compile time and provide values of readonly static fields.

What are the requirements for that?
Initialization with constants?

@MichalStrehovsky
Copy link
Member Author

MichalStrehovsky commented Jan 25, 2022

What are the requirements for that?
Initialization with constants?

On a high level, the restrictions are around allocating reference types with GC-pointer fields and p/invokes.

The interpreter lives here: https://github.com/dotnet/runtime/blob/a5cf724aa2c9a67eb45827b56d6315acd4edea84/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/TypePreinit.cs

The tests for this paint a pretty good picture of what's supported: https://github.com/dotnet/runtime/blob/a5cf724aa2c9a67eb45827b56d6315acd4edea84/src/tests/nativeaot/SmokeTests/Preinitialization/Preinitialization.cs (look for Assert.IsPreinitialized - those are expected to have their cctor executed at compile time). Some of those tests are testing Assert.IsLazyInitialized, which is the opposite, testing that we couldn't run the cctor at compile time).

The coolest demo is probably:

class TestDrawCircle
{
static class CircleHolder
{
public static readonly byte[] s_bytes;
static CircleHolder()
{
s_bytes = ComputeCircleBytes();
}
}
private static byte[] ComputeCircleBytes()
{
const int Width = 16;
byte[] bytes = new byte[Width * Width];
for (int i = 0; i < bytes.Length; i++)
{
int x = i % Width;
int y = i / Width;
x -= Width / 2;
y -= Width / 2;
if (x * x + y * y < (Width / 2) * (Width / 2))
bytes[i] = (byte)'*';
}
return bytes;
}
public static void Run()
{
Assert.IsPreinitialized(typeof(CircleHolder));
byte[] expected = ComputeCircleBytes();
byte[] actual = CircleHolder.s_bytes;
Assert.AreEqual(expected.Length, actual.Length);
for (int i = 0; i < expected.Length; i++)
{
Assert.AreEqual(expected[i], actual[i]);
}
}
}

Which compiles to an actual circle in the .data segment of the executable file:

image

@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib.

@EgorBo
Copy link
Member

EgorBo commented Jan 25, 2022

@MichalStrehovsky I can definitely help! For loop alignment we need help of @kunalspathak

Devirtualization
The compiler driver has a pretty good idea of what types will be allocated over the lifetime of the app. It can provide answers to questions such as "what are all the types that could implement this interface". This can be used to drive devirtualization. Some of it doesn't really even have to be guarded (there might only be a single possibility, for example).

Isn't it better to run an ILLink substep and devirtualize everything there? E.g. Xamarin.iOS does that https://github.com/xamarin/xamarin-macios/blob/a20d417bf794445aac19a5b07c5db5a70d16dde5/tools/linker/MonoTouch.Tuner/SealerSubStep.cs#L11-L13

@MichalStrehovsky
Copy link
Member Author

Isn't it better to run an ILLink substep and devirtualize everything there? E.g. Xamarin.iOS does that https://github.com/xamarin/xamarin-macios/blob/a20d417bf794445aac19a5b07c5db5a70d16dde5/tools/linker/MonoTouch.Tuner/SealerSubStep.cs#L11-L13

That just puts sealed on classes that don't have it and nothing derives from them. NativeAOT does that too:

public override bool IsEffectivelySealed(TypeDesc type)
{
// If we know we scanned a type that derives from this one, this for sure can't be reported as sealed.
TypeDesc canonType = type.ConvertToCanonForm(CanonicalFormKind.Specific);
if (_unsealedTypes.Contains(canonType))
return false;
// Don't report __Canon as sealed or it can cause trouble
// (E.g. RyuJIT might think it's okay to omit array element type checks for __Canon[].)
if (type.IsCanonicalDefinitionType(CanonicalFormKind.Any))
return false;
if (type is MetadataType metadataType)
{
// Due to how the compiler is structured, we might see "constructed" EETypes for things
// that never got allocated (doing a typeof() on a class that is otherwise never used is
// a good example of when that happens). This can put us into a position where we could
// report `sealed` on an `abstract` class, but that doesn't lead to anything good.
return !metadataType.IsAbstract;
}
// Everything else can be considered sealed.
return true;
}

Doing that helps for classes (not interfaces), but we can do better if we can hook this into the PGO-driven devirtualization mechanism in RyuJIT: it can handle things like "this interface is implemented by only two types" or "this class is derived by two types". I looked into actually feeding this through the PGO JitInterface methods (by having the driver fake PGO data), but it looked it would be messy and this can actually have better codegen than PGO (because no fallback virtual call is needed - the list of classes is complete).

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 25, 2022
@kunalspathak
Copy link
Member

@MichalStrehovsky - Happy to help for loop alignment. Let me know if you have any questions.

@EgorBo
Copy link
Member

EgorBo commented Jan 26, 2022

it can handle things like "this interface is implemented by only two types" or "this class is derived by two types"

Great idea! I have a PR that expands GDV to handle multiple cases so I guess it shouldn't be difficult to do + also we basically need to implement getLikelyClasses API in runtime then

@AndyAyersMS
Copy link
Member

Early on we were going to have the runtime track which classes implemented an interface so it could report that to the jit and we could use that to guess if we did not have PGO. But the bookkeeping got messy.

For the AOT case we should add some extra info to the getLikelyClasses to indicate that no other classes are possible, and use that to prune off the "or else" case in GDV.

@SupinePandora43
Copy link

SupinePandora43 commented Feb 26, 2022

I initialize program in static T() constructor because people in #lowlevel said that readonly fields aren't handled as constants.

@hez2010
Copy link
Contributor

hez2010 commented Jun 13, 2022

readonly fields aren't handled as constants.

readonly fields aren't, but readonly static fields are.

@JulieLeeMSFT
Copy link
Member

cc @kunalspathak to capture JIT work.

@MichalStrehovsky
Copy link
Member Author

TLS access and classes with few derived classes are the only checkboxes left but we did that work. I think this is fixed 🎉

@github-actions github-actions bot locked and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Status: Done
Development

No branches or pull requests

8 participants