[RyuJit] Stackspace reservation is for pre-optimized code #7279
Comments
@dotnet/jit-contrib |
Considering for 2.1. |
This doesn't seem to repro anymore (from CHK jit in REL build):
Frame size of 40 is what one would expect from a non-leaf with no locals (8 bytes for align, 32 for out arg area). Not sure what changed things, maybe struct promotion kicking in now where it didn't before? |
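The 40-byte figure can be checked with a little arithmetic. Below is a minimal C# sketch, assuming the Windows x64 convention: an 8-byte return address already on the stack at entry, 16-byte stack alignment at call sites, and a 32-byte outgoing argument area. The class and method names are hypothetical and this is not the JIT's actual frame layout code.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the JIT.
public static class FrameSizeSketch
{
    // Assumption: the caller's call instruction has pushed an 8-byte return address.
    private const int ReturnAddressBytes = 8;

    // Assumption: rsp must be 16-byte aligned at every call site.
    private const int StackAlignment = 16;

    // Returns N for the prolog's "sub rsp, N" under the assumptions above.
    public static int LocalFrameSize(int pushedRegisters, int localsBytes, int outgoingArgBytes)
    {
        if (pushedRegisters < 0 || localsBytes < 0 || outgoingArgBytes < 0)
        {
            throw new ArgumentOutOfRangeException();
        }

        // Each callee-saved register push takes 8 bytes.
        int pushedBytes = pushedRegisters * 8;

        // Everything on the stack below the caller's aligned rsp.
        int unpadded = ReturnAddressBytes + pushedBytes + localsBytes + outgoingArgBytes;

        // Round up to the next multiple of the alignment.
        int padded = (unpadded + StackAlignment - 1) / StackAlignment * StackAlignment;

        // The sub instruction covers what the call and the pushes did not.
        return padded - ReturnAddressBytes - pushedBytes;
    }
}
```

With no pushes, no locals and a 32-byte outgoing argument area, `FrameSizeSketch.LocalFrameSize(0, 0, 32)` gives 40 (32 plus 8 bytes of padding). A prolog that also pushes two registers, `LocalFrameSize(2, 0, 32)`, gives 40 as well.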
Two-field Span vs. three-field Span? The original report predates the runtime having the two-field Span, so it was using the three-field Span. Will see if I can repro. |
It still does it when using a three-field "SpanLike" (StackSpace.cs), though interestingly dotnet/coreclr#9068 and dotnet/coreclr#9066 have switched output (StackSpace.asm). |
This sample now does a bunch of redundant register moves:
; Lcl frame size = 40
G_M5322_IG01:
57 push rdi
56 push rsi
4883EC28 sub rsp, 40
G_M5322_IG02:
4885C9 test rcx, rcx
0F84AD020000 je G_M5322_IG36
G_M5322_IG03:
8B7108 mov esi, dword ptr [rcx+8]
488BF9 mov rdi, rcx
48B9E852780EF97F0000 mov rcx, 0x7FF90E7852E8
33D2 xor edx, edx
E88AFAAD5F call CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
488B05637AEEFF mov rax, qword ptr [reloc classVar[0xe788360]]
83FE01 cmp esi, 1
0F828D020000 jb G_M5322_IG37
G_M5322_IG04:
48FFC0 inc rax
8D4EFF lea ecx, [rsi-1]
8BF1 mov esi, ecx
8BCE mov ecx, esi
8BF1 mov esi, ecx
83FE01 cmp esi, 1
0F8282020000 jb G_M5322_IG38
G_M5322_IG05:
48FFC0 inc rax
8D4EFF lea ecx, [rsi-1]
8BF1 mov esi, ecx
8BCE mov ecx, esi
8BF1 mov esi, ecx
83FE01 cmp esi, 1
0F8277020000 jb G_M5322_IG39
|
The register shuffling could be something similar to dotnet/coreclr#11390. |
Just to make current status clear... First example above is now "good":
; Lcl frame size = 40
G_M24727_IG01:
4883EC28 sub rsp, 40
G_M24727_IG02:
4885C9 test rcx, rcx
0F84BC010000 je G_M24727_IG35
G_M24727_IG03:
488D4110 lea rax, bword ptr [rcx+16]
8B4908 mov ecx, dword ptr [rcx+8]
83F901 cmp ecx, 1
0F82B6010000 jb G_M24727_IG36
...
In the second example stack space is also contained, but register shuffling is still happening, even after dotnet/coreclr#16028. Will investigate. |
The root issues are in the complexities of LSRA preferencing -- does not look like there is an easy surgical fix and any larger change here will be quite disruptive. So moving out of 2.1. |
Here's a kind of similar example:
public static ReadOnlySpan<char> Y(string s) => s.AsReadOnlySpan();
generates a chain of span copies, which the jit can't coalesce:
G_M47024_IG01:
G_M47024_IG02:
4885D2 test rdx, rdx
7507 jne SHORT G_M47024_IG03
33C0 xor rax, rax
4533C0 xor r8d, r8d
EB1A jmp SHORT G_M47024_IG04
G_M47024_IG03:
488D420C lea rax, bword ptr [rdx+12]
4C8BC0 mov r8, rax
8B5208 mov edx, dword ptr [rdx+8]
8BC2 mov eax, edx
498BD0 mov rdx, r8
4C8BC2 mov r8, rdx
8BD0 mov edx, eax
498BC0 mov rax, r8
448BC2 mov r8d, edx
G_M47024_IG04:
488901 mov bword ptr [rcx], rax
44894108 mov dword ptr [rcx+8], r8d
488BC1 mov rax, rcx
G_M47024_IG05:
C3 ret |
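To make the "chain of span copies" concrete, here is a hypothetical C# sketch of one way such a chain can arise at the source level: layered helpers that each hand a small struct back by value. `SpanLike`, `Make` and `Wrap` are invented for illustration. They are assumptions, not the actual `AsReadOnlySpan` implementation, and the sketch makes no claim about what the JIT emits for it.

```csharp
using System;

// Hypothetical two-field stand-in for a span: a reference plus a length.
public readonly struct SpanLike
{
    public readonly string Source;
    public readonly int Length;

    public SpanLike(string source, int length)
    {
        Source = source;
        Length = length;
    }
}

// Hypothetical call chain for illustration only.
public static class SpanCopySketch
{
    // Innermost helper: null check, then construct the struct.
    public static SpanLike Make(string s)
    {
        if (s == null)
        {
            return default;
        }

        return new SpanLike(s, s.Length);
    }

    // Middle layer: holds the callee's result in a local, then returns it by value.
    public static SpanLike Wrap(string s)
    {
        SpanLike tmp = Make(s);
        return tmp;
    }

    // Outer method with the same shape as Y in the comment above.
    public static SpanLike Y(string s) => Wrap(s);
}
```

Once the layers are inlined, each by-value hand-off is a struct temp copied field by field. Ideally the two fields would flow straight into the final stores, like the two `mov` stores through `rcx` in IG04, with no intermediate moves.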
The register shuffling may have been fixed e.g. by dotnet/coreclr/pull/19429 |
Confirmed that the stack frame size and register shuffling for the summary code are fixed: 7279.asm |
Optimized code reserves stack space for pre-optimized stack
(See also https://github.com/dotnet/coreclr/issues/9068 which I imagine should produce identical output; but produces very different output)
e.g. The following code reserves 1000 bytes of stack even though it only uses registers
Code
Asm produced
category:cq
theme:register-allocator
skill-level:expert
cost:medium