Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJit] Stackspace reservation is for pre-optimized code #7279

Closed
benaadams opened this issue Jan 23, 2017 · 13 comments
Closed

[RyuJit] Stackspace reservation is for pre-optimized code #7279

benaadams opened this issue Jan 23, 2017 · 13 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@benaadams
Copy link
Member

Optimized code reserves stack space for pre-optimized stack

(See also https://github.com/dotnet/coreclr/issues/9068 which I imagine should produce identical output; but produces very different output)

e.g. The following code reserves 1000 bytes of stack even though it only uses registers

; Lcl frame size = 1000

G_M27581_IG01:
       4881ECE8030000       sub      rsp, 0x3E8 // 1000 bytes

Code

[MethodImpl(MethodImplOptions.NoInlining)]
private static byte OneSpan(byte[] array)
{
    var span = new Span<byte>(array);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    span = span.Slice(1);
    return span[0];
}

Asm produced

;
; Lcl frame size = 1000

G_M27581_IG01:
       4881ECE8030000       sub      rsp, 0x3E8

G_M27581_IG02:
       4885C9               test     rcx, rcx
       0F8468010000         je       G_M27581_IG36

G_M27581_IG03:
       8B4108               mov      eax, dword ptr [rcx+8]
       83F801               cmp      eax, 1
       0F8263010000         jb       G_M27581_IG37

G_M27581_IG04:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8262010000         jb       G_M27581_IG38

G_M27581_IG05:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8261010000         jb       G_M27581_IG39

G_M27581_IG06:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8260010000         jb       G_M27581_IG40

G_M27581_IG07:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825F010000         jb       G_M27581_IG41

G_M27581_IG08:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825E010000         jb       G_M27581_IG42

G_M27581_IG09:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825D010000         jb       G_M27581_IG43

G_M27581_IG10:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825C010000         jb       G_M27581_IG44

G_M27581_IG11:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825B010000         jb       G_M27581_IG45

G_M27581_IG12:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F825A010000         jb       G_M27581_IG46

G_M27581_IG13:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8259010000         jb       G_M27581_IG47

G_M27581_IG14:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8258010000         jb       G_M27581_IG48

G_M27581_IG15:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8257010000         jb       G_M27581_IG49

G_M27581_IG16:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8256010000         jb       G_M27581_IG50

G_M27581_IG17:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8255010000         jb       G_M27581_IG51

G_M27581_IG18:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8254010000         jb       G_M27581_IG52

G_M27581_IG19:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8253010000         jb       G_M27581_IG53

G_M27581_IG20:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8252010000         jb       G_M27581_IG54

G_M27581_IG21:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8251010000         jb       G_M27581_IG55

G_M27581_IG22:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8250010000         jb       G_M27581_IG56

G_M27581_IG23:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824F010000         jb       G_M27581_IG57

G_M27581_IG24:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824E010000         jb       G_M27581_IG58

G_M27581_IG25:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824D010000         jb       G_M27581_IG59

G_M27581_IG26:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824C010000         jb       G_M27581_IG60

G_M27581_IG27:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824B010000         jb       G_M27581_IG61

G_M27581_IG28:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F824A010000         jb       G_M27581_IG62

G_M27581_IG29:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8249010000         jb       G_M27581_IG63

G_M27581_IG30:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8248010000         jb       G_M27581_IG64

G_M27581_IG31:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8247010000         jb       G_M27581_IG65

G_M27581_IG32:
       FFC8                 dec      eax
       83F801               cmp      eax, 1
       0F8246010000         jb       G_M27581_IG66

G_M27581_IG33:
       FFC8                 dec      eax
       85C0                 test     eax, eax
       0F8646010000         jbe      G_M27581_IG67

G_M27581_IG34:
       488D4108             lea      rax, bword ptr [rcx+8]
       4883C026             add      rax, 38
       0FB600               movzx    rax, byte  ptr [rax]

G_M27581_IG35:
       4881C4E8030000       add      rsp, 0x3E8
       C3                   ret      

G_M27581_IG36:
       33C9                 xor      ecx, ecx
       E871F0FFFF           call     ThrowHelper:ThrowArgumentNullException(int)

G_M27581_IG37:
       B902000000           mov      ecx, 2
       E8B7F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG38:
       B902000000           mov      ecx, 2
       E8ADF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG39:
       B902000000           mov      ecx, 2
       E8A3F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG40:
       B902000000           mov      ecx, 2
       E899F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG41:
       B902000000           mov      ecx, 2
       E88FF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG42:
       B902000000           mov      ecx, 2
       E885F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG43:
       B902000000           mov      ecx, 2
       E87BF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG44:
       B902000000           mov      ecx, 2
       E871F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG45:
       B902000000           mov      ecx, 2
       E867F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG46:
       B902000000           mov      ecx, 2
       E85DF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG47:
       B902000000           mov      ecx, 2
       E853F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG48:
       B902000000           mov      ecx, 2
       E849F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG49:
       B902000000           mov      ecx, 2
       E83FF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG50:
       B902000000           mov      ecx, 2
       E835F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG51:
       B902000000           mov      ecx, 2
       E82BF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG52:
       B902000000           mov      ecx, 2
       E821F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG53:
       B902000000           mov      ecx, 2
       E817F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG54:
       B902000000           mov      ecx, 2
       E80DF0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG55:
       B902000000           mov      ecx, 2
       E803F0FFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG56:
       B902000000           mov      ecx, 2
       E8F9EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG57:
       B902000000           mov      ecx, 2
       E8EFEFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG58:
       B902000000           mov      ecx, 2
       E8E5EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG59:
       B902000000           mov      ecx, 2
       E8DBEFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG60:
       B902000000           mov      ecx, 2
       E8D1EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG61:
       B902000000           mov      ecx, 2
       E8C7EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG62:
       B902000000           mov      ecx, 2
       E8BDEFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG63:
       B902000000           mov      ecx, 2
       E8B3EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG64:
       B902000000           mov      ecx, 2
       E8A9EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG65:
       B902000000           mov      ecx, 2
       E89FEFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG66:
       B902000000           mov      ecx, 2
       E895EFFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)

G_M27581_IG67:
       E880EFFFFF           call     ThrowHelper:ThrowIndexOutOfRangeException()
       CC                   int3     

; Total bytes of code 689, prolog size 7 for method Program:OneSpan(ref):ubyte

category:cq
theme:register-allocator
skill-level:expert
cost:medium

@gkhanna79
Copy link
Member

CC @RussKeldorph

@RussKeldorph
Copy link
Contributor

@dotnet/jit-contrib

@AndyAyersMS
Copy link
Member

Considering for 2.1.

@AndyAyersMS
Copy link
Member

This doesn't seem to repro anymore (from CHK jit in REL build):

; Assembly listing for method T:OneSpan(ref):ubyte
; Lcl frame size = 40

G_M24727_IG01:
       4883EC28             sub      rsp, 40

G_M24727_IG02:
       4885C9               test     rcx, rcx
       0F84BC010000         je       G_M24727_IG35

Frame size of 40 is what one would expect from a non-leaf with no locals (8 bytes for align, 32 for out arg area).

Not sure what changed things, maybe struct promotion kicking in now where it didn't before?

@benaadams
Copy link
Member Author

2 field Span vs 3 field Span? Was before the runtime has the two field Span so was using the three field Span. Will see if can repo

@benaadams
Copy link
Member Author

Still does it using a three field "SpanLike" StackSpace.cs though interestingly dotnet/coreclr#9068 and dotnet/coreclr#9066 have switched output StackSpace.asm

@benaadams
Copy link
Member Author

benaadams commented Oct 31, 2017

This sample now does a bunch of redundant movs

byte OneSpantype(byte[] array) (in StackSpace.cs/StackSpace.asm)

; Lcl frame size = 40

G_M5322_IG01:
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40

G_M5322_IG02:
       4885C9               test     rcx, rcx
       0F84AD020000         je       G_M5322_IG36

G_M5322_IG03:
       8B7108               mov      esi, dword ptr [rcx+8]
       488BF9               mov      rdi, rcx
       48B9E852780EF97F0000 mov      rcx, 0x7FF90E7852E8
       33D2                 xor      edx, edx
       E88AFAAD5F           call     CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
       488B05637AEEFF       mov      rax, qword ptr [reloc classVar[0xe788360]]
       83FE01               cmp      esi, 1
       0F828D020000         jb       G_M5322_IG37

G_M5322_IG04:
       48FFC0               inc      rax
       8D4EFF               lea      ecx, [rsi-1]
       8BF1                 mov      esi, ecx
       8BCE                 mov      ecx, esi
       8BF1                 mov      esi, ecx
       83FE01               cmp      esi, 1
       0F8282020000         jb       G_M5322_IG38

G_M5322_IG05:
       48FFC0               inc      rax
       8D4EFF               lea      ecx, [rsi-1]
       8BF1                 mov      esi, ecx
       8BCE                 mov      ecx, esi
       8BF1                 mov      esi, ecx
       83FE01               cmp      esi, 1
       0F8277020000         jb       G_M5322_IG39

@AndyAyersMS
Copy link
Member

The register shuffling could be something similar to dotnet/coreclr#11390.

@AndyAyersMS
Copy link
Member

Just to make current status clear...

First example above is now "good":

; Lcl frame size = 40

G_M24727_IG01:
       4883EC28             sub      rsp, 40

G_M24727_IG02:
       4885C9               test     rcx, rcx
       0F84BC010000         je       G_M24727_IG35

G_M24727_IG03:
       488D4110             lea      rax, bword ptr [rcx+16]
       8B4908               mov      ecx, dword ptr [rcx+8]
       83F901               cmp      ecx, 1
       0F82B6010000         jb       G_M24727_IG36
...

In the second example stack space is also contained, but register shuffling is still happening, even after dotnet/coreclr#16028. Will investigate.

@AndyAyersMS
Copy link
Member

The root issues are in the complexities of LSRA preferencing -- does not look like there is an easy surgical fix and any larger change here will be quite disruptive. So moving out of 2.1.

@AndyAyersMS
Copy link
Member

Here's a kind of similar example:

    public static ReadOnlySpan<char> Y(string s) => s.AsReadOnlySpan();

generates a chain of span copies, which the jit can't coalesce:

G_M47024_IG01:

G_M47024_IG02:
       4885D2               test     rdx, rdx
       7507                 jne      SHORT G_M47024_IG03
       33C0                 xor      rax, rax
       4533C0               xor      r8d, r8d
       EB1A                 jmp      SHORT G_M47024_IG04

G_M47024_IG03:
       488D420C             lea      rax, bword ptr [rdx+12]
       4C8BC0               mov      r8, rax
       8B5208               mov      edx, dword ptr [rdx+8]
       8BC2                 mov      eax, edx
       498BD0               mov      rdx, r8
       4C8BC2               mov      r8, rdx
       8BD0                 mov      edx, eax
       498BC0               mov      rax, r8
       448BC2               mov      r8d, edx

G_M47024_IG04:
       488901               mov      bword ptr [rcx], rax
       44894108             mov      dword ptr [rcx+8], r8d
       488BC1               mov      rax, rcx

G_M47024_IG05:
       C3                   ret

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@CarolEidt
Copy link
Contributor

The register shuffling may have been fixed e.g. by dotnet/coreclr/pull/19429

@CarolEidt CarolEidt modified the milestones: Future, 6.0.0 Nov 14, 2020
@benaadams
Copy link
Member Author

Confirmed the stack trace size and register shuffling for summary code is fixed: 7279.asm

@ghost ghost locked as resolved and limited conversation to collaborators Dec 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants