Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64: Investigate why more stack space is allocated than needed and why they are not aligned #37429

Closed
kunalspathak opened this issue Jun 4, 2020 · 4 comments · Fixed by #37649
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@kunalspathak
Copy link
Member

kunalspathak commented Jun 4, 2020

This was brought to the attention by @TamarChristinaArm in #37139 (comment). For a method that takes 2 Vector64<> parameters, there are couple of problems:

  1. We allocate stack space of 48 bytes although we just want to store 16 bytes.
  2. The 16 bytes that is stored are not aligned properly. If they are aligned properly, we can convert it tp stp instead.

Here is the sample code:

G_M35607_IG01:
        A9BD7BFD          stp     fp, lr, [sp,#-48]!
        910003FD          mov     fp, sp
        FD0017A0          str     d0, [fp,#40]
        FD000FA1          str     d1, [fp,#24]
						;; bbWeight=1    PerfScore 3.50
G_M35607_IG02:
        FD4017B0          ldr     d16, [fp,#40]
        FD400FB1          ldr     d17, [fp,#24]
        0EB01E10          mov     v16.8b, v16.8b

Here is the assembly code for reference: Create_after.txt

@kunalspathak kunalspathak added the tenet-performance Performance related issue label Jun 4, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 4, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@kunalspathak
Copy link
Member Author

@BruceForstall , @CarolEidt

@CarolEidt
Copy link
Contributor

As I mentioned here this appears to be because Compiler::getSIMDTypeAlignment which is called by Compiler::lvaAllocLocalAndSetVirtualOffset always returns 16 for TARGET_ARM64 which it should not be doing for 8-byte vectors. That accounts for 32 of the 48 bytes. The remaining bytes are because we store fp and we're aligning to 16 bytes.

@kunalspathak
Copy link
Member Author

always returns 16 for TARGET_ARM64 which it should not be doing for 8-byte vectors.

Thanks @CarolEidt , I will try out this suggestion and verify that the change doesn't regress anywhere else.

@mangod9 mangod9 added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 5, 2020
@BruceForstall BruceForstall added this to the Future milestone Jun 8, 2020
@BruceForstall BruceForstall added arch-arm64 and removed untriaged New issue has not been triaged by the area owner labels Jun 8, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants