Enable hot/cold splitting of EH funclets #71236

Merged
merged 3 commits into from
Jun 29, 2022

amanasifkhalid
Member

This PR enables hot/cold splitting for functions containing exception handling. Currently, EH funclets are placed after the function's main body in memory. Under the heuristic that EH funclets run infrequently, this layout allows a simple splitting implementation:

  • If we find a good split point in the function's main body, split there as usual.
  • Else if the function has exception handling,
    • Check that all the EH funclets are cold. If any of the funclets are frequently run, don't bother splitting at all.
    • If all of the EH funclets are cold, split at fgFirstFuncletBB, so that all EH funclets are placed in the cold section.

Being able to split each funclet independently would likely be more performant, but this simpler approach is beneficial for various reasons:

  • We can quickly enable hot/cold splitting for functions with EH -- a nontrivial case.
  • By either moving all EH funclets to the cold section or not splitting at all, we don't need to re-architect unwind info in the JIT or runtime.
  • We don't need to change the layout of EH funclets.
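
The decision procedure in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the JIT's code: the `Block` struct, `kNoSplit`, and the helpers `firstFuncletIndex`, `funcletsAreCold`, and `chooseSplitPoint` are hypothetical names, and the definition of a "good" main-body split point (the start of the trailing run of cold blocks, with at least one hot block before it) is an assumption. The sketch follows the order of the bullets literally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a JIT basic block; the real BasicBlock type is far richer.
struct Block
{
    bool isCold;    // assumed: block is rarely run
    bool inFunclet; // assumed: block belongs to an EH funclet
};

// Sentinel meaning "no such block" / "do not split" (hypothetical).
constexpr std::size_t kNoSplit = static_cast<std::size_t>(-1);

// Index of the first funclet block (the role fgFirstFuncletBB plays in the PR),
// or kNoSplit if the function has no EH funclets.
std::size_t firstFuncletIndex(const std::vector<Block>& blocks)
{
    for (std::size_t i = 0; i < blocks.size(); i++)
    {
        if (blocks[i].inFunclet)
        {
            return i;
        }
    }
    return kNoSplit;
}

// True if every block from firstFunclet to the end of the function is cold.
bool funcletsAreCold(const std::vector<Block>& blocks, std::size_t firstFunclet)
{
    assert(firstFunclet <= blocks.size());
    for (std::size_t i = firstFunclet; i < blocks.size(); i++)
    {
        if (!blocks[i].isCold)
        {
            return false;
        }
    }
    return true;
}

// Returns the index of the first block to place in the cold section,
// or kNoSplit if the function should not be split.
std::size_t chooseSplitPoint(const std::vector<Block>& blocks)
{
    const std::size_t firstFunclet = firstFuncletIndex(blocks);
    const std::size_t mainEnd      = (firstFunclet == kNoSplit) ? blocks.size() : firstFunclet;

    // Step 1 (assumed definition of a "good" split point): the start of the trailing
    // run of cold blocks in the main body, with at least one hot block before it.
    std::size_t split = mainEnd;
    while (split > 0 && blocks[split - 1].isCold)
    {
        split--;
    }
    if (split > 0 && split < mainEnd)
    {
        return split;
    }

    // Step 2: no main-body split point. Fall back to the first funclet,
    // but only if every funclet is cold; otherwise don't split at all.
    if (firstFunclet != kNoSplit && funcletsAreCold(blocks, firstFunclet))
    {
        return firstFunclet;
    }
    return kNoSplit;
}
```

For a function whose main body is entirely hot and whose funclets are all cold, `chooseSplitPoint` returns the index of the first funclet block, so every funclet lands in the cold section. A single hot funclet makes it return `kNoSplit`.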

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 23, 2022
@ghost ghost assigned amanasifkhalid Jun 23, 2022
ghost commented Jun 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@@ -8473,7 +8470,7 @@ void emitter::emitIns_J(instruction ins, BasicBlock* dst, int instrCount)

     id->idIns(ins);
     id->idInsFmt(fmt);
-    id->idjShort = idjShort;
+    id->idjShort = false;
Member Author

When branching to a finally funclet on ARM64, we emit an INS_bl_local instruction -- previously, we could guarantee this jump would stay within the hot section, and thus would set idjShort = true. Now that EH funclets can be moved to the cold section, we can no longer assume this, and must set idjKeepLong based on whether the jump crosses hot/cold sections. Without this change, we'll hit asserts in emitOutputLJ because the branch is too short.
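
As a rough illustration of the rule described above, the sketch below starts every jump out long and pins it long when source and target fall on opposite sides of the hot/cold boundary. Only the field names idjShort and idjKeepLong come from this discussion. The `JumpDesc` struct, the offset-based `isInColdSection` test, and the `recordJump` and `tryShorten` helpers are hypothetical simplifications and do not mirror the emitter's real data structures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified jump descriptor. The field names idjShort and
// idjKeepLong come from the discussion; everything else is illustrative.
struct JumpDesc
{
    bool idjShort;    // jump currently uses the short encoding
    bool idjKeepLong; // jump must never be shortened
};

// Assumed layout: code offsets at or past coldStart live in the cold section.
bool isInColdSection(std::size_t offset, std::size_t coldStart)
{
    return offset >= coldStart;
}

// Record a new jump. It starts out long, and is pinned long if its source
// and target sit on opposite sides of the hot/cold boundary.
JumpDesc recordJump(std::size_t srcOffset, std::size_t dstOffset, std::size_t coldStart)
{
    JumpDesc jd;
    jd.idjShort    = false;
    jd.idjKeepLong = isInColdSection(srcOffset, coldStart) != isInColdSection(dstOffset, coldStart);
    return jd;
}

// A later pass may shorten a jump whose target is in range, unless it is pinned long.
void tryShorten(JumpDesc& jd, bool targetInShortRange)
{
    if (targetInShortRange && !jd.idjKeepLong)
    {
        jd.idjShort = true;
    }
    assert(!(jd.idjShort && jd.idjKeepLong));
}
```

With a cold section starting at offset 0x2A, a jump from 0x10 to 0x2A is pinned long and `tryShorten` leaves it alone, while a jump from 0x10 to 0x20 stays eligible for the short form.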

@amanasifkhalid amanasifkhalid marked this pull request as ready for review June 23, 2022 23:52
@amanasifkhalid
Member Author

@AndyAyersMS Since Bruce is OOF, could you PTAL? Thank you!

{
    for (BasicBlock* block = fgFirstFuncletBB; block != nullptr; block = block->bbNext)
    {
#if HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION
Member

It looks like we always define HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION to be 1.

Does that mean we're actually never splitting off any EH regions, or am I missing something?

Member Author

Sorry, I missed that HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION is always 1. During testing, I forced splitting at fgFirstFuncletBB via JitStressProcedureSplitting, so I never saw this condition block splitting. I've removed it.

Member

It may not be that simple. Presumably HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION exists for some reason. It would be good to know what that reason is.

Looking at the git blame for jit.h, this define has been like this for years now. Perhaps it is no longer applicable, or perhaps it reflects some limitation in the way the jit must report EH information back to the runtime.

@jkotas do you know offhand why the jit has this constraint? If not, I can do some digging.

If it is indeed OK to change this, I'd recommend changing the define instead, so that the rest of the jit is freed from this constraint as well, and do that as a prerequisite, stand-alone change.

Member

I do not know the reason behind this. The EE side is using GetCodeAddressForRelOffset method to resolve the handler offset to IP. This method deals with handler entrypoints in the cold section correctly.
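
To make concrete what resolving a handler offset across a split involves, here is a minimal sketch of mapping a method-relative offset to an address when the code is divided into a hot and a cold region. It is not the runtime's GetCodeAddressForRelOffset implementation: the `MethodRegions` struct and the `codeAddressForRelOffset` function are hypothetical, and the sketch assumes relative offsets run contiguously through the hot region and then the cold region.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a method's code regions; not a runtime type.
struct MethodRegions
{
    std::uintptr_t hotStart;  // address of the hot region
    std::uint32_t  hotSize;   // size of the hot region in bytes
    std::uintptr_t coldStart; // address of the cold region (unused if coldSize == 0)
    std::uint32_t  coldSize;  // size of the cold region in bytes; 0 if the method is not split
};

// Map a method-relative offset to an address. Offsets below hotSize fall in the
// hot region; the rest are rebased onto the start of the cold region.
std::uintptr_t codeAddressForRelOffset(const MethodRegions& regions, std::uint32_t relOffset)
{
    assert(relOffset < regions.hotSize + regions.coldSize);
    if (relOffset < regions.hotSize)
    {
        return regions.hotStart + relOffset;
    }
    return regions.coldStart + (relOffset - regions.hotSize);
}
```

With a hot region of 0x2A bytes at a made-up address 0x1000 and a cold region at a made-up address 0x8000, a handler entry at relative offset 0x2A resolves to 0x8000, the first byte of the cold region.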

* section is cold. If any of the funclets are hot, then it may not be
* beneficial to split at fgFirstFuncletBB, moving all funclets to the
* cold section.
*/
Member

Can you use the "new style" function comments? They look like

//------------------------------------------------------------------------------
// methodName: short description
//
// Notes:
//   (text you have above)
//
bool Compiler::fgFuncletsAreCold()

Member Author

Sure thing; fixed.

@amanasifkhalid
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@BruceForstall BruceForstall left a comment

LGTM.

Can you show an asm diff example of a small test case before/after with EH and stress splitting enabled?

@amanasifkhalid
Member Author

Rebasing on top of my change disabling HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION.

@amanasifkhalid
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Member Author

> Can you show an asm diff example of a small test case before/after with EH and stress splitting enabled?

Sure thing. Below is the JIT disasm (on Win x64), without EH splitting, for a short function that catches a System.AccessViolationException:


G_M45968_IG01:              ;; offset=0000H
       55                   push     rbp
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488D6C2430           lea      rbp, [rsp+30H]
       488965F0             mov      qword ptr [rbp-10H], rsp
						;; size=15 bbWeight=1    PerfScore 3.75
G_M45968_IG02:              ;; offset=000FH
       4863C2               movsxd   rax, edx
       4803C1               add      rax, rcx
       A801                 test     al, 1
       7506                 jne      SHORT G_M45968_IG04
						;; size=10 bbWeight=1    PerfScore 1.75
G_M45968_IG03:              ;; offset=0019H
       480FBF00             movsx    rax, word  ptr [rax]
       EB04                 jmp      SHORT G_M45968_IG05
						;; size=6 bbWeight=0.50 PerfScore 3.00
G_M45968_IG04:              ;; offset=001FH
       480FBF00             movsx    rax, word  ptr [rax]
						;; size=4 bbWeight=0.50 PerfScore 2.00
G_M45968_IG05:              ;; offset=0023H
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5D                   pop      rbp
       C3                   ret      
						;; size=7 bbWeight=1    PerfScore 2.25
G_M45968_IG06:              ;; offset=002AH
       55                   push     rbp
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488B6920             mov      rbp, qword ptr [rcx+32]
       48896C2420           mov      qword ptr [rsp+20H], rbp
       488D6D30             lea      rbp, [rbp+30H]
						;; size=19 bbWeight=0    PerfScore 0.00
G_M45968_IG07:              ;; offset=003DH
       48B9589C5305FC7F0000 mov      rcx, 0x7FFC05539C58      ; System.AccessViolationException
       E800000000           call     CORINFO_HELP_NEWSFAST
       488BF0               mov      rsi, rax
       488BCE               mov      rcx, rsi
       FF1500000000         call     [hackishModuleName:hackishMethodName():this]
       488BCE               mov      rcx, rsi
       E800000000           call     CORINFO_HELP_THROW
       CC                   int3     
						;; size=36 bbWeight=0    PerfScore 0.00

Note the lack of hot/cold splitting. Now, with EH splitting:

G_M45968_IG01:              ;; offset=0000H
       55                   push     rbp
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488D6C2430           lea      rbp, [rsp+30H]
       488965F0             mov      qword ptr [rbp-10H], rsp
						;; size=15 bbWeight=1    PerfScore 3.75
G_M45968_IG02:              ;; offset=000FH
       4863C2               movsxd   rax, edx
       4803C1               add      rax, rcx
       A801                 test     al, 1
       7506                 jne      SHORT G_M45968_IG04
						;; size=10 bbWeight=1    PerfScore 1.75
G_M45968_IG03:              ;; offset=0019H
       480FBF00             movsx    rax, word  ptr [rax]
       EB04                 jmp      SHORT G_M45968_IG05
						;; size=6 bbWeight=0.50 PerfScore 3.00
G_M45968_IG04:              ;; offset=001FH
       480FBF00             movsx    rax, word  ptr [rax]
						;; size=4 bbWeight=0.50 PerfScore 2.00
G_M45968_IG05:              ;; offset=0023H
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5D                   pop      rbp
       C3                   ret      
						;; size=7 bbWeight=1    PerfScore 2.25
************** Beginning of cold code **************

G_M45968_IG06:              ;; offset=002AH
       55                   push     rbp
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488B6920             mov      rbp, qword ptr [rcx+32]
       48896C2420           mov      qword ptr [rsp+20H], rbp
       488D6D30             lea      rbp, [rbp+30H]
						;; size=19 bbWeight=0    PerfScore 0.00
G_M45968_IG07:              ;; offset=003DH
       48B9589C5305FC7F0000 mov      rcx, 0x7FFC05539C58      ; System.AccessViolationException
       E800000000           call     CORINFO_HELP_NEWSFAST
       488BF0               mov      rsi, rax
       488BCE               mov      rcx, rsi
       FF1500000000         call     [hackishModuleName:hackishMethodName():this]
       488BCE               mov      rcx, rsi
       E800000000           call     CORINFO_HELP_THROW
       CC                   int3     
						;; size=36 bbWeight=0    PerfScore 0.00

@amanasifkhalid
Member Author

@BruceForstall Would you also like to see an asmdiff with/without stress splitting? That implementation still splits after the first basic block, so I'm not sure how well it will demonstrate EH funclet splitting.

@BruceForstall
Member

> @BruceForstall Would you also like to see an asmdiff with/without stress splitting? That implementation still splits after the first basic block, so I'm not sure how well it will demonstrate EH funclet splitting.

I was wondering whether there is a case with a visible asm diff or unwind/EH info diff. With funclets, though, I guess no hot/cold branches will change, because there are no direct branches to/from the funclets, except for BBJ_CALLFINALLY to a finally block. A catch block that doesn't end in a throw (like the example above) does load the address of the "continuation" (where the catch should return to), so that needs to be properly relocated.

@amanasifkhalid
Member Author

The relocation isn't present in the above example because the BBJ_EHCATCHRET jump was optimized out; in other test cases, the relocation was observed. The control flow for reporting relocations looks correct -- this correctness will be verified when we begin testing EH funclet splitting with the Crossgen2 prototype.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 30, 2022