Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT\opt\OSR\tailrecursetry\tailrecursetry.cmd fails with COMPlus_FastTailCalls=0 #35687

Closed
Tracked by #63902
erozenfeld opened this issue May 1, 2020 · 14 comments · Fixed by #59784
Closed
Tracked by #63902

JIT\opt\OSR\tailrecursetry\tailrecursetry.cmd fails with COMPlus_FastTailCalls=0 #35687

erozenfeld opened this issue May 1, 2020 · 14 comments · Fixed by #59784
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Milestone

Comments

@erozenfeld
Copy link
Member

erozenfeld commented May 1, 2020

COMPlus_FastTailCalls is a new environment variable added in #341. When it's set to 0, all tail-prefixed calls are dispatched via helpers.

JIT\opt\OSR\tailrecursetry\tailrecursetry.cmd is a test that turns on on-stack replacement, it sets

set COMPlus_TieredCompilation=1
set COMPlus_TC_QuickJitForLoops=1
set COMPlus_TC_OnStackReplacement=1

If additionally we set

set COMPlus_FastTailCalls=0

the test fails with

BEGIN EXECUTION
 "f:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\corerun.exe" tailrecursetry.dll
starting sum

Assert failure(PID 1896 [0x00000768], Thread: 49832 [0xc2a8]): Assertion failed 'phiFound' in 'TailRecursionWithOsrEntryInTry:F(int,int,int,int):int' during 'SSA: insert phis' (IL size 49)

    File: F:\runtime\src\coreclr\src\jit\ssabuilder.cpp Line: 922
    Image: f:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\CoreRun.exe

Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED

category:correctness
theme:osr
skill-level:expert
cost:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels May 1, 2020
@erozenfeld
Copy link
Member Author

@dotnet/jit-contrib @AndyAyersMS

@BruceForstall BruceForstall added this to the 5.0 milestone May 3, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label May 3, 2020
@AndyAyersMS
Copy link
Member

Pre-SSA liveness deletes the assignment to the catch variable, and things go downhill from there.

Removing statement STMT00009 (IL 0x014...0x015)
N003 (  5,  4) [000046] -A---O--R---              *  ASG       ref   
N002 (  3,  2) [000045] D------N----              +--*  LCL_VAR   ref    V08 tmp1         
N001 (  1,  1) [000011] -----O------              \--*  CATCH_ARG ref   
 in BB08 as useless:

BB08 becomes empty

@AndyAyersMS
Copy link
Member

Actually that part is OK, the issue is that upstream we're not getting the expected dominance frontier for the catch block.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented May 5, 2020

Looks like a similar issue to what I saw over with #34522 -- there are more places in the jit where we assume the try region entry is the first try.

In this test case we normally change the recursive tail call into a jump. In order to make that work with OSR we import the method entry block and mark it as must keep, since it will otherwise be unreferenced until morph does the call -> jump transformation. Importing this block may pull in other blocks that OSR normally wouldn't import.

We don't do that transformation when COMPlus_FastTailCalls=0, but we still have the additional block(s) we imported in anticipation that we might do the transformation. These blocks are unreachable but the ones marked as keep remain around. And in this case we end up with a try region where the first block of the try is unreachable and empty but is still there in the flow graph.

Just before building SSA we have we have a try region BB03-BB04 where BB03 is empty and unreachable and the try entry is BB04.

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0009]  1                             1        [???..???)-> BB04 (always)                     i internal label target 
BB02 [0000]  0                             0        [000..002)        (throw )                     keep i rare bwd bwd-target 
BB03 [0001]  0  0                          0        [002..006)        (throw ) T0      try {       keep i try rare label bwd 
BB04 [0002]  2  0    BB01,BB04             8        [006..012)-> BB04 ( cond ) T0      }           i Loop label target bwd bwd-target 
BB05 [0006]  2       BB04,BB08             1        [017..01B)-> BB07 ( cond )                     i label target bwd 
BB06 [0007]  1       BB05                  1        [01B..01D)        (return)                     i bwd 
BB07 [0008]  1       BB05                  1        [01D..031)        (return)                     i label target gcsafe bwd 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB08 [0005]  1     0                       0        [014..017)-> BB05 ( cret )    H0 F catch { }   keep i rare label target flet bwd 

This breaks an assumption in BlockPredsWithEH that the first block of the try is the try entry, and so it predecessors are potentially also predecessors to any handler block for the try. Because of this we fail to put a PHI into BB08 for V04, and hence the assert.

Fixing that in a manner similar to the fix in #34522 (search the try region for the entry) gets us a bit further as we insert the PHI, but we then hit another assert as the PHI is degenerate and has only one input.

Now the issue is that when building SSA we hit a similar assumption in AddPhiArgsToSuccessors. This time we're outside the try and need to decide if we should add the current defs to handler successors of the block, and we fail to realize that BB01 needs to add its def of V04 to the BB08 phi, because BB01's successor BB04 is in a try but is not the first block of the try.

So it looks like there are now at least 3 places where the jit assumes that the first block of the try is the try entry. Instead of trying to fix these (and possibly more) I'm going to see if we can prune away the unreachable blocks so that the OSR cases continue to follow the expected form for trys.

@AndyAyersMS
Copy link
Member

I'm going to see if we can prune away the unreachable blocks

Have part of this fixed -- the jit is now is able to get rid of BB02 in the above; instead of protecting it with a bbflag I can protect it by manipulating reference counts. This approach is needed because once you set BBF_DONT_REMOVE it is effectively impossible to unset.

Getting rid of BB03 is going to be a bit harder; there is a lot of bias in the code expecting that try region entry blocks are are untouchable and come first in the region. Likely what we'll need is to do something akin to the region entry trimming I added when OSR was first introduced, but this version of try region trimming must run later, and is only needed if we imported the original method entry, and then morph decides not to introduce branches to the original entry.

The logic should be something like: check to see if the entry block successor is in a try region, and if so, if that block is not the first block of the try, move it so it is the first block in the try. Need to ensure this is workable when the entry block successor is in multiple trys.

@AndyAyersMS
Copy link
Member

May also need to watch out for #6892.

@AndyAyersMS
Copy link
Member

Well, figuring out the right thing here is looking more complex than I'd like; we may need to support the idea that a try region entry is not always the first block in the try.

Since this case seemingly only happens if:

  • The OSR entry is point in a try; and
  • The method hast tail recursive call

I think I should to just disable the tail recursion to loop opt in the above case for now, so things are at least correct, and defer further work. We can still tail call.

@erozenfeld
Copy link
Member Author

I think I should to just disable the tail recursion to loop opt in the above case for now, so things are at least correct, and defer further work. We can still tail call.

I'm confused. The test fails with "set COMPlus_FastTailCalls=0", which also disables tail recursion to loop opt. What am I missing?

@AndyAyersMS
Copy link
Member

To support tail recursion->loop in OSR we over-import; if the optimization doesn't happen, we're left with unreachable code that is hard to properly clean up when it involves try regions. The unreachable code leads to confusion when enumerating EH preds and building SSA.

COMPlus_FastTailCalls=0 disables the optimization and so is an enabler here; normally we can do tail recursion -> loop with few restrictions so we don't hit this problem. There are likely examples where we could make OSR fail without this complus setting.

@erozenfeld
Copy link
Member Author

I think I understand now. Your proposal is to both disable tail recursion to loop opt and to stop over-importing if the two bullets above are true. That makes sense.

@AndyAyersMS
Copy link
Member

Yep.

@AndyAyersMS
Copy link
Member

Hmm, that's not sufficient either -- if we have multiple levels of nested loops and trys we still get into trouble as we have try regions whose first block is reachable but is not the try entry.

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0015]  1                             1        [???..???)-> BB04 (always)                     i internal label target 
BB02 [0002]  1  1    BB05                  4        [006..007)                 T1      try {       keep i try Loop label target bwd bwd-target 
BB03 [0003]  1  0    BB02                  4        [007..00B)-> BB05 ( cond ) T0      try {       keep i try label bwd 
BB04 [0004]  3  0    BB01,BB03,BB04       32        [00B..017)-> BB04 ( cond ) T0      }           i Loop label target bwd bwd-target 
BB05 [0008]  3  1    BB03,BB04,BB09        8        [01C..025)-> BB02 ( cond ) T1      }           i label target bwd 
BB06 [0012]  2       BB05,BB10             1        [02A..02E)-> BB08 ( cond )                     i label target 
BB07 [0013]  1       BB06                  1        [02E..030)        (return)                     i 
BB08 [0014]  1       BB06                  1        [030..042)        (return)                     i label target gcsafe 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB09 [0007]  1  1  0                       0        [019..01C)-> BB05 ( cret ) T1 H0 F catch { }   keep i rare label target flet bwd 
BB10 [0011]  1     1                       0        [027..02A)-> BB06 ( cret )    H1 F catch { }   keep i rare label target flet 
-----------------------------------------------------------------------------------------------------------------------------------------

Here BB02 is the try start but can only be reached from inside the try. The entry is BB04 as it's the only block in the try that can be reached from the outside.

Looking more and more like teaching the jit that try start and try entry may not be the same block is the only way to properly address all of this. That means yet another link in the eh table that needs careful treatment.

@AndyAyersMS
Copy link
Member

Found a related issue with trying to support the recursive tail call -> loop optimization with OSR. If the tail call site is introduced by an inlinee, the root method won't have done the necessary anticipatory importation.

Seems like there are two issues to sort through here:

  • Whether we can support tail call -> loop at all in OSR. Looking doubtful. Might be salvageable if we could run it earlier than morph and could restart importation in the root method during inlining, or similar. Or abandon OSR partial importation and try to prune away unreachable parts of the flow graph more effectively.
  • Whether excluding tail call -> loop fixes all of the SSA/Flow issues when we have patchpoints in try regions. Also looks doubtful. Seems like we need to be able to have patchpoints in trys so we really need a solution for this.

@AndyAyersMS
Copy link
Member

This is an OSR-only issue; since that's an experimental feature in 5.0 and the fix looks to be somewhat disruptive and would impact more than OSR, I'm going to mark this as future.

@AndyAyersMS AndyAyersMS modified the milestones: 5.0.0, Future Jul 1, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
AndyAyersMS added a commit that referenced this issue Sep 29, 2021
This change adds control flow to ensure that an OSR method for a patchpoint
nested in try regions enters those regions try from the first block of each
try rather than mid-try.

This lets these OSR methods conform to the data flow expectations that the
only way control flow can enter a try is via its first block.

See #33658 for more details on the approach taken here.

Fixes #35687.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 29, 2021
AndyAyersMS added a commit that referenced this issue Oct 6, 2021
This change adds control flow to ensure that an OSR method for a patchpoint
nested in try regions enters those regions try from the first block of each
try rather than mid-try.

This lets these OSR methods conform to the data flow expectations that the
only way control flow can enter a try is via its first block.

See #33658 for more details on the approach taken here.

Fixes #35687.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants