
eterekhin
Contributor

@eterekhin eterekhin commented Jul 24, 2025

Hey, folks!

This PR fixes a crash caused by a func-eval abort when the abort is requested before the evaluated method actually starts running. At that moment no managed frames related to the eval are on the stack (please see the stack trace below).

[0x0]   coreclr!SfiInit+0x1f1   0x5f5ab98c00   0x7ffa8804a1de   
[0x1]   System_Private_CoreLib!System.Runtime.ExceptionServices.InternalCalls.RhpSfiInit(System.Runtime.StackFrameIterator ByRef, Void*, Boolean, Boolean*)+0x7e   0x5f5ab98ff0   0x7ffa88012a46   
[0x2]   System_Private_CoreLib!System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)+0xc6   0x5f5ab990e0   0x7ffa880126c9   
[0x3]   System_Private_CoreLib!System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef)+0x49   0x5f5ab99220   0x7ffa89e5d043   
[0x4]   coreclr!CallDescrWorkerInternal+0x83   0x5f5ab99250   0x7ffa89952820   
[0x5]   coreclr!CallDescrWorkerWithHandler+0x130   0x5f5ab99290   0x7ffa899537bc   
[0x6]   coreclr!DispatchCallSimple+0x26c   0x5f5ab992f0   0x7ffa89d4e638   
[0x7]   coreclr!DispatchManagedException+0x388   0x5f5ab99480   0x7ffa89d4e247   
[0x8]   coreclr!DispatchManagedException+0x67   0x5f5ab9aa80   0x7ffa89c0e72f   
[0x9]   coreclr!Thread::HandleThreadAbort+0x1df   0x5f5ab9b000   0x7ffa8995280e   
[0xa]   coreclr!CallDescrWorkerWithHandler+0x11e   0x5f5ab9b190   0x7ffa899533bb   
[0xb]   coreclr!MethodDescCallSite::CallTargetWorker+0xb8b   0x5f5ab9b1f0   0x7ffa893cfea2   
[0xc]   coreclr!MethodDescCallSite::CallWithValueTypes_RetArgSlot+0x32   0x5f5ab9b9e0   0x7ffa893d7d2f   
[0xd]   coreclr!`FuncEvalWrapper'::`3'::__Body::Run+0x7f   0x5f5ab9ba10   0x7ffa893d24f7   
[0xe]   coreclr!FuncEvalWrapper+0x97   0x5f5ab9ba70   0x7ffa893d17df   
[0xf]   coreclr!DoNormalFuncEval+0x9af   0x5f5ab9bb30   0x7ffa893d3357   
[0x10]   coreclr!GCProtectArgsAndDoNormalFuncEval+0x657   0x5f5ab9c2d0   0x7ffa893d1acd   <-- func eval abort exception handler here
[0x11]   coreclr!FuncEvalHijackRealWorker+0x8d   0x5f5ab9ca60   0x7ffa893d996a   
[0x12]   coreclr!FuncEvalHijackWorker+0x50a   0x5f5ab9cef0   0x7ffa893eb4bd   
[0x13]   coreclr!FuncEvalHijack+0xd   0x5f5ab9d250   0xcccccccc   

The debugger calls ICorDebugEval::CallFunction, ICorDebugProcess::Continue, and ICorDebugEval::Abort in sequence.
If the method being evaluated is already running when Abort is performed, the ThreadAbortException propagates to the exception handler installed in GCProtectArgsAndDoNormalFuncEval. If it is not running yet, the exception is unwound to the frame where the debugger was stopped, skipping that handler, which crashes the process during the first exception pass here

It reproduces with a .NET 9 installation on all OSes and on main; .NET 8 works fine for me.

The problem arises when the runtime has to execute the class constructor before the call, so it takes some time to reach the call itself.
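To make the failure mode concrete, here is a toy model in C++ (not CoreCLR code). It assumes a stack represented as a list of frames from top to bottom, where managed frames have no handler and one native frame (standing in for GCProtectArgsAndDoNormalFuncEval) has a try/catch. `FrameKind`, `Frame`, and `dispatchFromFirstManagedFrame` are hypothetical names; the real dispatch logic is far more involved. The point is only that a walk which begins at the first managed frame can never see a native handler sitting above it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified frame kinds; these are NOT CoreCLR types.
enum class FrameKind {
    Managed,          // a managed method (no handler in this toy model)
    NativeWithCatch,  // native code with its own try/catch, like the funceval wrapper
    Native            // native code without a handler
};

struct Frame {
    std::string name;
    FrameKind kind;
};

// Models the problematic behavior: dispatch starts at the first managed
// frame (index 0 is the top of the stack), so native handlers above it
// are never considered. Returns the handling frame's name, or nullopt
// when nothing handles the exception (the crash case).
std::optional<std::string> dispatchFromFirstManagedFrame(const std::vector<Frame>& stack) {
    assert(!stack.empty());
    std::size_t i = 0;
    while (i < stack.size() && stack[i].kind != FrameKind::Managed) {
        ++i;  // frames above the first managed frame are skipped entirely
    }
    for (; i < stack.size(); ++i) {
        if (stack[i].kind == FrameKind::NativeWithCatch) {
            return stack[i].name;
        }
    }
    return std::nullopt;
}
```

With the evaluated method already on the stack, the walk starts at it and reaches the funceval catch. When the abort arrives while the class constructor is still being prepared, the first managed frame is the one where the debugger stopped (Main), which lies below the funceval catch, so the exception goes unhandled.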

In this PR I check for this situation in SfiInit and set the pfIsExceptionIntercepted flag, so that we skip the SfiNext calls and go straight to CallCatchFunclet. I also added a check for DebuggerU2MCatchHandlerFrame, because I suspect we may hit the same issue with it.
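A rough sketch of the check described above, again as a toy model rather than runtime code. `sfiInitModel`, `InitResult`, and `isIntercepted` are hypothetical stand-ins for SfiInit and the pfIsExceptionIntercepted out-parameter; the real function works on the runtime's stack frame iterator, which this model does not attempt to reproduce. The idea is that if a funceval or DebuggerU2MCatchHandlerFrame handler is found above the top managed frame, the initializer reports the exception as intercepted so the caller can skip the SfiNext walk.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical frame kinds for the model (not CoreCLR types).
enum class FrameKind { Managed, FuncEvalCatch, DebuggerU2MCatch, Native };

struct Frame {
    std::string name;
    FrameKind kind;
};

struct InitResult {
    std::size_t startIndex;                // where a normal managed walk would begin
    bool isIntercepted;                    // models pfIsExceptionIntercepted
    std::optional<std::size_t> catchFrame; // frame that should receive control
};

// Scans from the top of the stack (index 0). If a debugger-related catch
// frame appears before any managed frame, report it as intercepted.
InitResult sfiInitModel(const std::vector<Frame>& stack) {
    assert(!stack.empty());
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const FrameKind k = stack[i].kind;
        if (k == FrameKind::Managed) {
            return {i, false, std::nullopt};  // normal path: walk with SfiNext from here
        }
        if (k == FrameKind::FuncEvalCatch || k == FrameKind::DebuggerU2MCatch) {
            return {i, true, i};              // handler above the top managed frame
        }
    }
    return {stack.size(), false, std::nullopt};
}
```

As the review below points out, pfIsExceptionIntercepted already has a dedicated meaning for debugger interception, so reusing it this way is a design risk; the model only shows the control-flow idea.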

@janvorli, could you please review it? I have probably missed some pieces :) I'm also not sure the tests will be green.

…when they are higher than the top managed frame in stack
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jul 24, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 24, 2025
@teo-tsirpanis teo-tsirpanis added area-ExceptionHandling-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 24, 2025
@janvorli
Member

@eterekhin thank you for looking into this issue and creating a fix! I think the fix should be made differently though. For example, the pfIsExceptionIntercepted flag has a special use for exception interception by the debugger, and I think reusing it for a different purpose may cause trouble. I am currently working on a fix for processing unhandled exceptions, and this problem is in the same bucket. I want to fix it in a unified manner.
Do you happen to have a repro project that I can use to test it?

@eterekhin
Contributor Author

@janvorli, thank you! It's great that you are going to fix this case!
Unfortunately, the repro is very random; I will try to find stable repro steps within a couple of days and let you know.

@eterekhin
Contributor Author

@janvorli, hello! I've found a stable repro (Windows x64, .NET 9):

  1. Open the NotCaughtThreadAbortReproProject\NotCaughtThreadAbortReproProject folder in VS Code
  2. Set a breakpoint on line 18 of Program.cs
  3. Start debugging and wait until the debugger stops at the breakpoint
  4. Evaluate "SomeFunc()" in the Watch panel

After 6 seconds of waiting I see the following (please see the screenshot):

before operation
after operation
Fatal error. Internal CLR error. (0x80131506)
   at System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)
   at System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef)
   at Repro.Program.Main(System.String[])
The target process exited with code -2146233082 (0x80131506) while evaluating the function 'Repro.Program.SomeFunc'.
[screenshot]

NotCaughtThreadAbortReproProject.zip

@janvorli
Member

janvorli commented Aug 4, 2025

@eterekhin thank you! I'll use it to verify my changes.

janvorli added a commit to janvorli/runtime that referenced this pull request Aug 4, 2025
There is a problem with thread abort in funceval when there is no
managed frame on the stack between the abort point and the
`FuncEvalFrame`. That can happen, e.g., when invoking a static method via
funceval on a type whose static constructor has not run yet and
takes a long time to complete.

The problem is that when EH is invoked to propagate the
`ThreadAbortException`, it starts at the first managed frame, so it
skips the try/catch in the funceval native code.

This change fixes it by using `RaiseTheExceptionInternalOnly` to
raise the `ThreadAbortException` in the `Thread::HandleThreadAbort`. The
`Thread::HandleThreadAbort` is always called by native code that has a
native catch or (on Windows) ends up calling the `ProcessCLRException`.

I had originally made the change to call `DispatchManagedException`
from `Thread::HandleThreadAbort`, but this issue shows that it is
problematic.

I have verified that the repro provided by @eterekhin in the issue
report no longer crashes the process with a failfast; instead, the
funceval is reported as timed out, as expected.

Close dotnet#118015
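The effect of the fix can be illustrated with a small C++ model. Raising the exception from native code means it unwinds to the nearest native catch, such as the one in GCProtectArgsAndDoNormalFuncEval, regardless of whether any managed eval frame exists yet. Here `RaiseTheExceptionInternalOnly` is modeled with an ordinary C++ `throw`; the runtime's real raising path is more involved. All names ending in `Model`, and the `g_abortRequested` flag, are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ThreadAbortException.
struct ThreadAbortModel : std::runtime_error {
    ThreadAbortModel() : std::runtime_error("thread abort") {}
};

// Set by the "debugger"; models a pending ICorDebugEval::Abort request.
bool g_abortRequested = false;

// Models Thread::HandleThreadAbort after the fix: raise natively instead of
// starting managed dispatch at the first managed frame.
void handleThreadAbortModel() {
    if (g_abortRequested) {
        throw ThreadAbortModel();
    }
}

// Models the slow class constructor: the abort check is reached before
// the evaluated method (and thus any managed eval frame) exists.
void runClassConstructorModel() {
    handleThreadAbortModel();
}

std::string evaluatedMethodModel() { return "result"; }

// Models the native try/catch in GCProtectArgsAndDoNormalFuncEval.
std::string funcEvalModel() {
    try {
        runClassConstructorModel();
        return evaluatedMethodModel();
    } catch (const ThreadAbortModel&) {
        return "aborted";  // the debugger can now report the eval as aborted/timed out
    }
}
```

Because the throw originates in native code below the handler, the catch is found whether or not the abort lands before the evaluated method starts.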
janvorli added a commit that referenced this pull request Aug 4, 2025
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Aug 5, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2025