dotnet fails to exit with thread stuck in coreclr!WatsonLastChance #66715
This is because, after opening a handle to the process (with `PROCESS_ALL_ACCESS`), the call to `GetExitCodeProcess` returns 0 rather than `STILL_ACTIVE`.
Yet the process still shows ~15% CPU in Task Manager. I opened Process Explorer and it similarly showed CPU varying around 15%, but when I tried to view the threads in Process Explorer, the process finally terminated for real. Any idea what happened here? When I originally got to the machine, the process had been in that state for hours, so it isn't that I just got unlucky and tried to take a dump during some brief window in its termination.
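For reference, the liveness probe being discussed here looks roughly like this. This is a minimal Win32 sketch (Windows-only, not the actual dotnet-dump code); the surprise above is that `GetExitCodeProcess` reported 0 for a process that still had running threads:

```cpp
// Windows-only sketch: probe a process's exit code the way dump tooling does.
// GetExitCodeProcess reports STILL_ACTIVE (259) for a live process; in the
// hang described above it reported 0 even though a thread was still running.
#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    DWORD pid = (argc > 1) ? strtoul(argv[1], nullptr, 10) : GetCurrentProcessId();

    HANDLE h = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!h) {
        printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD code = 0;
    if (GetExitCodeProcess(h, &code)) {
        if (code == STILL_ACTIVE)
            printf("process %lu still running\n", pid);
        else
            printf("process %lu exit code %lu\n", pid, code);
    }

    CloseHandle(h);
    return 0;
}
```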
I'm not sure what is going on here. This code is just trying to get the handle from a `Process` object in order to call the native `dbghelp!MiniDumpWriteDump`. @dotnet/dotnet-diag, does anybody know what might be happening here?
If I encounter this again I'll try procdump or similar to get an "external" dump and perhaps get a clue. Perhaps the Windows folks can tell us whether it's expected for a process to be in this state.
Aha, hit it again, and this time got a dump from Task Manager. It's got one thread left, and it's stuck here:
Thoughts? Should I move this to the runtime repo? I have the dump if you need it. BTW, after taking this dump, I tried running
It seems this thread is waiting on
which apparently got orphaned. I can't get more info from `!locks`:
Thank you @danmoseley!
Yes, this looks like a runtime issue and I'll move it over.
Yes, please do share the dump if possible. It might be possible to create a repro based on your analysis, but the dump might be all that is needed, and could lead to a fix for this issue faster.
Tagging subscribers to this area: @tommcdon

Issue Details
I built dotnet/runtime main in release, then ran outerloop tests. It jammed with active CPU, so I tried to get a dump. Running as admin (although it shouldn't matter; the target process is not elevated).

cc @mikem8361
@tommcdon https://microsoft-my.sharepoint.com/:u:/p/danmose/EdfrQ9SlIeRCp-JmGXJ1om4BQNQX83pxpFa-FCHM33lt8w?e=8Zgywx is the dump. I can probably repro this if needed, since it's happened twice now. LMK.
OK, it's reproing consistently for me with the OleDB tests. For some reason, `!pe` shows a window title with Office in it:
WindowTitle: 'Microsoft.Office.dotnet.exe.15'
Looking at procmon, it seems oledb32.dll is instantiating some Office components, which is what pulls in the Office DLLs and creates that window with the title above. That explains why it's the OleDB tests every time. Of course, it shouldn't hang, whatever the reason. No idea why nobody else is hitting this. Maybe I was pushed some Office experiment.
This looks like a known issue that was intentionally preserved for back-compat: runtime/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs, lines 265 to 271 at commit 39fb7f7.
Presumably we have a few options to fix it:
Hmm, that might not be the reason that GetProcessHandle calls GetExitCodeProcess. It might well be, though, and reasoning reliably about the Process class can be difficult. I see no reason to change it: it looks like a simple change to dotnet-dump to just get the handle from `::OpenProcess` directly. I opened dotnet/diagnostics#2950 for that. The bigger issue is why the process is hung with the stack above in the first place.
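In native terms, the dotnet-dump change amounts to bypassing `Process.Handle` (and its exit-code liveness check) in favor of opening the handle directly. A hedged Win32 sketch of that shape (Windows-only, illustrative only, not the actual dotnet-dump code):

```cpp
// Windows-only sketch: open the target directly with ::OpenProcess and hand
// the raw handle to dbghelp!MiniDumpWriteDump, sidestepping the
// GetExitCodeProcess check performed by System.Diagnostics.Process.
#include <windows.h>
#include <dbghelp.h>   // link with Dbghelp.lib
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    if (argc < 3) {
        printf("usage: %s <pid> <dump-path>\n", argv[0]);
        return 1;
    }
    DWORD pid = strtoul(argv[1], nullptr, 10);

    // These access rights are sufficient for MiniDumpWriteDump.
    HANDLE hProc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                               FALSE, pid);
    if (!hProc) {
        printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    HANDLE hFile = CreateFileA(argv[2], GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (hFile == INVALID_HANDLE_VALUE) {
        CloseHandle(hProc);
        return 1;
    }

    BOOL ok = MiniDumpWriteDump(hProc, pid, hFile, MiniDumpWithFullMemory,
                                nullptr, nullptr, nullptr);
    if (ok)
        printf("dump written\n");
    else
        printf("MiniDumpWriteDump failed: %lu\n", GetLastError());

    CloseHandle(hFile);
    CloseHandle(hProc);
    return ok ? 0 : 1;
}
```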
Ah, I misunderstood and thought you were trying to track down the dotnet-dump issue. I see now. I think the reason the process is hanging is the claim here: https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/ee/debugger.cpp#L380 This thread appears to be neither the finalizer nor the debugger helper thread, but it does need to keep executing in order for the process to exit. A reasonable fix would probably be for PreJitAttach to return immediately, without doing any work, if we detect that `m_fShutdownMode` is true.
Happy to buddy-test a private DLL if it helps.
@noahfalk do you think you will have cycles to put up the fix? This prevents libraries test runs from completing on my machine, and presumably on others, but I don't know.
I'm writing it up now. It's possible this fixes it entirely, or it may just be one layer of the onion. It should be safe to check in either way, and then we can find out what impact it has on your local repro.
Fixes dotnet#66715 We are seeing exceptions thrown at shutdown turn into hangs because the debugger lock suspends threads at that point. We are mitigating that problem by disabling the jit attach setup work and allowing WatsonLastChance to continue.
Confirmed, all sorted now. Thanks @noahfalk
awesome, glad it worked out :)