debugger: fix stuck debugger when debuggee exits #6332
The debuggee node process never exits because execution is blocked inside node::debugger::Agent::Stop, which joins the agent thread, while the debug agent thread is itself blocked inside uv_run. To fix this, the debug agent has to close all client connections in _debug_agent.js (process._debugAPI.onclose). Additionally, all remaining open handles have to be closed before calling uv_loop_close; otherwise uv_loop_close fails with UV_EBUSY.
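As a rough illustration of the JavaScript half of this fix, the sketch below tracks client sockets and destroys all of them on shutdown. It is not the code from _debug_agent.js: the TinyAgent class and its addClient method are hypothetical, and the unconnected net.Socket instances only stand in for real debugger clients.

```javascript
'use strict';
const net = require('net');

// Hypothetical stand-in for the debug agent; only the client bookkeeping
// is modelled here.
class TinyAgent {
  constructor() {
    this.clients = [];
  }

  // Remember a client and forget it again once its socket has closed.
  addClient(socket) {
    this.clients.push(socket);
    socket.once('close', () => {
      const index = this.clients.indexOf(socket);
      if (index !== -1) this.clients.splice(index, 1);
    });
  }

  // Tear down every tracked connection so none of them keeps a loop alive.
  destroyAllClients() {
    // Iterate over a copy so removals in 'close' handlers cannot skip entries.
    this.clients.slice().forEach((socket) => socket.destroy());
  }
}

const agent = new TinyAgent();
const first = new net.Socket();   // unconnected placeholder client
const second = new net.Socket();  // unconnected placeholder client
agent.addClient(first);
agent.addClient(second);
agent.destroyAllClients();
```

Iterating over a copy of the list is a defensive choice in this sketch; whether the original agent needs it depends on when its 'close' handlers run.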
@@ -81,6 +85,12 @@ Agent.prototype.notifyWait = function notifyWait() {
  this.first = false;
};

Agent.prototype.destroyAllClients = function destroyAllClients() {
  this.clients.forEach(function(client) {
    client.destroy();
This ends up calling this.socket.destroy.
I didn't dig much further, but is this operation synchronous? If not, there might be some unexpected behavior here.
/cc @bnoordhuis
I will dig into this. I used destroy because it was already used in the close event. Maybe this.socket.end() would be sufficient here.
Edit: It seems that this.socket.end() also works, but it will eventually call destroy as well.
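On the question of whether destroy is synchronous, the following sketch can be run against a current Node.js release. The debugger code in this PR targeted a much older release, so treat the result as indicative only. It uses an unconnected net.Socket as a placeholder client and records whether the 'close' event has been seen right after destroy() returns and again on a later turn of the event loop.

```javascript
'use strict';
const net = require('net');

const socket = new net.Socket();  // placeholder client, never connected
let closeSeen = false;
socket.on('close', () => { closeSeen = true; });

socket.destroy();

// State right after destroy() returns, before any queued callbacks have run.
const destroyedImmediately = socket.destroyed;
const closeSeenImmediately = closeSeen;

setImmediate(() => {
  // Report both snapshots plus the state on a later event loop turn.
  console.log({
    destroyedImmediately,
    closeSeenImmediately,
    closeSeenLater: closeSeen,
  });
});
```

On recent releases the destroyed flag is expected to be set as soon as destroy() returns, while 'close' listeners run later, so any cleanup attached to 'close' happens asynchronously.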
@quaidn amazing!!! This is a problem I've wanted to chase down for a while but never quite got around to. How comfortable would you feel writing a test for this? Here is a test I wrote for a regression in util that might be useful as a starting point. If we can get a test that fails on master but passes with this change, we should be able to get this landed!
@thealphanerd I agree. A test would be good here. I'll try to use your script as a starting point.
@quaidn amazing. Please feel free to let me know if there is anything I can do to help.
Added a regression test for stuck debugger when the debuggee exits. The test does several cycles of 'continue' and 'run'. The test passes if the debuggee was terminated within the timeout.
@thealphanerd I added a regression test. Let me know what you think. On master without these changes, the timeout is hit.
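The pass criterion described here (the child must terminate within a timeout) can be sketched with child_process.spawnSync and its timeout option. This is not the regression test from the PR, which drives the debugger through 'continue' and 'run' cycles; the exitsWithin helper and the inline child scripts are hypothetical and only show the timeout check itself.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical helper: run `node <args>` and report whether it exited on its
// own with status 0 before the timeout killed it.
function exitsWithin(args, timeoutMs) {
  const result = spawnSync(process.execPath, args, {
    timeout: timeoutMs,
    stdio: 'ignore',
  });
  // When the timeout is hit, spawnSync kills the child and sets result.error.
  return !result.error && result.status === 0;
}

// A child that finishes right away, and one that never exits by itself.
const quickChild = exitsWithin(['-e', '0'], 4000);
const stuckChild = exitsWithin(['-e', 'setInterval(() => {}, 1000)'], 500);
console.log({ quickChild, stuckChild });
```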
@@ -149,6 +154,9 @@ void Agent::Stop() {
  CHECK_EQ(err, 0);

  uv_close(reinterpret_cast<uv_handle_t*>(&child_signal_), nullptr);
  // Close all remaining handles:
  // uv_loop_close will return UV_EBUSY for handles which are not closed.
  uv_walk(&child_loop_, close_handle, nullptr);
This is a workaround. I can't explain why there are always two pipe handles left referenced.
[--I] signal 0x30606e8
[-AI] async 0x3060530
[---] async 0x30603e8
[R--] pipe 0x7fd8a0058520
[R--] pipe 0x7fd8a0064260
They are the stdout and stderr pipes. Closing them like that is not very optimal, it leaves their PipeWrap instances in an invalid state.
Proper cleanup is one of the things I have to tackle for the multi-isolate work. I'll try to get around to it later this week.
7da4fd4 to c7066fb
@bnoordhuis is this something you have time to take a look at?
@Trott any thoughts on this?
  fixture
];

const TEST_TIMEOUT_MS = 4000;
Nit: Maybe use common.platformTimeout(4000) so Raspberry Pi devices get a little extra time in CI.
Done
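For readers unfamiliar with the idea behind this nit, a platform-scaled timeout can be sketched as below. This is not the implementation of common.platformTimeout: the scaledTimeout helper, the SLOW_ARCH_FACTORS map, and the multiplier value are all hypothetical and chosen purely for illustration.

```javascript
'use strict';

// Illustrative multiplier only; the values used by Node's own test helper
// may differ.
const SLOW_ARCH_FACTORS = new Map([['arm', 4]]);

// Hypothetical helper: stretch a timeout on architectures assumed to be slow.
function scaledTimeout(ms, arch = process.arch) {
  const factor = SLOW_ARCH_FACTORS.get(arch) || 1;
  return ms * factor;
}

// Same base value as the test above, scaled for the current machine.
const TEST_TIMEOUT_MS = scaledTimeout(4000);
```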
Pinging @indutny on the C++ changes.
+1
Use common.platformTimeout to give ARM devices extra time for test execution.
/cc @nodejs/diagnostics
ping @nodejs/diagnostics @bnoordhuis
c133999 to 83c7a88
ping @nodejs/diagnostics @bnoordhuis might make sense to fix this in LTS
Does this need to stay open?
Marking this stalled. Will close soon if there is no further activity.
The multi-isolate work didn't go anywhere but the issue remains that blindly closing all libuv handles is a bad idea (as in 'segfault bad' and 'silent data corruption bad'.) My suggestion would be to do nothing and sit it out. The old debugger is going away and the inspector doesn't have this issue.
The old debugger has been removed. I'll close this out. Thanks for the PR though.
Checklist
Affected core subsystem(s)
debugger
Description of change
Referencing PR #27778
The debuggee node process will never exit because the execution
is blocked inside node::debugger::Agent::Stop,
joining the agent thread.
The debug agent thread is blocked inside uv_run.
To fix this, the debug agent has to close all client connections
in _debug_agent.js (process._debugAPI.onclose).
Additionally, all remaining open handles have to be closed
before calling uv_loop_close.
Otherwise uv_loop_close will fail with UV_EBUSY.
Assume the following script (debug.js):
Calling node debug debug.js produces the following output:
Both
and
exit immediately.