Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debugger: fix stuck debugger when debuggee exits #6332

Closed
wants to merge 3 commits into from
Closed

debugger: fix stuck debugger when debuggee exits #6332

wants to merge 3 commits into from

Conversation

reshnm
Copy link
Contributor

@reshnm reshnm commented Apr 21, 2016

Checklist
  • tests and code linting passes
  • a test and/or benchmark is included
  • the commit message follows commit guidelines
Affected core subsystem(s)

debugger

Description of change

Referencing PR #27778

The debuggee node process will never exit because the execution
is blocked inside node::debugger::Agent::Stop,
joining the agent thread.
The debug agent thread is blocked inside uv_run.

To fix this, the debug agent has to close all client connections
in _debug_agent.js (process._debugAPI.onclose).

Additionally all remaining opened handles have to be closed
before calling uv_loop_close.
Otherwise uv_loop_close will fail with UV_EBUSY.

Assume the following script (debug.js):

console.log('debug');

Calling node debug debug.js produces the following output:

< Debugger listening on port 5858
connecting to 127.0.0.1:5858 ... ok
break in debug.js:1
> 1 console.log('debug');
  2 
  3 });
debug> c
< debug
program terminated
debug> 

Both

node --debug -e 0

and

node --debug-brk -e 0

exit immediately.

The debuggee node process will never exit because the execution
is blocked inside node::debugger::Agent::Stop,
joining the agent thread.
The debug agent thread is blocked inside uv_run.

To fix this, the debug agent has to close all client connections
in _debug_agent.js (process._debugAPI.onclose).

Additionally all remaining opened handles have to be closed
before calling uv_loop_close.
Otherwise uv_loop_close will fail with UV_EBUSY.
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Apr 21, 2016
@mscdex mscdex added debugger and removed c++ Issues and PRs that require attention from people who are familiar with C++. labels Apr 21, 2016
@@ -81,6 +85,12 @@ Agent.prototype.notifyWait = function notifyWait() {
this.first = false;
};

Agent.prototype.destroyAllClients = function destroyAllClients() {
this.clients.forEach(function(client) {
client.destroy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ends up calling this.socket.destroy

I didn't dig too much further, but is this operation synchronous? If not there might be some unexpected behavior here

/cc @bnoordhuis

Copy link
Contributor Author

@reshnm reshnm Apr 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will dig into this. I used destroy, because it was already used in the close event.
Maybe this.socket.end() would be sufficient here.

Edit: It seems that this.socket.end() also works. But it eventually will also call destroy.

@MylesBorins
Copy link
Contributor

@quaidn amazing!!! this is a problem I've wanted to chase down for a bit but never quite got around to it. How comfortable would you feel making a test for this?

Here is a test I wrote for a regression in util that might be useful as a starting point
https://github.com/nodejs/node/blob/master/test/parallel/test-debugger-util-regression.js

If we can get a test that doesn't work on master, but does work with this change we should be able to get this landed!

@MylesBorins
Copy link
Contributor

@reshnm
Copy link
Contributor Author

reshnm commented Apr 21, 2016

@thealphanerd I agree. A test would be good here. I try to use your script as a starting point.

@MylesBorins MylesBorins added the confirmed-bug Issues with confirmed bugs. label Apr 21, 2016
@MylesBorins
Copy link
Contributor

@quaidn amazing. Please feel free to let me know if there is anything I can do to help

Added a regression test for stuck debugger
when the debuggee exits.

The test does several cycles of 'continue'
and 'run'. The test passes if the debuggee
was terminated within the timeout.
@reshnm
Copy link
Contributor Author

reshnm commented Apr 22, 2016

@thealphanerd I added a regression test. Let me know what you think.

On master without these changes, the timeout is hit.

@@ -149,6 +154,9 @@ void Agent::Stop() {
CHECK_EQ(err, 0);

uv_close(reinterpret_cast<uv_handle_t*>(&child_signal_), nullptr);
// Close all remaining handles:
// uv_loop_close will return UV_EBUSY for handles which are not closed.
uv_walk(&child_loop_, close_handle, nullptr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround. I can't explain why there are always two pipe handles left referenced.

[--I] signal   0x30606e8
[-AI] async    0x3060530
[---] async    0x30603e8
[R--] pipe     0x7fd8a0058520
[R--] pipe     0x7fd8a0064260

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the stdout and stderr pipes. Closing them like that is not very optimal, it leaves their PipeWrap instances in an invalid state.

Proper cleanup is one of the things I have to tackle for the multi-isolate work. I'll try to get around to it later this week.

@MylesBorins MylesBorins self-assigned this May 9, 2016
@MylesBorins
Copy link
Contributor

@bnoordhuis is this something you have time to take a look at?

@MylesBorins
Copy link
Contributor

@Trott any thoughts on this?

fixture
];

const TEST_TIMEOUT_MS = 4000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe use common.platformTimeout(4000) so Raspberry Pi devices get a little extra time in CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Trott
Copy link
Member

Trott commented Jun 7, 2016

Pinging @indutny on the C++ changes.

@jjqq2013
Copy link
Contributor

+1

Use common.platformTimout to give ARM devices extra time
for test execution.
@MylesBorins
Copy link
Contributor

/cc @nodejs/diagnostics

@MylesBorins
Copy link
Contributor

ping @nodejs/diagnostics @bnoordhuis

@MylesBorins
Copy link
Contributor

ping @nodejs/diagnostics @bnoordhuis

might make sense to fix this in LTS

@jasnell
Copy link
Member

jasnell commented Mar 1, 2017

Does this need to stay open?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Mar 24, 2017
@jasnell
Copy link
Member

jasnell commented Mar 24, 2017

Marking this stalled. Will close soon if there is no further activity

@bnoordhuis
Copy link
Member

The multi-isolate work didn't go anywhere but the issue remains that blindly closing all libuv handles is a bad idea (as in 'segfault bad' and 'silent data corruption bad'.)

My suggestion would be to do nothing and sit it out. The old debugger is going away and the inspector doesn't have this issue.

@bnoordhuis
Copy link
Member

The old debugger has been removed. I'll close this out. Thanks for the PR though.

@bnoordhuis bnoordhuis closed this Aug 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants