Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: wait for all child processes to terminate (fixes issue #1476) #1579

Closed
wants to merge 1 commit into from

Conversation

marcelkottmann
Copy link
Contributor

@marcelkottmann marcelkottmann commented Jun 7, 2019

See #1476

@remy
Copy link
Owner

remy commented Jun 7, 2019

Is there a test for this that shows it failing and then passing with the change?

@remy
Copy link
Owner

remy commented Jun 7, 2019

Also, how is this different to the current kill logic?

@marcelkottmann
Copy link
Contributor Author

Also, how is this different to the current kill logic?

The current kill logic is unchanged. After issuing the kill command in run.js:376 (which only sends signals to the other processes), the processes may need some time finishing.
The new command kill -0 actually does not send a signal, it only reports through the exit code, wether some given PIDs still exists, i.e. the process has not terminated. The new logic waits until every child process is really terminated. The current logic only waits for the parent child process (the sh command) to finish (the 'exit' event).

@remy
Copy link
Owner

remy commented Jun 7, 2019

And I assume this doesn't apply to windows? (I see Windows is being skipped)

@marcelkottmann
Copy link
Contributor Author

And I assume this doesn't apply to windows? (I see Windows is being skipped)

As mentioned in my comment in #1476 I assume that the issue is only occuring on linux platform.

@stale
Copy link

stale bot commented Jun 21, 2019

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

@stale stale bot added stale no activity for 2 weeks and removed stale no activity for 2 weeks labels Jun 21, 2019
@Cerber-Ursi
Copy link

@PePe79 Could you please look through the Travis error? I'm not sure if it is possible to fix it otherwise.

@marcelkottmann
Copy link
Contributor Author

@Cerberuser yes I will have a look within the next 2 days.

@marcelkottmann
Copy link
Contributor Author

Reworked this and moved the waitForOldKids function inside the kill function. I changed the behaviour in the way, that at first all kids are killed and only if all kids have been killed successfully (wait for it), killing also the parent process (the sh-process). This triggers the exit-event like before and restarts the monitored exec command.

@marcelkottmann marcelkottmann changed the title fix: wait for _all_ child processes to terminate (fixes issue #1476) fix: wait for all child processes to terminate (fixes issue #1476) Jun 28, 2019
@dobesv
Copy link

dobesv commented Jul 3, 2019

I think this may address the issue but it spams the console with messages "waiting for 1 child process to exit".

In our system we actually never exit when we receive the signal, we do a hot reload and ignore the signal. I think nodemon should allow this without spamming the console or using a lot of CPU cycles.

@marcelkottmann
Copy link
Contributor Author

marcelkottmann commented Jul 3, 2019

hey @dobesv , ok that sounds interesting. I did not thought about this use-case. It's conceptually the opposite of what is addressed in the issue #1476. I don't think that both use-cases

  1. wait until the process has really terminated and then restart it

and

  1. custom signalhandler without terminating the process

work together without a specific command line switch that tells nodemon what to do.

@remy What do you think about theses use-cases?

} else {
// make sure we kill from smallest to largest
const pids = kids.concat(child.pid).sort();
const pids = kids.sort();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mutating your parameter here, which is kind of a no-no. Perhaps copy the array first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I fixed that by using .slice() now.

@@ -16,6 +16,27 @@ var psTree = require('pstree.remy');
var path = require('path');
var signals = require('./signals');

function waitForOldKids(oldKids, callback) {
if (oldKids) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also check oldKids.length > 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed this line to if (Array.isArray(oldKids) && oldKids.length > 0) {

spawn('kill', ['-s', sig, child.pid].concat(kids))
.on('close', callback);
spawn('kill', ['-s', sig].concat(kids))
.on('close', waitForOldKids.bind(this, kids, function () {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we filter out child.pid from kids here, we can just rely on the exit event from ChildProcess to detect the root process exit, which avoids the extra polling in the case that there's no extra child processes. e.g. kids.filter(p => p !== child.pid).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I know what you're meaning. But some commands won't create any sub processes. Maybe it is esoteric but for example nodemon --exec "read VAR" won't start a subprocess because read is a shell builtin command.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read VAR will start a shell subprocess, although there will be just one subprocess from nodemon's perspective. But in that case the exit event will be sent to us for that process so we don't need to poll for it to exit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or to clarify what I'm trying to say. nodemon --exec "read VAR" will at least start /bin/sh to run the read built-in. So there's still a child process there, but only one (/bin/sh). nodemon --exec "python foo.py" runs two subprocesses - /bin/sh and python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you think kids contains also the sh process PID. I thought it doesn't (because of the previous old code in line 346: ..., child.pid].concat(kids)). I have to clarify what is right here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually I might have been confused about this one, I did a quick test and it seems like pstree is not including the parent id in the result.

e.g.

> child = require('child_process').exec('sleep 300s')
ChildProcess {
> child.pid
22955
> require('pstree.remy')(child.pid, (e, k) => console.log(k))
undefined
> [ 22956 ]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted some of my erroneous comments about this...

@dobesv
Copy link

dobesv commented Jul 3, 2019

The current release of nodemon supports both scenarios perfectly, as long as there is only one child process. I think a few simple changes will make your change more compatible with this:

  1. Poll the children less often to avoid heavy CPU use (once per second is probably fine)
  2. Don't poll the child.pid, since node will notify us of its exit via the event system. If there are no extra child processes the polling won't run at all, and your change won't affect users who don't need it.
  3. Don't log on every poll; maybe just log if the number of remaining processes changed from the last time (or don't log at all)

exec('kill -0 ' + oldKids.join(' '), (error) => {
const returnCode = error ? error.code : 0;
if (returnCode < 126) { // ignore command not found error
const stillRunningKids = oldKids.length - returnCode;
Copy link

@dobesv dobesv Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run kill it always returns 0 or 1 so I'm not sure if this math is reliable for cases where there are multiple child processes. You might have to run psTree again for each child to see whether it is still running and whether it has added any new child processes.

Copy link

@dobesv dobesv Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what you could do is just poll kids[0]. If it isn't running any more, call waitForOldKids(oldKids.slice(1), callback). Then when oldKids.length === 0 all the kids are done.

Instead of using kill to poll you could use psTree. It'll return an empty list if the process has exited, but if the list is not empty you can union that with the list you have so that you can detect any new child processes that may have been spawned.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using psTree to wait for the child process might work in windows as well, where kill is not available.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since psTree doesn't return the pid given as part of the list, you can't use it to tell whether a process is still running.

Copy link
Contributor Author

@marcelkottmann marcelkottmann Jul 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this part of code I only want to wait for the subprocesses to terminate. I think using pstree here instead of kill -0 is a good idea. I will try that out.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, if you pass in the root pid you can just us pstree to check if there are any children or not instead of trying to kill each child.

if (stillRunningKids > 0) {
utils.log.status('still waiting for ' + stillRunningKids +
' child process(es) to finish...');
setTimeout(waitForOldKids.bind(this, oldKids, callback), 100);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slower polling interval might be nice for processes that aren't actually planning to exit.

@marcelkottmann marcelkottmann force-pushed the master branch 2 times, most recently from 760d1d9 to f6c2841 Compare July 3, 2019 18:49
@marcelkottmann
Copy link
Contributor Author

The current release of nodemon supports both scenarios perfectly, as long as there is only one child process.

I think your assumption is wrong, because nodemon itself starts a sh process, which then starts the provided exec command. So the "normal" usage of nodemon with launching one simple command, launches at least two processes. The exit-handler of nodemon only handles the exit of the sh command which does not gurantee that the subprocess (the exec command) also has terminated (at least on linux).

@dobesv
Copy link

dobesv commented Jul 3, 2019

The current release of nodemon supports both scenarios perfectly, as long as there is only one child process.

I think your assumption is wrong, because nodemon itself starts a sh process

If you run nodemon bla.js it doesn't run sh, it just forks a new node process and passes the script to it directly. In that case the bug(s) you are trying to fix here do not manifest because there is only a single child process.

Currently these bugs don't affect me because I only ran into them while trying pass arguments to node when running with nodemon. This triggered it to use sh as an intermediate process to pass the arguments. To resolve this I now pass arguments to node via environment variables instead of arguments through nodemon. Thus the issues at hand do not currently affect my project.

Your fix, were it to be merged, might affect me if it changes the behavior of the case that currently works correctly for me (single child process).

@stale
Copy link

stale bot commented Aug 21, 2019

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

@stale stale bot added the stale no activity for 2 weeks label Aug 21, 2019
@Cerber-Ursi
Copy link

@PePe79, is there any progress, or the PR is really stale?

@stale stale bot removed the stale no activity for 2 weeks label Aug 21, 2019
@marcelkottmann
Copy link
Contributor Author

No progress from my side, because of lack of time. Next steps will be looking at some of the findings of @dobesv and thinking about how to create a testcase that shows the errorcase. Unfortunately there will be no progress from my side in the next few weeks.

@stale
Copy link

stale bot commented Sep 6, 2019

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

@Cerber-Ursi
Copy link

Hope this issue isn't stale in fact.

@stale
Copy link

stale bot commented Sep 20, 2019

This issue has been automatically marked as idle and stale because it hasn't had any recent activity. It will be automtically closed if no further activity occurs. If you think this is wrong, or the problem still persists, just pop a reply in the comments and @remy will (try!) to follow up.
Thank you for contributing <3

@stale stale bot added the stale no activity for 2 weeks label Sep 20, 2019
@orinokai
Copy link

Eagerly awaiting this PR, thanks for everyone's efforts so far. Definitely not stale, issue persists.

@stale stale bot removed the stale no activity for 2 weeks label Sep 20, 2019
@remy remy added not-stale Tell stalebot to ignore this issue waiting for changes labels Sep 28, 2019
@Remzi1993
Copy link

I'm also waiting for this. @remy Could you check if all is okay?
It seems like it passed all checks.

@remy
Copy link
Owner

remy commented Oct 24, 2019

@Remzi1993 & @PePe79 I'm blocking out time these next few days for nodemon bits - I had one request on this code (I'll add as review comment too), can we rename kids to subprocess (I have infant mortality in my personal life and I'd like to keep it out of my code - there's no reason you might know, but if we can rename, then I'll start a detailed review and test and should get it merged and live this week).

@Remzi1993
Copy link

Hi @remy
I'm so sorry to hear this. Maybe I can help renaming everything, I'm also for logical names and I also avoid in my coding analogies like this to avoid confusing what it actually might mean.

I will take a look at the code this sunday (but I can't promise anything, because I'm very busy with a course/academy till the end of November).

Kind regards,
@Remzi1993

@remy
Copy link
Owner

remy commented Oct 24, 2019 via email

@marcelkottmann
Copy link
Contributor Author

Hi @remy , I am sorry for you. That's totally understandable and renaming shouldn't be a problem. Sorry that I did not have the time to create a testcase that can be automated. I had some ideas to test this inside docker containers, but hadn't any time to try that...
I will rename the variables and push the changes within the next two hours.

@marcelkottmann
Copy link
Contributor Author

I also created a git repo with a testcase to reproduce this issue and to test this fix (see README): https://github.com/pepe79/test-nodemon-issue-1476

@remy
Copy link
Owner

remy commented Oct 30, 2019

Picking this up this week - thank you @PePe79 for the changes.

@remy
Copy link
Owner

remy commented Nov 20, 2019

Merged and going live in PR #1632

@remy
Copy link
Owner

remy commented Nov 21, 2019

I'm in the process or looking at #1633 in detail because this change isn't working as expected in Ubuntu with a standard setup.

I think the problem is when there's more than one sub process, the kill -0 1234 6789 runs and it gives an error of code: 1 indicating that one or more of the processes are stopped, assuming both have stopped, the maths is then wrong, subProcesses.length - returnCode # 2 - 1.

The logic should either run this function for each sub process or it should revalidate whether the processes are running with pstree. I need to look back through the comments here as to why pstree wasn't used though.

@marcelkottmann
Copy link
Contributor Author

@remy , sorry didn't see that coming. On my system kill -0 returns exit code 2 if both given pids terminated...

> kill -0 4567 6789
kill: kill 4567 failed: no such process
kill: kill 6789 failed: no such process
> echo $?
2

Using pstree is the much better cross-platform and portable way. Sorry for the inconvience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not-stale Tell stalebot to ignore this issue waiting for changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants