Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

cluster/child_process exit event does not pass signal if child is killed #5832

Closed
sam-github opened this issue Jul 10, 2013 · 7 comments
Closed

Comments

@sam-github
Copy link

It appears to instead pass the raw status from wait(2) as code, and null for signal (143 == 0b10001111, the low 0b1111==15==SIGTERM).

node --version
v0.10.9-pre (head of v0.10 branch).

3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Output:

worker pid 23321
worker on exit 143 null

assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: null == "SIGTERM"

From

assert = require('assert')
cluster = require('cluster');
if(cluster.isMaster) {
  worker=cluster.fork();
  worker.on('exit', function(code, signal) {
    console.log('worker on exit', code, signal)
    assert.equal(signal, 'SIGTERM')
  })
  worker.on('online', function() {
    console.log('worker pid', worker.process.pid)
    worker.kill('SIGTERM')
  });
}
@bnoordhuis
Copy link
Member

That's because the SIGTERM is caught and translated into a graceful exit by node.js:

$ strace -fe exit_group,kill,wait4 out/Release/node tmp/issue5832.js
Process 25122 attached
Process 25123 attached
Process 25124 attached
worker pid 25123
[pid 25121] kill(25123, SIGTERM <unfinished ...>
[pid 25123] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=25121, si_uid=1000} ---
[pid 25121] <... kill resumed> )        = 0
[pid 25123] exit_group(143)             = ?
[pid 25124] +++ exited with 143 +++
[pid 25123] +++ exited with 143 +++
[pid 25121] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=25123, si_status=143, si_utime=3, si_stime=2} ---
[pid 25121] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 143}], WNOHANG, NULL) = 25123
worker on exit 143 null

assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: null == "SIGTERM"
    at Worker.<anonymous> (/home/bnoordhuis/src/v0.10/tmp/issue5832.js:7:12)
    at Worker.EventEmitter.emit (events.js:98:17)
    at ChildProcess.<anonymous> (cluster.js:340:10)
    at ChildProcess.g (events.js:175:14)
    at ChildProcess.EventEmitter.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:789:12)
[pid 25121] exit_group(8)               = ?
[pid 25122] +++ exited with 8 +++
+++ exited with 8 +++

If you send SIGKILL (which cannot be caught), your test works as you'd expect it to.

@behrad
Copy link

behrad commented Nov 24, 2013

That's because the SIGTERM is caught and translated into a graceful exit by node.js

@bnoordhuis And what if the client code wants to do some graceful shutdown by SIGTERM in the forked child when it is still alive? Does node provide any hooks there?

@bnoordhuis
Copy link
Member

@behrad process.on('SIGTERM', ...)

@behrad
Copy link

behrad commented Nov 24, 2013

Thank you @bnoordhuis , but my scenario is populating SIGTERM from a master process to child workers. I am using cluster module and I did workers[i].kill( 'SIGTERM' ) to shut the children down when the master is killed, BUT I didn't receive the signal in my workers!!
In hours, I implemented a one-phase shutdown voting to exit the master when all workers are shut down by message passing between master-worker. workers[i].send( 'WORKER_SHUTDOWN_REQ' );
Any suggestions?

@sam-github
Copy link
Author

@bnoordhuis's suggestion won't work, because worker.kill first does a disconnect, and the disconnect probably allows the worker to exit... so it never gets your signal. You can call worker.process.kill() to bypass the disconnect, and just send the signal... but you have to be careful of race conditions

https://gist.github.com/7645421

Don't use .kill() if you want your worker to do something, its only reliable for terminating your worker, not delivering a signal. Use .send(), and then use kill if they don't exit sometime after getting the message you sent.

Or use strong-cluster-control!

@trevnorris
Copy link

Just want to make sure this is understood, you need to explicitly shutdown your process when you catch SIGTERM. e.g.:

process.on('SIGTERM', function() {
  // do stuff
  process.exit();
});

@sam-github
Copy link
Author

@trevnorris, yes, that is true... if you catch a signal you no longer get the default signal behaviour (termination in this case). But note original OP is talking about "not getting" signal in worker... which is probably because worker.kill() does a worker.disconnect() BEFORE doing a worker.process.kill(), and workers usually exit when their control channel is disconnected (don't have to, but usually), so the process is already exited by the time the signal is sent.

/to @piscisaureus, this and #6042 are the issues we were talking about yesterday, where the cluster worker.kill() is just broken.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants