process,worker: fix process.exitCode handling for fatalException #21739

lundibundi · 2018-07-10T18:52:29Z

* set process.exitCode before calling 'exit' handlers so that there will
  not be a situation where process.exitCode !== code in 'exit' callback
  during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with
  uncaughtException and there is no uncaughtException listener
* fix duplicate call of 'exit' callbacks in case of uncaught exception in worker
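
A minimal way to observe the first two points from outside the process, assuming a Node.js release that includes this change. The helper name `runSnippet` and the marker codes 98 and 99 are made up for this sketch and are not taken from the PR:

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run `source` in a fresh Node.js process and
// return its exit status.
function runSnippet(source) {
  const result = spawnSync(process.execPath, ['-e', source], { stdio: 'ignore' });
  return result.status;
}

// No 'uncaughtException' listener is installed, so the throw is fatal.
// The 'exit' listener reports what it observed through the final exit code:
// 98 if `code` and process.exitCode were both 1, 99 otherwise.
const child = `
  process.on('exit', (code) => {
    const consistent = code === 1 && process.exitCode === 1;
    process.exitCode = consistent ? 98 : 99;
  });
  throw new Error('boom');
`;

const status = runSnippet(child);
console.log(status);
```

On a build with the fix, the reported status should be 98: the 'exit' listener saw `code` and `process.exitCode` both equal to 1, and the value it assigned afterwards was used as the final exit code.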

Checklist

make -j4 test (UNIX) passes
tests and/or benchmarks are included
documentation is changed or added (I'm not sure if this is reflected anywhere, so leaving unchecked for now)
commit message follows commit guidelines

~~This PR depends on #21713 for workers test.~~

I've found a bug in workers where, in case of an uncaught exception, the 'exit' event is actually emitted twice. I think this is because _fatalException is called (presumably via FatalException()) and the worker also goes through its usual exit, as I understand it, which results in 2 calls to the 'exit' callbacks.
I have fixed it with the second commit, but I'm not sure if it's a correct fix, so awaiting feedback.

Edit: The bug occurs when a worker exits with an uncaught exception and there is an 'exit' event listener and no 'uncaughtException' listener. In that case the worker's local 'exit' callbacks are called twice (see test-worker-uncaught-exception.js). Current node 10.6.0 always fails on this.

/cc @addaleax

jasnell · 2018-07-10T19:33:02Z

lib/internal/bootstrap/node.js

@@ -475,6 +475,7 @@
        try {
          if (!process._exiting) {
            process._exiting = true;
+            process.exitCode = 1;


If user code happens to set process.exitCode prior to this, this will override the user provided value. It should likely only set the value if it is not already set

I believe that is not a problem here, as uncaughtException has its own exit code, 1, so it should be correct to override whatever the code was before. This probably makes the change semver-major, but that shouldn't be a problem imo.

hmm... I can definitely see the logic on it but I definitely don't like overriding user provided values unexpectedly. If we go with this, can I ask that a note be added to the documentation along with a code comment indicating why it's ok to silently override any user provided value here?

That's fine with me. But where should I put the documentation?
Also, it is already noted that uncaughtException has exit code 1, so this will basically enforce it: any previously set code will not be used, only the ones set in an 'uncaughtException' callback, an 'exit' callback, etc.

> But where should I put a documentation?

In the description for the uncaughtException event in doc/api/process.md. The documentation currently does not say anything about the process exit code. A note there should say that, should the event not be handled, the process will exit with exitCode = 1 even if the user had previously set a process.exitCode value.

(btw, I see this as a limitation in the current documentation and not something that is introduced by this PR)

> even if the user had previously set a process.exitCode value.

Oh, I thought it was a given that if some new error happens, exitCode is replaced with the value for that error, and that this was simply undocumented. The fact that you weren't able to change the code is indeed a strange thing. Anyway, I'll add the doc soon.
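
The override discussed in this thread can be reproduced with a short child script. This is only an illustrative sketch (the value 99 is arbitrary) and assumes a Node.js release with this change applied:

```javascript
const { spawnSync } = require('child_process');

// The child sets an exit code up front, then dies from an uncaught exception.
const childSource = `
  process.exitCode = 99;      // user-provided value
  throw new Error('boom');    // no 'uncaughtException' listener installed
`;

const { status: overriddenStatus } = spawnSync(
  process.execPath,
  ['-e', childSource],
  { stdio: 'ignore' },
);
console.log(overriddenStatus);
```

The child is expected to exit with 1 rather than 99, which is the behaviour the requested documentation note is meant to describe.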

jasnell · 2018-07-10T19:33:42Z

src/node.cc

+      // read it again, otherwise use default for uncaughtException 1
+      Local<String> exit_code = env->exit_code_string();
+      int code = process_object->Get(env->context(), exit_code)
+        .ToLocalChecked()->Int32Value(env->context()).ToChecked();


nit: 4 space indent in C/C++ code

Oops, thx. I thought linter would catch this 🤔.

@lundibundi Sadly, our C++ linter is very outdated and not very actively maintained … we do have https://github.com/nodejs/node/blob/master/CPP_STYLE_GUIDE.md, if it helps?

jasnell · 2018-07-10T19:35:43Z

src/node.cc

+      Local<String> exit_code = env->exit_code_string();
+      int code = process_object->Get(env->context(), exit_code)
+        .ToLocalChecked()->Int32Value(env->context()).ToChecked();
+      if (code == 0) {


Hmm... this should likely only emit exit(1) if code is undefined. It should be legitimate for an uncaught exception handler to set process.exitCode = 0 and have that be the actual exit code.

lundibundi · 2018-07-10T21:01:10Z

Also, I kind of don't like code duplication in both tests (for process and worker), so will it be okay to extract test cases (child1,2,3 etc) into test/fixtures?

addaleax · 2018-07-10T22:52:37Z

src/node.cc

+        exit(1);
+      } else {
+        exit(code.ToLocalChecked()->Int32Value(env->context()).ToChecked());
+      }


This crashes for weird edge cases like this:

process.on("exit", () => { process.exitCode = { [Symbol.toPrimitive]() { throw new Error(); } }; })

Maybe exit with 1 if code.IsEmpty() || !code->IsInt32() and use a direct conversion to v8::Int32 here, like this:

node/src/node_api.cc

Line 2134 in 1f16758

*result = val.As<v8::Int32>()->Value();

Does that sound okay?

Wow, that's pretty neat. One question, though: is it safe to just do Local<Value> code = process_object->Get(env->context(), exit_code).ToLocalChecked();? Or should I do it step by step (empty check, then convert and int32 check)?

@lundibundi That operation can still fail if userland code were to install a getter that throws … obviously not something it should do, but yes, I’d prefer to do it step-by-step.

(Generally, we’ll need to shift a lot of our error handling code away from using ToLocalChecked()/ToChecked(), because worker.terminate() can make just about anything throw that calls JS code…)

Ok, thanks, I'll update it soon.
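
The step-by-step read agreed on above can be modelled in JavaScript to show why it avoids running user code during conversion. This is a rough sketch, not the C++ that landed: `readExitCode` is a hypothetical name, and the int32 check only approximates v8::Value::IsInt32().

```javascript
// Rough JavaScript model of the defensive exit-code read.
function readExitCode(processLike) {
  let value;
  try {
    value = processLike.exitCode;   // may run a userland getter that throws
  } catch {
    return 1;                       // plays the role of an empty MaybeLocal
  }
  // Approximation of IsInt32(): a number that is unchanged by truncation
  // to a signed 32-bit integer. No coercion happens, so Symbol.toPrimitive
  // and valueOf hooks on `value` are never invoked.
  const isInt32 = typeof value === 'number' && (value | 0) === value;
  return isInt32 ? value : 1;
}

// The edge case from the review comment: conversion would throw.
const hostile = { [Symbol.toPrimitive]() { throw new Error(); } };

console.log(readExitCode({ exitCode: 42 }));                           // 42
console.log(readExitCode({ exitCode: hostile }));                      // 1
console.log(readExitCode({ get exitCode() { throw new Error(); } }));  // 1
```

Falling back to 1 in every non-integer case keeps the default exit code for an uncaught exception instead of crashing while exiting.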

addaleax · 2018-07-10T22:54:08Z

src/node.cc


  Local<String> exit_code = env->exit_code_string();
  int code = process_object->Get(env->context(), exit_code).ToLocalChecked()
      ->Int32Value(env->context()).ToChecked();

+  if (exiting->IsTrue()) {
+    return code;
+  }


I believe you when you say the test was failing before, but do you know why this was happening? Does it point to a larger issue?

As I understand it, the issue here is that there is no synchronization between worker-exit and worker-fatalException-exit (links at the end of the first post), and as a result we have:

* FatalException -> fatal callback called in JS land -> 'exit' emitted from there
* the worker understands that it is finishing, exits its loop and calls EmitExit

Therefore we get 2 calls to the 'exit' listeners. I'm not sure if 'just muting' the second exit event is a good solution, hence my note at the end.
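
The two paths described above can be imitated with a plain EventEmitter. This is a toy model under stated assumptions: `simulateWorkerShutdown`, `emitExit` and the `guarded` option are invented for illustration and do not correspond to actual Node.js internals.

```javascript
const { EventEmitter } = require('events');

// Toy stand-in for a worker's process object.
function simulateWorkerShutdown({ guarded }) {
  const proc = new EventEmitter();
  let exiting = false;
  let calls = 0;
  proc.on('exit', () => { calls += 1; });

  const emitExit = (code) => {
    if (guarded && exiting) return;   // with the guard, the second path is a no-op
    exiting = true;
    proc.emit('exit', code);
  };

  emitExit(1);   // path 1: the fatal-exception handler in JS emits 'exit'
  emitExit(1);   // path 2: the worker loop ends and the native side emits 'exit'
  return calls;
}

console.log(simulateWorkerShutdown({ guarded: false })); // 2
console.log(simulateWorkerShutdown({ guarded: true }));  // 1
```

Without a shared "already exiting" flag both paths reach the listener; where that flag lives (process._exiting or the Environment) is what the rest of this thread debates.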

> I believe you when you say the test was failing before

I'm not sure what you mean? Also, I forgot to add a description of the test case for the bug (it is in the tests, and I'll update the first post soon).

To be honest, you are really so knowledgeable.

@lundibundi I see, that makes sense. I think I’d have a slight preference to handle this using a flag on the Environment object, since process._exiting can just be modified by userland code.

Oh, that's surely better. I'll try to implement that, though there may be some problems, as _exiting is actually set from the _fatalException handler in JS: with _exiting, the flag is set if there is no handler and **before** the 'exit' callbacks, but with env it would be set if there is no handler and **after** all 'exit' callbacks, due to the way fatalException is handled in JS.

@addaleax Well, as I was afraid, the worker finishing coincides with FatalException (after FatalException calls the JS handler, the worker understands that it has finished and starts its own sequence). So either we

* change the way they work with each other (no ideas here; add an exitCode lock in env and block the worker until FatalException finishes?)
* change how _fatalException works (maybe split it into a pre step that checks for an uncaughtException handler, plus actual success and failure routes, though this looks kind of over the top and introduces additional cpp-js calls)
* maybe check for 'uncaughtException' listeners in cpp-land beforehand if that's possible, but this one looks like a hack and not a solution
* or use _exiting for now

Obviously the latter looks okay, but this surely indicates a problem and may need additional research.

P.S.: In order to investigate this, I added a few logs in the code (as Debug was not enough) and ran a version of test-worker-uncaught-exception.js without asserts. Here is the output:

Fatal exception calling js handler
Worker exited main loop
Worker emit exit
EmitExit start
EmitExit emit 'exit'
on error received: foo
Exit callback called twice in worker
exited with 1
EmitExit start
EmitExit emit 'exit'

Actually, I added env->exiting and the appropriate checks/sets in EmitExit and FatalException here, but the order of execution prevents it from running correctly.

@lundibundi I think there’s a bigger bug here – in test-worker-uncaught-exception.js, the worker does not exit because of the uncaught exception, but because there’s no work to do after it. :/

I think we want a process.exit() call at the end of the if (!caught) { block?

@addaleax oh, that is indeed true, great catch thanks.
I think adding a timeout with assert.fail should be good enough to ensure that the worker exits. Though I'm not sure if a timeout is a good idea (flakiness and such); is there any better way?
Yeah process.exit() seems to work just fine there.

@lundibundi We could try to start some async operation before the uncaught throw and see whether it’s executed? That latter part might be tricky, but that would be the general idea, I think
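
One possible shape for that check, sketched under assumptions rather than copied from the test that landed: the worker schedules a timer and then throws, and the host verifies that the timer never reported back. `workerSource`, `hostSource` and `workerStopsAfterThrow` are hypothetical names, and the host runs in a child process only so that the result can be read synchronously.

```javascript
const { spawnSync } = require('child_process');

// Code run inside the worker: schedule async work, then throw.
const workerSource = `
  const { parentPort } = require('worker_threads');
  setTimeout(() => parentPort.postMessage('timer ran'), 10);
  throw new Error('foo');
`;

// Code run in a child process that hosts the worker. It exits with 0 only
// if the worker exited with code 1 and the timer message never arrived.
const hostSource = `
  const { Worker } = require('worker_threads');
  const worker = new Worker(${JSON.stringify(workerSource)}, { eval: true });
  let timerRan = false;
  worker.on('message', () => { timerRan = true; });
  worker.on('error', () => {});   // the uncaught exception surfaces here
  worker.on('exit', (code) => {
    process.exitCode = (!timerRan && code === 1) ? 0 : 1;
  });
`;

function workerStopsAfterThrow() {
  const result = spawnSync(process.execPath, ['-e', hostSource], { stdio: 'ignore' });
  return result.status === 0;
}

console.log(workerStopsAfterThrow());
```

On a release where the worker exits right after the uncaught exception this should report true; with the earlier behaviour the timer would have fired before the worker ran out of work.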

addaleax · 2018-07-10T22:58:14Z

> Also, I kind of don't like code duplication in both tests (for process and worker), so will it be okay to extract test cases (child1,2,3 etc) into test/fixtures?

@lundibundi You can do that if it makes the most sense to you, yes :)

addaleax · 2018-07-11T17:15:25Z

src/node.cc

+      Local<String> exit_code = env->exit_code_string();
+      MaybeLocal<Value> maybe_code =
+          process_object->Get(env->context(), exit_code);
+      if (maybe_code.IsEmpty()) {


This is not really important, but I’m kinda hoping you’ll stick around after your currently open PRs, so as a tip: I personally find it slightly annoying to first have a MaybeLocal<Something> and then conditionally convert it to a Local<Value>.

What I tend to do then is something along these lines:

Local<Value> value;
if (!object->Get(context, key).ToLocal(&value) || !value->IsInt32())
  exit(1);
exit(value.As<Int32>()->Value());

I know it’s not an exact match to this situation but I hope it’s clear enough what I mean. :)

Well, MaybeLocal looks kind of limited. Though I like this neat way of using sce (short-circuit evaluation).

lundibundi · 2018-07-13T16:27:36Z

@addaleax I've cleaned up the commits. I've left setTimeout in for now, as I cannot come up with anything better right now and I don't want to delay this PR because of it. I'm open to suggestions, though.
Also, I've run --repeat 1920 for both tests and it seemed to be fine. Furthermore, using a timeout here shouldn't be flaky, since after the throw the worker shouldn't execute anything anymore.

addaleax

Really digging what you put together here. Thank you a lot!

Would you be interested in joining the @nodejs/workers team? It doesn’t come with any responsibilities, it’s just people who get notified for issues/PRs related to Worker code. :)

addaleax · 2018-07-13T16:36:09Z

Full CI: https://ci.nodejs.org/job/node-test-pull-request/15854/

* set process.exitCode before calling 'exit' handlers so that there will not be a situation where process.exitCode !== code in 'exit' callback during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with uncaughtException and there is no uncaughtException listener

Now we set it before the exit event, which allows changing the code inside the exit event (even with uncaughtException), therefore setting exitCode in the worker is no longer needed.

Previously even after uncaught exception the worker would continue to execute until there is no more work to do.

lundibundi · 2018-07-13T19:17:26Z

@addaleax Thanks for your support 😃.
Yes please, I'd like to join.
I've rebased, should be good. Also it might be worth landing worker: exit after uncaught exception separately as it's not really related to 'process.exitCode handling'. Though this is just a bug fix so might not be that important.

addaleax · 2018-07-14T10:16:10Z

Landed in 998f9ff...7c2925e, thanks for the work! 🎉

* set process.exitCode before calling 'exit' handlers so that there will not be a situation where process.exitCode !== code in 'exit' callback during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with uncaughtException and there is no uncaughtException listener

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>

Previously even after uncaught exception the worker would continue to execute until there is no more work to do.

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>

nodejs-github-bot added the c++ label Jul 10, 2018

jasnell reviewed Jul 10, 2018

lundibundi force-pushed the fix-env-exit-code branch from 9109a57 to 1acb425 Compare July 10, 2018 22:31

addaleax approved these changes Jul 10, 2018

lundibundi force-pushed the fix-env-exit-code branch from 1acb425 to 6ce385b Compare July 11, 2018 00:31

addaleax reviewed Jul 11, 2018

lundibundi force-pushed the fix-env-exit-code branch from 6ce385b to 773c9d8 Compare July 11, 2018 19:46

BridgeAR approved these changes Jul 12, 2018

lundibundi force-pushed the fix-env-exit-code branch from 773c9d8 to 5eb1a23 Compare July 13, 2018 16:21

addaleax approved these changes Jul 13, 2018

addaleax added the author ready, process, and worker labels Jul 13, 2018

lundibundi added 4 commits July 13, 2018 22:02

test: refactor process/worker exitCode tests

4df2d6b

fixup! worker: remove setting exitCode after uncaughtException/exit

7ee6da0

Now we set it before the exit event, which allows changing the code inside the exit event (even with uncaughtException), therefore setting exitCode in the worker is no longer needed.

worker: exit after uncaught exception

6a993f1

Previously even after uncaught exception the worker would continue to execute until there is no more work to do.

lundibundi force-pushed the fix-env-exit-code branch from 5eb1a23 to 6a993f1 Compare July 13, 2018 19:14

addaleax closed this Jul 14, 2018

addaleax pushed a commit that referenced this pull request Jul 14, 2018

test: refactor process/worker exitCode tests

19e10ec

PR-URL: #21739 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

targos pushed a commit that referenced this pull request Jul 14, 2018

test: refactor process/worker exitCode tests

600349a

PR-URL: #21739 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

targos mentioned this pull request Jul 17, 2018

v10.7.0 proposal #21851

Merged
