Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process,worker: fix process.exitCode handling for fatalException #21739

Closed
wants to merge 4 commits into from

Conversation

lundibundi
Copy link
Member

@lundibundi lundibundi commented Jul 10, 2018

  • set process.exitCode before calling 'exit' handlers so that there will
    not be a situation where process.exitCode !== code in 'exit' callback
    during uncaughtException handling

  • don't ignore process.exitCode set in 'exit' callback when failed with
    uncaughtException and there is no uncaughtException listener

  • fix duplicate call of 'exit' callbacks in case of uncaught exception in worker

Checklist
  • make -j4 test (UNIX) passes
  • tests and are included
  • documentation is changed or added (I'm not sure if this is reflected anywhere, so leaving unchecked for now)
  • commit message follows commit guidelines

This PR depends on #21713 for workers test.

I've found a bug in workers where in case on uncaughtException 'exit' event is actually called twice. I think this is due to both, having _fatalException called presumably via FatalException() and usual exit from worker as I understand here which results in 2 calls to 'exit' callbacks.
I have fixed it with second commit, but I'm not sure if it's a correct fix, so awaiting feedback.

Edit: The case for the bug is when worker exits with unhandled exception and there is an 'exit' event listener and no 'unhandledException' listeners. This way worker's local 'exit' callbacks will be called twice. (see test-worker-uncaught-exception.js). Current node 10.6.0 always fails on this.

/cc @addaleax

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jul 10, 2018
@@ -475,6 +475,7 @@
try {
if (!process._exiting) {
process._exiting = true;
process.exitCode = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user code happens to set process.exitCode prior to this, this will override the user provided value. It should likely only set the value if it is not already set

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do believe this is not the case, as uncaughtException has its own code 1, so it should be correct to override whatever was the code before. Though this is probably a semver-major because of this, but that shouldn't be a problem imo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... I can definitely see the logic on it but I definitely don't like overriding user provided values unexpectedly. If we go with this, can I ask that a note be added to the documentation along with a code comment indicating why it's ok to silently override any user provided value here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me. But where should I put a documentation?
Also, it is already noted that unhandledException has code 1, so this will basically enforce this, so that any previously set code will not be used, only the ones set in 'unhandledException' callback, 'exit' callback etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But where should I put a documentation?

In the description for the uncaughtException event in docs/api/process.md. The documentation currently does not say anything about the process exit code. A note there that, should the event not be handled, the process will exit with exitCode = 1 even if the user had previously set a process.exitCode value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(btw, I see this as a limitation in the current documentation and not something that is introduced by this PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if the user had previously set a process.exitCode value.

Oh, I thought that's it's a given that if some new error happen exitCode will be replaced with the value of the error and just thought that it was undocumented. The fact that you wasn't able to change the code is indeed a strange thing. Anyway, I'll add the doc soon.

src/node.cc Outdated
// read it again, otherwise use default for uncaughtException 1
Local<String> exit_code = env->exit_code_string();
int code = process_object->Get(env->context(), exit_code)
.ToLocalChecked()->Int32Value(env->context()).ToChecked();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 4 space indent in C/C++ code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thx. I thought linter would catch this 🤔.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi Sadly, our C++ linter is very outdated and not very actively maintained … we do have https://github.com/nodejs/node/blob/master/CPP_STYLE_GUIDE.md, if it helps?

src/node.cc Outdated
Local<String> exit_code = env->exit_code_string();
int code = process_object->Get(env->context(), exit_code)
.ToLocalChecked()->Int32Value(env->context()).ToChecked();
if (code == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... this should likely only emit exit(1) if code is undefined. It should be legitimate for an uncaught exception handler to set process.exitCode = 0 and have that be the actual exit code.

@lundibundi
Copy link
Member Author

Also, I kind of don't like code duplication in both tests (for process and worker), so will it be okay to extract test cases (child1,2,3 etc) into test/fixtures?

src/node.cc Outdated
exit(1);
} else {
exit(code.ToLocalChecked()->Int32Value(env->context()).ToChecked());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This crashes for weird edge cases like this:

process.on("exit", () => {
  process.exitCode = { [Symbol.toPrimitive]() { throw new Error(); } };
})

Maybe exit with 1 if code.IsEmpty() || !code->IsInt32() and use a direction conversion to v8::Int32 here, like this:

*result = val.As<v8::Int32>()->Value();

Does that sound okay?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's pretty neat. Though, one question, is it safe to just Local<Value> code = process_object->Get(env->context(), exit_code).ToLocalChecked();? Or should I do it step by step (empty check, then convert and int32 check).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi That operation can still fail if userland code were to install a getter that throws … obviously not something it should do, but yes, I’d prefer to do it step-by-step.

(Generally, we’ll need to shift a lot of our error handling code away from using ToLocalChecked()/ToChecked(), because worker.terminate() can make just about anything throw that calls JS code…)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, I'll update it soon.

src/node.cc Outdated

Local<String> exit_code = env->exit_code_string();
int code = process_object->Get(env->context(), exit_code).ToLocalChecked()
->Int32Value(env->context()).ToChecked();

if (exiting->IsTrue()) {
return code;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you when you say the test was failing before, but do you know why this was happening? Does it point to a larger issue?

Copy link
Member Author

@lundibundi lundibundi Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand the issue here is that there is no synchronization between worker-exit and worker-fatalException-exit (links at the end of the first post) and as a result we have

  • FatalException -> fatal callback called in js land -> emit exit from there
  • worker undestands that it is finishing, exits its loop and calls EmitExit

Therefore we get 2 calls to 'exit' listeners. I'm not sure if 'just muting' second exit event is a good solution, hence my note in the end.

I believe you when you say the test was failing before

I'm not sure what you mean? Also I forgot to add a description of a test case for the bug (it is here in the tests and I'll update the first post soon)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest ,you are really so konwledgable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi I see, that makes sense. I think I’d have a slight preference to handle this using a flag on the Environment object, since process._exiting can just be modified by userland code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's surely better. I'll try to implement that, though there may be some problems as _exiting is actually set from _fatalException handler in js, so with _exiting it is set if no handler and **before** 'exit' callback, but with env it will be set if no handler and **after** all 'exit' callbacks due to the way fatalException is handled in js.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax Well, as I was afraid of, worker finish coincides with FatalException (after FatalException calls js handler, worker understands that it has finished and starts its own sequence). So either we

  • change the way they work with each other (no ideas here, add exitCode lock in env and block worker until FatalExeption finish?)
  • change how _fatalException work (maybe split in pre that will check for uncaughtException handler and actually success and failure routes, though this looks kind of over the top and introduce additional cpp-js calls)
  • maybe check for 'uncaughtException' listeners in the cpp-land beforehand if that's possible, but this one looks like a hack and not a solution
  • or use _exiting for now

Obviously the latter looks okay, but this surely indicates the problem and may need additional research.

P.s: In order to investigate this I added a few logs in code (as Debug were not enough) and run a version on test-worker-uncaught-exception.js without asserts, here is the output

Fatal exception calling js handler
Worker exited main loop
Worker emit exit
EmitExit start
EmitExit emit 'exit'
on error received: foo
Exit callback called twice in worker
exited with 1
EmitExit start
EmitExit emit 'exit'

Aclually I added env->exiting and appropriate checks/sets in EmitExit, FatalException here but the order of execution prevents it from running correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi I think there’s a bigger bug here – in test-worker-uncaught-exception.js, the worker does not exit because of the uncaught exception, but because there’s no work to do after it. :/

I think we want a process.exit() call at the end of the if (!caught) { block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax oh, that is indeed true, great catch thanks.
I think adding a Timeout? with assert.fail should be good enough to ensure that the worker exits. Though I'm not sure if timeout is a good idea (flakiness and such), is there any better way?
Yeah process.exit() seems to work just fine there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi We could try to start some async operation before the uncaught throw and see whether it’s executed? That latter part might be tricky, but that would be the general idea, I think

@addaleax
Copy link
Member

Also, I kind of don't like code duplication in both tests (for process and worker), so will it be okay to extract test cases (child1,2,3 etc) into test/fixtures?

@lundibundi You can do that if it makes the most sense to you, yes :)

src/node.cc Outdated
Local<String> exit_code = env->exit_code_string();
MaybeLocal<Value> maybe_code =
process_object->Get(env->context(), exit_code);
if (maybe_code.IsEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really important, but I’m kinda hoping you’ll stick around after your currently open PRs, so as a tip: I personally find it slightly annoying to first have a MaybeLocal<Something> and then conditionally converting it to a Local<Value>.

What I tend to do then is something along these lines:

Local<Value> value;
if (!object->Get(context, key).ToLocal(&value) || !value->IsInt32())
  exit(1);
exit(value.As<Int32>()->Value());

I know it’s not an exact match to this situation but I hope it’s clear enough what I mean. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, MaybeLocal looks kind of limited. Though I like this neat way of using sce.

src/node.cc Outdated

Local<String> exit_code = env->exit_code_string();
int code = process_object->Get(env->context(), exit_code).ToLocalChecked()
->Int32Value(env->context()).ToChecked();

if (exiting->IsTrue()) {
return code;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi I see, that makes sense. I think I’d have a slight preference to handle this using a flag on the Environment object, since process._exiting can just be modified by userland code.

@lundibundi
Copy link
Member Author

@addaleax I've cleaned up the commits. I've left setTimeout for now, as I cannot come up right now and I don't want to delay this PR because of it. Though I'm open to suggestions.
Also I've run --repeat 1920 for both tests and it seemed to be fine. Furthermore using timeout here shouldn't be flaky as in case of throw it shouldn't really execute anything anymore.

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really digging what you put together here. Thank you a lot!

Would you be interested in joining the @nodejs/workers team? It doesn’t come with any responsibilities, it’s just people who get notified for issues/PRs related to Worker code. :)

@addaleax
Copy link
Member

@addaleax addaleax added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. process Issues and PRs related to the process subsystem. worker Issues and PRs related to Worker support. labels Jul 13, 2018
* set process.exitCode before calling 'exit' handlers so that there will
  not be a situation where process.exitCode !== code in 'exit' callback
  during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with
  uncaughtException and there is no uncaughtException listener
Now we set it before the exit event, this allows to change the
code inside the exit event (event with uncaughtException), therefore
setting exitCode in worker is no longer needed.
Previously even after uncaught exception the worker would continue to
execute until there is no more work to do.
@lundibundi
Copy link
Member Author

@addaleax Thanks for your support 😃.
Yes please, I'd like to join.
I've rebased, should be good. Also it might be worth landing worker: exit after uncaught exception separately as it's not really related to 'process.exitCode handling'. Though this is just a bug fix so might not be that important.

@addaleax
Copy link
Member

Landed in 998f9ff...7c2925e, thanks for the work! 🎉

@addaleax addaleax closed this Jul 14, 2018
addaleax pushed a commit that referenced this pull request Jul 14, 2018
* set process.exitCode before calling 'exit' handlers so that there will
  not be a situation where process.exitCode !== code in 'exit' callback
  during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with
  uncaughtException and there is no uncaughtException listener

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 14, 2018
PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 14, 2018
Previously even after uncaught exception the worker would continue to
execute until there is no more work to do.

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
targos pushed a commit that referenced this pull request Jul 14, 2018
* set process.exitCode before calling 'exit' handlers so that there will
  not be a situation where process.exitCode !== code in 'exit' callback
  during uncaughtException handling
* don't ignore process.exitCode set in 'exit' callback when failed with
  uncaughtException and there is no uncaughtException listener

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
targos pushed a commit that referenced this pull request Jul 14, 2018
PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
targos pushed a commit that referenced this pull request Jul 14, 2018
Previously even after uncaught exception the worker would continue to
execute until there is no more work to do.

PR-URL: #21739
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
@targos targos mentioned this pull request Jul 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. process Issues and PRs related to the process subsystem. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants