Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vm: fix race condition with timeout param #13074

Closed
wants to merge 2 commits into from
Closed

Conversation

laverdet
Copy link
Contributor

This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox.

This also fixes the displayErrors parameter in the runIn... family of functions.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • vm
Notes

The code dealing with displayErrors was dead. When this module was first written it looked like this:

if (try_catch.HasCaught() && try_catch.HasTerminated()) {
  V8::CancelTerminateExecution(args.GetIsolate());
  return ThrowError("Script execution timed out.");
}

if (result.IsEmpty()) {
  // Error occurred during execution of the script.
  DisplayExceptionLine(try_catch.Message());
  try_catch.ReThrow();
  return;
}

But 55b87c0 modified the first if in such a way that the second if would never be true. So I went ahead and fixed that while I was in there, but that caused a bunch of errors in the repl tests because the output changed. I disabled displayErrors in the repl because the output isn't too useful in that context anyway.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. repl Issues and PRs related to the REPL subsystem. vm Issues and PRs related to the vm subsystem. labels May 17, 2017
@addaleax addaleax self-requested a review May 17, 2017 12:21
@matt-kruse
Copy link

I just came here looking for someone talking about the problem I'm experiencing. See the code below - timeout exceptions only get caught intermittently. Is this a symptom of the problem you fixed, or is this a different issue?

const vm = require('vm');
const timeout = new vm.Script(`while(1){console.log('x');}`);
try {
	timeout.runInNewContext( {"console":console} , {timeout:10} );
}
catch(e) {
	// This should catch a timeout exception, but instead Node just exits.
	// It catches about 5% of the runs successfully.
	// if I remove the try/catch, the exception gets thrown all the way up.
	console.log(e);
}
console.log("done");

@laverdet
Copy link
Contributor Author

laverdet commented May 17, 2017

@matt-kruse yes your issue is fixed by this patch :)

edit: wait no that may be a different issue

edit2: I have an idea of what is causing your problem. The timeout parameter will always just terminate execution of the currently running JS code, even if you are called into core nodejs code. After a quick look at _stream_writable.js I found this:

  else if (isBuf || validChunk(this, state, chunk, cb)) {
    state.pendingcb++;
    ret = writeOrBuffer(this, state, isBuf, chunk, encoding, cb);
  }

What happens if execution is terminated right after state.pendingcb++ but before writeOfBuffer(...)? You'll have a writable stream in a bad state. This might not be the exact issue causing the stream to break, but it shows how there could be a problem. There's probably countless cases of this all over the nodejs codebase, termination of execution is violating several invariants expected in those libraries.

This modification to your test demonstrates that the problem is in the writable stream library:

const vm = require('vm');
const timeout = new vm.Script(`while(1){console.log('x');}`);
try {
  timeout.runInNewContext( {"console":console} , {timeout:10} );
}
catch(e) {
  // Now this will always show the caught error
  console.error(e);
}
console.log('done'); // This sometimes will not log

Note the change of console.log -> console.error. What happens is when you terminate execution while inside of console.log it permanently destroys process.stdout. I suspect this will be a wontfix but documentation about the flaw might be a good idea.

@rodoabad
Copy link

@laverdet is that because console.error outputs to stderr while console.log outputs to stdout?

@matt-kruse
Copy link

Fantastic insight, thank you. Luckily my actual use case doesn't depend on console or stdout - this was a manufactured test case to play with vm timeouts, and I happened to use console.log().

I'd be curious to know what other landmines might exist in Node when terminating execution abruptly, but I assume as long as my vm code isn't hooking into shared streams or objects, I should be pretty safe.

@laverdet
Copy link
Contributor Author

@rodoabad yeah sorry I should have made that clear. console.error uses a different WritableStream instance so it still works. You could also still write to stdout using: require('fs').write(1, e.stack+ '\n', function() {});

@artch
Copy link

artch commented May 18, 2017

We use this patch in production in this project. I can confirm that applying it removes a lot of random freezes on vm code timeouts.

@@ -231,7 +231,7 @@ function REPLServer(prompt,
try {
try {
const scriptOptions = {
displayErrors: true,
displayErrors: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you could split the displayErrors changes out into a separate commit?

// or so. A better, but still incorrect implementation would fail after
// 100 seconds or so. If you're messing with vm timeouts you might
// consider increasing this timeout to hammer out races.
const giveUp = Date.now() + 2500;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests that are run in our CI shouldn’t generally take more than 1 second to finish… can you split this out into its own test file and reduce the timeout here?

This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.
@laverdet
Copy link
Contributor Author

Split displayErrors fix out, and added a test for it. I also added a new line to the decorated stack because that matches the behavior of src/node.cc:1738 which is the other place stacks get a fancy arrow added.

Put the vm race test under the pummel directory because that seems like the right place?

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM, thanks for the PR!

Split displayErrors fix out, and added a test for it.

Thank you!

Put the vm race test under the pummel directory because that seems like the right place?

It is, yes… Unfortunately, we don’t run those tests in CI, but I guess here we just have to live with it.

Fresh CI: https://ci.nodejs.org/job/node-test-commit/10012/

Local<String> decorated_stack = String::Concat(
String::Concat(arrow.As<String>(),
FIXED_ONE_BYTE_STRING(env->isolate(), "\n")),
stack.As<String>());
Copy link
Member

@addaleax addaleax May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tiny style nit, but we prefer 4 spaces of indentation for statement continuations – that can be fixed while merging, though

@@ -956,27 +958,20 @@ class ContextifyScript : public BaseObject {
bool timed_out = false;
bool received_signal = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in a previous version of this PR these were volatile… any reason why that changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry I meant to leave a note about that. The volatile ended up being unnecessary. The watchdogs take a non-const reference so that's good enough to ensure the optimizer doesn't do anything fishy. Then in the dtor uv_thread_join is called which will synchronize the watchdog thread back up with the main thread.

explicit Watchdog(
v8::Isolate* isolate,
uint64_t ms,
bool* timed_out = NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for 4-space indents. I think here we’d usually keep the v8::Isolate* isolate bit directly after the ( and align all other parameters vertically below it; but again, this is just a tiny nit and can be fixed up while landing.

explicit SigintWatchdog(v8::Isolate* isolate);
explicit SigintWatchdog(
v8::Isolate* isolate,
bool* received_signal = NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ditto)

// NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
// DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
// USE OR OTHER DEALINGS IN THE SOFTWARE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the copyright header for newly added files, I think.

@laverdet
Copy link
Contributor Author

With regard to the style changes is there a style guide out there for newcomers? The jslinter included was pretty helpful but in C++ land I found a lot of clashing styles so I just winged it. Though now that you point it out the 4-space continuation thing seems obvious.

@addaleax
Copy link
Member

With regard to the style changes is there a style guide out there for newcomers?

Nothing except the Google styleguide which we roughly follow, but tbh I wouldn’t expect anybody to read the full text for making a contribution (e.g. I haven’t done that myself). We could probably tighten up the linter, but it doesn’t usually take people long to get used to the C++ style quirks that we have; I think the 4-space indents for continuations, lower_snake_case for local variables and preferring to align arguments vertically are the only things that trip up people on a regular basis.

Though now that you point it out the 4-space continuation thing seems obvious.

It took me a while to get used to it as well ;)

@addaleax
Copy link
Member

Landed in 3c5bfba and e018c33, fixed up the left nits while landing. Thank you a lot for the PR!

@addaleax addaleax closed this May 23, 2017
addaleax pushed a commit that referenced this pull request May 23, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
addaleax pushed a commit that referenced this pull request May 23, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
jasnell pushed a commit that referenced this pull request May 24, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
jasnell pushed a commit that referenced this pull request May 24, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
kunalspathak added a commit to kunalspathak/node-chakracore that referenced this pull request May 25, 2017
* test_constructor: Napi test was failing because of over specifiying
a not condition. Fixed that and also restructured the code to not
iterate over list of properties twice.

* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074 changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked this as flaky for now.

* test-bindings: Marked this as flaky for now.
kunalspathak added a commit to kunalspathak/node-chakracore that referenced this pull request May 25, 2017
* test_constructor: Napi test was failing because of over specifiying
a not condition. Fixed that and also restructured the code to not
iterate over list of properties twice.

* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074 changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked this as flaky for now.

* test-bindings: Marked this as flaky for now.
kunalspathak added a commit to kunalspathak/node-chakracore that referenced this pull request May 25, 2017
* test_constructor: Napi test was failing because of over specifiying
a not condition. Fixed that and also restructured the code to not
iterate over list of properties twice.

* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074 changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked this as flaky for now.

* test-bindings: Marked this as flaky for now.
kunalspathak added a commit to kunalspathak/node-chakracore that referenced this pull request May 25, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked this as flaky for now.

* test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
kunalspathak added a commit to kunalspathak/node-chakracore that referenced this pull request May 25, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked below test as flaky.
* parallel/test-vm-sigint : Marked this as flaky for now.
* parallel/test-vm-sigint-existing-handler : Marked this as flaky for now.
* parallel/test-repl-sigint : Marked this as flaky for now.

* parallel/test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
jasnell pushed a commit that referenced this pull request May 28, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
jasnell pushed a commit that referenced this pull request May 28, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
@jasnell jasnell mentioned this pull request May 28, 2017
kfarnung pushed a commit to kfarnung/node-chakracore that referenced this pull request May 31, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked below test as flaky.
* parallel/test-vm-sigint : Marked this as flaky for now.
* parallel/test-vm-sigint-existing-handler : Marked this as flaky for now.
* parallel/test-repl-sigint : Marked this as flaky for now.

* parallel/test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
kfarnung pushed a commit to kfarnung/node-chakracore that referenced this pull request May 31, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked below test as flaky.
* parallel/test-vm-sigint : Marked this as flaky for now.
* parallel/test-vm-sigint-existing-handler : Marked this as flaky for now.
* parallel/test-repl-sigint : Marked this as flaky for now.

* parallel/test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
kfarnung pushed a commit to kfarnung/node-chakracore that referenced this pull request May 31, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked below test as flaky.
* parallel/test-vm-sigint : Marked this as flaky for now.
* parallel/test-vm-sigint-existing-handler : Marked this as flaky for now.
* parallel/test-repl-sigint : Marked this as flaky for now.

* parallel/test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
kfarnung pushed a commit to kfarnung/node-chakracore that referenced this pull request Jun 5, 2017
* test-vm-timeout-rethrow: This started failing after
nodejs/node#13074/ changes upstream. The change is
to not do `GetAndClearException` but allocate a string (`JsCreateString`)
which throws because engine is in exception state. We need to figure out
better way to handle these situation. Marked below test as flaky.
* parallel/test-vm-sigint : Marked this as flaky for now.
* parallel/test-vm-sigint-existing-handler : Marked this as flaky for now.
* parallel/test-repl-sigint : Marked this as flaky for now.

* parallel/test-bindings: Marked this as flaky for now.

PR-URL: nodejs#264
Reviewed-By: Kyle Farnung <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
@gibfahn gibfahn mentioned this pull request Jun 15, 2017
3 tasks
@MylesBorins
Copy link
Contributor

MylesBorins commented Jul 17, 2017

Should this land on v6.x?
If so it needs to be manually backported

@laverdet laverdet deleted the race branch July 17, 2017 16:21
@laverdet
Copy link
Contributor Author

@MylesBorins the patches apply cleanly to the v6.11.1 tag for me, except for the output of a test which is a simple fix. Did you run into other issues?

@MylesBorins
Copy link
Contributor

@laverdet that seems to be the conflict. Unfortunately the fix is not so simple for me to sort out... could you provide a Backport PR please

laverdet added a commit to laverdet/node that referenced this pull request Jul 19, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

PR-URL: nodejs#13074
Reviewed-By: Anna Henningsen <[email protected]>
laverdet added a commit to laverdet/node that referenced this pull request Jul 19, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

PR-URL: nodejs#13074
Reviewed-By: Anna Henningsen <[email protected]>
@laverdet
Copy link
Contributor Author

@MylesBorins please see #14373 for the backport :)

MylesBorins pushed a commit that referenced this pull request Jul 21, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

Backport-PR-URL: #14373
PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 21, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

Backport-PR-URL: #14373
PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Jul 21, 2017
MylesBorins pushed a commit that referenced this pull request Jul 31, 2017
This option has been broken for almost a year when used with any of the
vm.runIn.. family of functions, except for syntax errors.

Backport-PR-URL: #14373
PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 31, 2017
This fixes a race condition in the watchdog timer used for vm timeouts.
The condition would terminate the main stack's execution instead of the
code running under the sandbox.

Backport-PR-URL: #14373
PR-URL: #13074
Reviewed-By: Anna Henningsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. repl Issues and PRs related to the REPL subsystem. vm Issues and PRs related to the vm subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants