-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vm: fix race condition with timeout
param
#13074
Conversation
I just came here looking for someone talking about the problem I'm experiencing. See the code below - timeout exceptions only get caught intermittently. Is this a symptom of the problem you fixed, or is this a different issue?
|
@matt-kruse yes your issue is fixed by this patch :) edit: wait no that may be a different issue edit2: I have an idea of what is causing your problem. The else if (isBuf || validChunk(this, state, chunk, cb)) {
state.pendingcb++;
ret = writeOrBuffer(this, state, isBuf, chunk, encoding, cb);
} What happens if execution is terminated right after This modification to your test demonstrates that the problem is in the writable stream library: const vm = require('vm');
const timeout = new vm.Script(`while(1){console.log('x');}`);
try {
timeout.runInNewContext( {"console":console} , {timeout:10} );
}
catch(e) {
// Now this will always show the caught error
console.error(e);
}
console.log('done'); // This sometimes will not log Note the change of |
@laverdet is that because |
Fantastic insight, thank you. Luckily my actual use case doesn't depend on console or stdout - this was a manufactured test case to play with vm timeouts, and I happened to use console.log(). I'd be curious to know what other landmines might exist in Node when terminating execution abruptly, but I assume as long as my vm code isn't hooking into shared streams or objects, I should be pretty safe. |
@rodoabad yeah sorry I should have made that clear. |
We use this patch in production in this project. I can confirm that applying it removes a lot of random freezes on |
@@ -231,7 +231,7 @@ function REPLServer(prompt, | |||
try { | |||
try { | |||
const scriptOptions = { | |||
displayErrors: true, | |||
displayErrors: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance you could split the displayErrors
changes out into a separate commit?
test/parallel/test-vm-timeout.js
Outdated
// or so. A better, but still incorrect implementation would fail after | ||
// 100 seconds or so. If you're messing with vm timeouts you might | ||
// consider increasing this timeout to hammer out races. | ||
const giveUp = Date.now() + 2500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests that are run in our CI shouldn’t generally take more than 1 second to finish… can you split this out into its own test file and reduce the timeout here?
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors.
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox.
Split displayErrors fix out, and added a test for it. I also added a new line to the decorated stack because that matches the behavior of src/node.cc:1738 which is the other place stacks get a fancy arrow added. Put the vm race test under the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM, thanks for the PR!
Split displayErrors fix out, and added a test for it.
Thank you!
Put the vm race test under the
pummel
directory because that seems like the right place?
It is, yes… Unfortunately, we don’t run those tests in CI, but I guess here we just have to live with it.
Local<String> decorated_stack = String::Concat( | ||
String::Concat(arrow.As<String>(), | ||
FIXED_ONE_BYTE_STRING(env->isolate(), "\n")), | ||
stack.As<String>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a tiny style nit, but we prefer 4 spaces of indentation for statement continuations – that can be fixed while merging, though
@@ -956,27 +958,20 @@ class ContextifyScript : public BaseObject { | |||
bool timed_out = false; | |||
bool received_signal = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in a previous version of this PR these were volatile
… any reason why that changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry I meant to leave a note about that. The volatile ended up being unnecessary. The watchdogs take a non-const reference so that's good enough to ensure the optimizer doesn't do anything fishy. Then in the dtor uv_thread_join
is called which will synchronize the watchdog thread back up with the main thread.
explicit Watchdog( | ||
v8::Isolate* isolate, | ||
uint64_t ms, | ||
bool* timed_out = NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for 4-space indents. I think here we’d usually keep the v8::Isolate* isolate
bit directly after the (
and align all other parameters vertically below it; but again, this is just a tiny nit and can be fixed up while landing.
explicit SigintWatchdog(v8::Isolate* isolate); | ||
explicit SigintWatchdog( | ||
v8::Isolate* isolate, | ||
bool* received_signal = NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ditto)
// NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | ||
// DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR | ||
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE | ||
// USE OR OTHER DEALINGS IN THE SOFTWARE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the copyright header for newly added files, I think.
With regard to the style changes is there a style guide out there for newcomers? The jslinter included was pretty helpful but in C++ land I found a lot of clashing styles so I just winged it. Though now that you point it out the 4-space continuation thing seems obvious. |
Nothing except the Google styleguide which we roughly follow, but tbh I wouldn’t expect anybody to read the full text for making a contribution (e.g. I haven’t done that myself). We could probably tighten up the linter, but it doesn’t usually take people long to get used to the C++ style quirks that we have; I think the 4-space indents for continuations,
It took me a while to get used to it as well ;) |
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
* test_constructor: Napi test was failing because of over specifiying a not condition. Fixed that and also restructured the code to not iterate over list of properties twice. * test-vm-timeout-rethrow: This started failing after nodejs/node#13074 changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked this as flaky for now. * test-bindings: Marked this as flaky for now.
* test_constructor: Napi test was failing because of over specifiying a not condition. Fixed that and also restructured the code to not iterate over list of properties twice. * test-vm-timeout-rethrow: This started failing after nodejs/node#13074 changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked this as flaky for now. * test-bindings: Marked this as flaky for now.
* test_constructor: Napi test was failing because of over specifiying a not condition. Fixed that and also restructured the code to not iterate over list of properties twice. * test-vm-timeout-rethrow: This started failing after nodejs/node#13074 changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked this as flaky for now. * test-bindings: Marked this as flaky for now.
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked this as flaky for now. * test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked below test as flaky. * parallel/test-vm-sigint : Marked this as flaky for now. * parallel/test-vm-sigint-existing-handler : Marked this as flaky for now. * parallel/test-repl-sigint : Marked this as flaky for now. * parallel/test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked below test as flaky. * parallel/test-vm-sigint : Marked this as flaky for now. * parallel/test-vm-sigint-existing-handler : Marked this as flaky for now. * parallel/test-repl-sigint : Marked this as flaky for now. * parallel/test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked below test as flaky. * parallel/test-vm-sigint : Marked this as flaky for now. * parallel/test-vm-sigint-existing-handler : Marked this as flaky for now. * parallel/test-repl-sigint : Marked this as flaky for now. * parallel/test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked below test as flaky. * parallel/test-vm-sigint : Marked this as flaky for now. * parallel/test-vm-sigint-existing-handler : Marked this as flaky for now. * parallel/test-repl-sigint : Marked this as flaky for now. * parallel/test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
* test-vm-timeout-rethrow: This started failing after nodejs/node#13074/ changes upstream. The change is to not do `GetAndClearException` but allocate a string (`JsCreateString`) which throws because engine is in exception state. We need to figure out better way to handle these situation. Marked below test as flaky. * parallel/test-vm-sigint : Marked this as flaky for now. * parallel/test-vm-sigint-existing-handler : Marked this as flaky for now. * parallel/test-repl-sigint : Marked this as flaky for now. * parallel/test-bindings: Marked this as flaky for now. PR-URL: nodejs#264 Reviewed-By: Kyle Farnung <[email protected]> Reviewed-By: Jason Ginchereau <[email protected]>
Should this land on v6.x? |
@MylesBorins the patches apply cleanly to the v6.11.1 tag for me, except for the output of a test which is a simple fix. Did you run into other issues? |
@laverdet that seems to be the conflict. Unfortunately the fix is not so simple for me to sort out... could you provide a Backport PR please |
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. PR-URL: nodejs#13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. PR-URL: nodejs#13074 Reviewed-By: Anna Henningsen <[email protected]>
@MylesBorins please see #14373 for the backport :) |
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. Backport-PR-URL: #14373 PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. Backport-PR-URL: #14373 PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This option has been broken for almost a year when used with any of the vm.runIn.. family of functions, except for syntax errors. Backport-PR-URL: #14373 PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox. Backport-PR-URL: #14373 PR-URL: #13074 Reviewed-By: Anna Henningsen <[email protected]>
This fixes a race condition in the watchdog timer used for vm timeouts. The condition would terminate the main stack's execution instead of the code running under the sandbox.
This also fixes the
displayErrors
parameter in the runIn... family of functions.Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
Notes
The code dealing with
displayErrors
was dead. When this module was first written it looked like this:But 55b87c0 modified the first if in such a way that the second if would never be true. So I went ahead and fixed that while I was in there, but that caused a bunch of errors in the repl tests because the output changed. I disabled
displayErrors
in the repl because the output isn't too useful in that context anyway.