
test: fix flaky test-async-wrap-uncaughtexception #16692

Conversation

jasnell (Member) commented Nov 2, 2017

For some reason, the `beforeExit` event listener is being called multiple times. Not entirely sure why that is, but for now, try using a `once` handler to fix the flakiness in CI.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

test

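For reference, a minimal sketch of the kind of change the description above refers to (not the actual diff; the assertion body is a placeholder):

```js
'use strict';
// Sketch only, not the actual diff: switch the test's 'beforeExit' listener
// from process.on() to process.once() so that a second emission of the event
// cannot re-run the assertions (the real test wraps its handler in
// common.mustCall()).
const assert = require('assert');

process.once('beforeExit', () => {
  // The real test asserts on its async_hooks call log here; this is a
  // placeholder assertion.
  assert.ok(true);
});
```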
nodejs-github-bot added the test label (Issues and PRs related to the tests) Nov 2, 2017
jasnell (Member, Author) commented Nov 2, 2017

Stress test on fedora24: https://ci.nodejs.org/job/node-stress-single-test/1523/

apapirovski (Member) commented Nov 2, 2017

In spirit, this seems similar to the resolution that @Trott proposed — since we're not investigating or fixing the root cause, could we add a TODO so that this isn't completely forgotten?

apapirovski (Member) left a review comment

LGTM — nice descriptive comment too 👍

jasnell (Member, Author) commented Nov 2, 2017

@Trott ... PTAL! Stress test is good.

Trott (Member) commented Nov 2, 2017

This is the same approach I took in #16598 and I closed that because this conceals the bug and it may never get fixed if we go this route. I'd prefer we mark as flaky instead (so we at least get a yellow until a genuine fix), or even better, I'd prefer that we direct a ton of resources towards finding and fixing this bug...

I won't block this or anything, but it's not my preferred approach for the reasons described above.

Trott (Member) commented Nov 2, 2017

With the TODO comment that @apapirovski suggested, I'm more OK with this. :-D

apapirovski (Member) commented

I think whichever route we go, I would like this comment to make it in, because it clearly outlines why the test is flaky and what needs to be investigated. I think it's helpful both for existing contributors and for anyone who's new and looking for something to work on.

jasnell (Member, Author) commented Nov 2, 2017

@Trott ... to be honest, I'm not sure it is a bug. There is nothing that says 'beforeExit' must only be called once. In fact, the docs explicitly state that beforeExit may schedule additional tasks for the event loop which would mean that it can definitely be invoked more than once simply by scheduling one additional task on the event loop. Then again, it would be great to know deterministically what is scheduling the additional work.
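That documented behaviour is easy to reproduce in isolation; a minimal, hypothetical snippet (not part of the test) that makes 'beforeExit' fire several times simply by scheduling new work from the handler:

```js
'use strict';
// Illustration of the documented 'beforeExit' semantics referenced above:
// the event is emitted each time the event loop drains, so scheduling new
// work from the handler makes it fire again.
let emissions = 0;
process.on('beforeExit', () => {
  emissions += 1;
  console.log(`beforeExit #${emissions}`);
  if (emissions < 3) {
    setImmediate(() => {});  // keeps the loop alive for one more turn
  }
});
// Prints "beforeExit #1" through "beforeExit #3", then the process exits.
```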

jasnell (Member, Author) commented Nov 2, 2017

@nodejs/collaborators ... PTAL. I'd like to get this landed early so we can unbreak CI.

addaleax (Member) commented Nov 2, 2017

Then again, it would be great to know deterministically what is scheduling the additional work.

Right, and we all know that if this PR lands we won’t ever find out why that’s happening. ;)

If CI is broken because of something we don’t understand but that also isn’t a total deal breaker, that’s basically what marking tests as flaky is for, right?
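For context, marking a test flaky in core is done through the test status files rather than by changing the test itself; a sketch of what such an entry might look like (the exact file and platform section here are assumptions, not taken from this thread):

```
# test/parallel/parallel.status (sketch; platform section is a guess)
[$system==win32]
test-async-wrap-uncaughtexception: PASS,FLAKY
```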

process.on('beforeExit', common.mustCall(() => {
// TODO(jasnell): This is using process.once because, for some as yet unknown
// reason, the 'beforeExit' event may be emitted more than once
// under some conditions on variaous platforms. Using the once
A contributor left an inline review comment on the diff quoted above

Nit: various

jasnell closed this Nov 2, 2017
lance (Member) commented Nov 2, 2017

I am dipping into uncharted territory here, apologies if this is off the mark.

I agree with @jasnell that this is likely not a bug. The question is what asynchronous task is being scheduled that may cause beforeExit to fire more than once. Looking at the test, hooks.disable() is called during the beforeExit event handler. And if you follow the code path for hooks.disable(), you wind up in DisablePromiseHook() in async_wrap.cc, which appears to put a task on the event queue. Wouldn't this then cause beforeExit to fire again as the event queue becomes empty a second time?

Edit: I think this PR should be reopened because process.once() seems correct for this case.
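Very roughly, a sketch of the shape lance is describing, assuming a test that enables an async_hooks hook and disables it inside the 'beforeExit' handler (the real test differs in its details):

```js
'use strict';
// Rough sketch of lance's theory, not the actual test: disabling the hook
// from inside the 'beforeExit' handler reaches DisablePromiseHook() in
// async_wrap.cc, which (per the theory) enqueues more work, so the loop
// could drain a second time and emit 'beforeExit' again.
const async_hooks = require('async_hooks');

const hooks = async_hooks.createHook({ init() {} });
hooks.enable();

process.once('beforeExit', () => {
  hooks.disable();  // the call lance traces down to DisablePromiseHook()
});
```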

addaleax (Member) commented Nov 2, 2017

@lance You might be right in that it should make beforeExit get called again (not sure about that tho), but I don’t think it does so currently. If you look at how EmitBeforeExit() is implemented in node.cc, it uses MakeCallback(), which means in particular that the microtask queue and the nextTick queue are flushed immediately after the event is emitted. So, by the time EmitBeforeExit() returns, nothing should be on the event loop, and consequently beforeExit should not be emitted again…
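A plain-JS illustration of the distinction addaleax is drawing (hypothetical snippet, not from the test): nextTick and microtask work queued inside the 'beforeExit' handler is flushed before control returns to libuv, so it does not keep the loop alive the way libuv work such as setImmediate() would:

```js
'use strict';
// Illustrative only: microtask and nextTick callbacks scheduled inside a
// 'beforeExit' handler run as part of the same MakeCallback() that emitted
// the event, so they do not cause a second 'beforeExit'.
let hits = 0;
process.on('beforeExit', () => {
  hits += 1;
  process.nextTick(() => console.log('nextTick callback ran'));
  Promise.resolve().then(() => console.log('microtask ran'));
  // No libuv handles are scheduled here, so the loop stays empty.
});
process.on('exit', () => {
  console.log(`'beforeExit' fired ${hits} time(s)`);  // expected: 1
});
```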

jasnell (Member, Author) commented Nov 2, 2017

Should being the operative word here. As I read the code, there is likely a race condition created by the task queue addition in DisablePromiseHook (unverified, of course, because I'm unable to recreate the failure in any of my local environments).

lance (Member) commented Nov 2, 2017

@addaleax Maybe this isn't the right place to do this, but I would really like to understand better what you mean. I took a look at the code that you reference, and it's not clear to me why MakeCallback would result in the queues being flushed. The way I read it - and again... uncharted territory, so please help me out with what I'm missing - the execution path goes something like this.

  • the event loop is drained, and EmitBeforeExit is called.
  • EmitBeforeExit uses MakeCallback to call process.emit('beforeExit').
  • Ultimately, InternalMakeCallback executes the emit function. But here it does not seem to wait or otherwise be concerned about the event queue. This is where I may be missing something.
  • The emit function executes any callbacks - in this case, the test's mustCall() function which ends up calling DisablePromiseHook which adds again to the queue.
  • EmitBeforeExit returns and uv_loop_alive is checked and presumably finds the event queue to no longer be empty.

Perhaps the race condition is that emit takes longer to return than it takes to enqueue DisablePromiseHook? Or vice versa :). Just spitballing...

jasnell (Member, Author) commented Nov 2, 2017

@lance ... the "magic" occurs within InternalCallbackScope ... if you look at InternalMakeCallback, you'll see that an InternalCallbackScope is created. When InternalMakeCallback exits, the Close() function is called within InternalCallbackScope, causing the microtask queue to be purged.

addaleax (Member) commented Nov 2, 2017

EmitBeforeExit returns and uv_loop_alive is checked and presumably finds the event queue to no longer be empty.

@lance @jasnell I really wouldn’t think this has anything to do with this issue; the microtask queue and the libuv queue are entirely separate things. Here’s a stress test just to make sure: https://ci.nodejs.org/job/node-stress-single-test/1534/nodes=win2012r2-mp-vcbt2015/

Edit: Same config, but for master, just to get an idea of how flaky the test currently is: https://ci.nodejs.org/job/node-stress-single-test/1539/nodes=win2012r2-mp-vcbt2015/

lance (Member) commented Nov 2, 2017

@jasnell ahh I see now. Thanks for that clarification.

@addaleax I was just coming to that realization. Thanks for clearing that up.

jasnell reopened this Nov 2, 2017
assert.deepStrictEqual(call_log, [1, 1, 1, 1]);
}));
// assert.strictEqual(typeof call_id, 'number');
// assert.deepStrictEqual(call_log, [1, 1, 1, 1]);
A member left an inline review comment on the diff quoted above

I like the call_log logging, but why comment these out? They won't get executed twice now because of the process.exit(1) above.

jasnell (Member, Author) replied

this is just for testing... not planning on keeping this commit

jasnell closed this Nov 2, 2017