
test: fix flaky test-async-wrap-uncaughtexception #16692

Conversation

jasnell (Member) commented Nov 2, 2017

For some reason, the `beforeExit` event listener is being called multiple times. Not entirely sure why that is, but for now, try using a `once` handler to fix the flakiness in CI.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

test

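For reference, a minimal sketch of the kind of change the description above refers to (not the actual diff; the assertion body is a placeholder):

```js
'use strict';
// Sketch only, not the actual diff: switch the test's 'beforeExit' listener
// from process.on() to process.once() so that a second emission of the event
// cannot re-run the assertions (the real test wraps its handler in
// common.mustCall()).
const assert = require('assert');

process.once('beforeExit', () => {
  // The real test asserts on its async_hooks call log here; this is a
  // placeholder assertion.
  assert.ok(true);
});
```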
nodejs-github-bot added the test label (Issues and PRs related to the tests) Nov 2, 2017
jasnell (Member, Author) commented Nov 2, 2017

Stress test on fedora24: https://ci.nodejs.org/job/node-stress-single-test/1523/

apapirovski (Member) commented Nov 2, 2017

In spirit, this seems similar to the resolution that @Trott proposed — since we're not investigating or fixing the root cause, could we add a TODO so that this isn't completely forgotten?

apapirovski (Member) left a review comment

LGTM — nice descriptive comment too 👍

jasnell (Member, Author) commented Nov 2, 2017

@Trott ... PTAL! Stress test is good.

Trott (Member) commented Nov 2, 2017

This is the same approach I took in #16598 and I closed that because this conceals the bug and it may never get fixed if we go this route. I'd prefer we mark as flaky instead (so we at least get a yellow until a genuine fix), or even better, I'd prefer that we direct a ton of resources towards finding and fixing this bug...

I won't block this or anything, but it's not my preferred approach for the reasons described above.

Trott (Member) commented Nov 2, 2017

With the TODO comment that @apapirovski suggested, I'm more OK with this. :-D

apapirovski (Member) commented

I think whichever route we go, I would like this comment to make it in, because it clearly outlines why the test is flaky and what needs to be investigated. I think it's helpful both for existing contributors and for anyone who's new and looking for something to work on.

jasnell (Member, Author) commented Nov 2, 2017

@Trott ... to be honest, I'm not sure it is a bug. There is nothing that says 'beforeExit' must only be called once. In fact, the docs explicitly state that beforeExit may schedule additional tasks for the event loop which would mean that it can definitely be invoked more than once simply by scheduling one additional task on the event loop. Then again, it would be great to know deterministically what is scheduling the additional work.
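That documented behaviour is easy to reproduce in isolation; a minimal, hypothetical snippet (not part of the test) that makes 'beforeExit' fire several times simply by scheduling new work from the handler:

```js
'use strict';
// Illustration of the documented 'beforeExit' semantics referenced above:
// the event is emitted each time the event loop drains, so scheduling new
// work from the handler makes it fire again.
let emissions = 0;
process.on('beforeExit', () => {
  emissions += 1;
  console.log(`beforeExit #${emissions}`);
  if (emissions < 3) {
    setImmediate(() => {});  // keeps the loop alive for one more turn
  }
});
// Prints "beforeExit #1" through "beforeExit #3", then the process exits.
```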

jasnell (Member, Author) commented Nov 2, 2017

@nodejs/collaborators ... PTAL. I'd like to get this landed early so we can unbreak CI.

addaleax (Member) commented Nov 2, 2017

Then again, it would be great to know deterministically what is scheduling the additional work.

Right, and we all know that if this PR lands we won’t ever find out why that’s happening. ;)

If CI is broken because of something we don’t understand but that also isn’t a total deal breaker, that’s basically what marking tests as flaky is for, right?
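For context, marking a test flaky in core is done through the test status files rather than by changing the test itself; a sketch of what such an entry might look like (the exact file and platform section here are assumptions, not taken from this thread):

```
# test/parallel/parallel.status (sketch; platform section is a guess)
[$system==win32]
test-async-wrap-uncaughtexception: PASS,FLAKY
```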

process.on('beforeExit', common.mustCall(() => {
// TODO(jasnell): This is using process.once because, for some as yet unknown
// reason, the 'beforeExit' event may be emitted more than once
// under some conditions on variaous platforms. Using the once
A contributor left an inline review comment on the diff quoted above

Nit: various

jasnell closed this Nov 2, 2017
lance (Member) commented Nov 2, 2017

I am dipping into uncharted territory here, apologies if this is off the mark.

I agree with @jasnell that this is likely not a bug. The question is what asynchronous task is being scheduled that may cause beforeExit to fire more than once. Looking at the test, hooks.disable() is called during the beforeExit event handler. And if you follow the code path for hooks.disable(), you wind up in DisablePromiseHook() in async_wrap.cc, which appears to put a task on the event queue. Wouldn't this then cause beforeExit to fire again as the event queue becomes empty a second time?

Edit: I think this PR should be reopened because process.once() seems correct for this case.
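Very roughly, a sketch of the shape lance is describing, assuming a test that enables an async_hooks hook and disables it inside the 'beforeExit' handler (the real test differs in its details):

```js
'use strict';
// Rough sketch of lance's theory, not the actual test: disabling the hook
// from inside the 'beforeExit' handler reaches DisablePromiseHook() in
// async_wrap.cc, which (per the theory) enqueues more work, so the loop
// could drain a second time and emit 'beforeExit' again.
const async_hooks = require('async_hooks');

const hooks = async_hooks.createHook({ init() {} });
hooks.enable();

process.once('beforeExit', () => {
  hooks.disable();  // the call lance traces down to DisablePromiseHook()
});
```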

addaleax (Member) commented Nov 2, 2017

@lance You might be right in that it should make beforeExit get called again (not sure about that tho), but I don’t think it does so currently. If you look at how EmitBeforeExit() is implemented in node.cc, it uses MakeCallback(), which means in particular that the microtask queue and the nextTick queue are flushed immediately after the event is emitted. So, by the time EmitBeforeExit() returns, nothing should be on the event loop, and consequently beforeExit should not be emitted again…
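A plain-JS illustration of the distinction addaleax is drawing (hypothetical snippet, not from the test): nextTick and microtask work queued inside the 'beforeExit' handler is flushed before control returns to libuv, so it does not keep the loop alive the way libuv work such as setImmediate() would:

```js
'use strict';
// Illustrative only: microtask and nextTick callbacks scheduled inside a
// 'beforeExit' handler run as part of the same MakeCallback() that emitted
// the event, so they do not cause a second 'beforeExit'.
let hits = 0;
process.on('beforeExit', () => {
  hits += 1;
  process.nextTick(() => console.log('nextTick callback ran'));
  Promise.resolve().then(() => console.log('microtask ran'));
  // No libuv handles are scheduled here, so the loop stays empty.
});
process.on('exit', () => {
  console.log(`'beforeExit' fired ${hits} time(s)`);  // expected: 1
});
```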

jasnell (Member, Author) commented Nov 2, 2017

Should being the operative word here. As I read the code, there is likely a race condition created by the task queue addition in DisablePromiseHook (unverified, of course, because I'm unable to recreate the failure in any of my local environments).

lance (Member) commented Nov 2, 2017

@addaleax Maybe this isn't the right place to do this, but I would really like to understand better what you mean. I took a look at the code that you reference, and it's not clear to me why MakeCallback would result in the queues being flushed. The way I read it - and again... uncharted territory, so please help me out with what I'm missing - the execution path goes something like this.

  • the event loop is drained, and EmitBeforeExit is called.
  • EmitBeforeExit uses MakeCallback to call process.emit('beforeExit').
  • Ultimately, InternalMakeCallback executes the emit function. But here it does not seem to wait or otherwise be concerned about the event queue. This is where I may be missing something.
  • The emit function executes any callbacks - in this case, the test's mustCall() function which ends up calling DisablePromiseHook which adds again to the queue.
  • EmitBeforeExit returns and uv_loop_alive is checked and presumably finds the event queue to no longer be empty.

Perhaps the race condition is that emit takes longer to return than it takes to enqueue DisablePromiseHook? Or vice versa :). Just spitballing...

jasnell (Member, Author) commented Nov 2, 2017

@lance ... the "magic" occurs within InternalCallbackScope ... if you look at InternalMakeCallback, you'll see that an InternalCallbackScope is created. When InternalMakeCallback exits, the Close() function is called within InternalCallbackScope, causing the microtask queue to be purged.

addaleax (Member) commented Nov 2, 2017

EmitBeforeExit returns and uv_loop_alive is checked and presumably finds the event queue to no longer be empty.

@lance @jasnell I really wouldn’t think this has anything to do with this issue; the microtask queue and the libuv queue are entirely separate things. Here’s a stress test just to make sure: https://ci.nodejs.org/job/node-stress-single-test/1534/nodes=win2012r2-mp-vcbt2015/

Edit: Same config, but for master, just to get an idea of how flaky the test currently is: https://ci.nodejs.org/job/node-stress-single-test/1539/nodes=win2012r2-mp-vcbt2015/

lance (Member) commented Nov 2, 2017

@jasnell ahh I see now. Thanks for that clarification.

@addaleax I was just coming to that realization. Thanks for clearing that up.

jasnell reopened this Nov 2, 2017
assert.deepStrictEqual(call_log, [1, 1, 1, 1]);
}));
// assert.strictEqual(typeof call_id, 'number');
// assert.deepStrictEqual(call_log, [1, 1, 1, 1]);
A member left an inline review comment on the diff quoted above

I like the call_log logging, but why comment these out? They won't get executed twice now because of the process.exit(1) above.

jasnell (Member, Author) replied

this is just for testing... not planning on keeping this commit

jasnell closed this Nov 2, 2017