-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky sequential/test-async-wrap-getasyncid #18190
Comments
Also here (Alpine) and here (Windows 10). |
Closing as duplicate of #13020 |
@BridgeAR This issue appears after the fix of #13020 landed, see #13020 (comment) and #13020 (comment) |
@BridgeAR I'll close the original issue. Have not seen a CRASHED in this test in these two months so we probably don't need to open a separate issue for #13020 (comment) |
This is similar to #16210, both |
@joyeecheung I have a strong feeling this might be related to eca73a2 or 791975d — this only started failing after those commits landed and at a quick glance, the test does use |
@apapirovski The last CI run of eca73a2 was https://ci.nodejs.org/job/node-test-pull-request/12540/, which did not contain this flake. The CI run before 791975d landed was https://ci.nodejs.org/job/node-test-pull-request/12562/ (for another PR ) and there was already this flake...still investigating before those CI runs expired |
I think the flake first appeared in https://ci.nodejs.org/job/node-test-pull-request/12562/ for #18186 (has not yet landed), and it was rebasing onto 316b5ef |
Wait...I think I've seen one of the tests in 316b5ef failed on my mac this week (by looking at my |
Ok, I know this is pushing it... but is there any way that https://ci.nodejs.org/job/node-test-pull-request/12562/ was wrong about what it was rebased on? It's the only outlier that doesn't point to the problem being with #17914 — the timestamps even align. The other commit literally has nothing to do with the code being tested... :( |
The git-rebase log of https://ci.nodejs.org/job/node-test-pull-request/12562/
Also there wasn't this flake in the CI run of #18037 |
Ok... a shot in the dark: what if we have two different bugs here. Because the main issue occurring now is on alpine but that one issue above is on Windows. So perhaps #17914 introduced a more frequent failure on alpine but we also have some other failure that happens very infrequently. |
There was also #13020 (comment) which has the same error stack trace and it's on arm, although it has not appeared for a long time so I am not sure if that's related or not |
I suppose we could also consider #18144 as the culprit, even though its CI didn't have any issues. *shrug* |
No idea how to fire sequential stress tests on sha's...opened #18238 cause it's easier to run CIs |
It showed up again...this was rebased onto 8803b69 https://ci.nodejs.org/job/node-test-commit-linux/15734/nodes=ubuntu1710-x64/console
|
FWIW I've done some preliminary debugging, the loop is being kept alive because there's a uv_close being called on some handle within the main event loop. Should be a good starting point if anyone wants to dig in. |
Edit: Ignore me for now... I can't even tell if this is actually cares_wrap or not... |
Since the `DestroyAsyncIdsCallback` in Node.js can be scheduled as a result of GC, that means that it can accidentally keep the event loop open when it shouldn't. Replace `SetImmediate` with the newly introduced `SetUnrefImmediate` and in addition introduce RunBeforeExit callbacks, of which `DestroyAsyncIdsCallback` is now the first. These callbacks will run before the `beforeExit` event is emitted (which will now only be emitted if the event loop is still not active). PR-URL: nodejs#18241 Fixes: nodejs#18190 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Andreas Madsen <[email protected]>
Since the `DestroyAsyncIdsCallback` in Node.js can be scheduled as a result of GC, that means that it can accidentally keep the event loop open when it shouldn't. Replace `SetImmediate` with the newly introduced `SetUnrefImmediate` and in addition introduce RunBeforeExit callbacks, of which `DestroyAsyncIdsCallback` is now the first. These callbacks will run before the `beforeExit` event is emitted (which will now only be emitted if the event loop is still not active). PR-URL: nodejs#18241 Fixes: nodejs#18190 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Andreas Madsen <[email protected]>
If anyone wants to help out, please have a look at the now closed #18307 and suggest better ways of accomplishing this. Or if you believe the original PR was good enough, let me know. |
Instead of relying on garbage collection to close the handle, manage its state more explicitly. PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Instead of relying on garbage collection to close the timer handle, manage its state more explicitly. PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Instead of relying on garbage collection to close the handle, manage its state more explicitly. PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Instead of relying on garbage collection to close the timer handle, manage its state more explicitly. PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Instead of relying on garbage collection to close the handle, manage its state more explicitly. PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
PR-URL: #21093 Fixes: #18190 Refs: #18307 Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Opening a separate issue for the flake reported in #13020 (comment) , I have not seem it for a while but tonight it has come back in different unrelated PRs.
Sample:
https://ci.nodejs.org/job/node-test-commit-linux/15590/nodes=alpine37-container-x64/console
https://ci.nodejs.org/job/node-test-binary-windows/14242/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=3/console
The text was updated successfully, but these errors were encountered: