Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v18.x backport] backport recent fixes to flaky tests #44473

Closed

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Sep 1, 2022

According to the resutls of a recent CI run (https://ci.nodejs.org/job/node-test-pull-request/46350/) it seems the flakes are also affecting v18.x-staging (or I guess the flakes came from the infra being slower in spawning new processes/threads, and they show up in v18.x-staging now because the CI runs on the same infra), so backporting the test fixes to deflake the CI for v18.x-staging.

This patch stores the metadata about the Node.js binary
into the SnapshotData and adds fields denoting how the
snapshot was generated, on what platform it was
generated as well as the V8 cached data version flag.
Instead of simply crashing when the metadata doesn't
match, Node.js now prints an error message and exit with
1 for the customized snapshot, or ignore the snapshot
and start from scratch if it's the default one.

PR-URL: nodejs#44132
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
@joyeecheung joyeecheung marked this pull request as ready for review September 1, 2022 07:49
@nodejs-github-bot nodejs-github-bot added debugger Issues and PRs related to the debugger subsystem. needs-ci PRs that need a full CI run. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch. labels Sep 1, 2022
@joyeecheung
Copy link
Member Author

cc @nodejs/releasers

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

By default, the debugger would query the specified inspector
sever port to see if it's available before starting the server,
and it would keep retrying until a timeout (previously 9999 ms)
is reached. This timeout seems to be longer than necessary. This
patch decreases the timeout to 3 seconds.

PR-URL: nodejs#44359
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
sequential/test-child-process-execsync and
parallel/test-child-process-spawnsync-timeout are both flaky
on azure Windows machines, where it may take longer for Node.js
to launch and receive output from child processes. These tests
work by spawning a child processes that is supposed to sleep
for a long time, but the option is configured so that Node.js
would terminate them early when a shorter timeout is reached.
Then the tests assert that the time taken for the whole thing
is shorter than the specified sleep time (meaning the process
don't actually get to sleep for that long). To make the tests
less brittle on azure Windows, this patch raises the sleep
times in those tests on Windows platform, so that the overhead
can be taken into account there.

PR-URL: nodejs#44375
Refs: nodejs/build#3014
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Nitzan Uziely <[email protected]>
The original heap prof tests can take too long to complete on
azure Windows machines, resulting in timeouts. Split them into
smaller tests and move them into the parallel directory to
speed up the execution.

PR-URL: nodejs#44388
Refs: nodejs/reliability#356
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Previously the tests required that Node.js finish the initialization
of the watchdog thread and fires the timeout within 100ms, which
can be difficult on certain systems under load. This patch
relaxes the requirement to 2000ms. If there is a bug and the
timeout can actually be escaped, raising the timeout to 2000ms
would not make a difference anyway.

PR-URL: nodejs#44433
Refs: nodejs/reliability#333
Refs: nodejs/reliability#361
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
On Windows it might take too long for the parent to start the
communication with a child process, so by the time the parent
starts its own timer, the child process might have already
completed running, and the parent in those tests won't have a
chance to terminate these child processes after the timeout.
To address this issue, raise the time for which the child is
supposed to run to make sure that the parent starts
its own timer before the child terminates in the tests.
Also, split the test into smaller ones to reduce the overhead.

PR-URL: nodejs#44390
Refs: nodejs/reliability#356
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
The test previously created two fs.promises.open calls on the
same file with w+ back-to-back, and one of them could fail
when checking the contents of that file if the other happened
to be opening the file for write. Split them into different
tests (with different tmpdir) to avoid the race.

PR-URL: nodejs#44380
Refs: nodejs/reliability#354
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
@joyeecheung
Copy link
Member Author

Rebased to drop the commit that's already backported

@joyeecheung
Copy link
Member Author

Oh, it looks like the patches are now all in v18.x-staging. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debugger Issues and PRs related to the debugger subsystem. needs-ci PRs that need a full CI run. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants