Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-watch-mode (no. 3) #44898

Open
tniessen opened this issue Oct 5, 2022 · 18 comments · Fixed by #48147
Open

Investigate flaky test-watch-mode (no. 3) #44898

tniessen opened this issue Oct 5, 2022 · 18 comments · Fixed by #48147
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. freebsd Issues and PRs related to the FreeBSD platform. linux Issues and PRs related to the Linux platform. watch-mode Issues and PRs related to watch mode

Comments

@tniessen
Copy link
Member

tniessen commented Oct 5, 2022

Test

test-watch-mode

Platform

Linux x64, FreeBSD

Console output

13:17:49 not ok 3811 sequential/test-watch-mode
13:17:55   ---
13:17:55   duration_ms: 5.291
13:17:55   severity: fail
13:17:55   exitcode: 1
13:17:55   stack: |-
13:17:55     TAP version 13
13:17:55     # Subtest: watch mode
13:17:55         # Subtest: should watch changes to a file - event loop ended
13:17:55         ok 1 - should watch changes to a file - event loop ended
13:17:55           ---
13:17:55           duration_ms: 2739.793821
13:17:55           ...
13:17:55         # Subtest: should watch changes to a failing file
13:17:55         ok 2 - should watch changes to a failing file
13:17:55           ---
13:17:55           duration_ms: 2477.028169
13:17:55           ...
13:17:55         # Subtest: should not watch when running an non-existing file
13:17:55         ok 3 - should not watch when running an non-existing file
13:17:55           ---
13:17:55           duration_ms: 2285.642733
13:17:55           ...
13:17:55         # Subtest: should watch when running an non-existing file - when specified under --watch-path
13:17:55         ok 4 - should watch when running an non-existing file - when specified under --watch-path # SKIP
13:17:55           ---
13:17:55           duration_ms: 209.591623
13:17:55           ...
13:17:55         # Subtest: should watch changes to a file - event loop blocked
13:17:55         not ok 5 - should watch changes to a file - event loop blocked
13:17:55           ---
13:17:55           duration_ms: 2870.03467
13:17:55           failureType: 'testCodeFailure'
13:17:55           error: |-
13:17:55             Expected values to be strictly deep-equal:
13:17:55             + actual - expected
13:17:55             
13:17:55               [
13:17:55             +   'running',
13:17:55             +   "Restarting '/home/iojs/build/workspace/node-test-commit-linux-containered/test/fixtures/watch-mode/event_loop_blocked.js'"
13:17:55             -   "Restarting '/home/iojs/build/workspace/node-test-commit-linux-containered/test/fixtures/watch-mode/event_loop_blocked.js'",
13:17:55             -   'running'
13:17:55               ]
13:17:55           code: 'ERR_ASSERTION'
13:17:55           stack: |-
13:17:55             assertRestartedCorrectly (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/sequential/test-watch-mode.mjs:74:10)
13:17:55             Object.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/sequential/test-watch-mode.mjs:141:5)
13:17:55             process.processTicksAndRejections (node:internal/process/task_queues:95:5)
13:17:55             async ItTest.run (node:internal/test_runner/test:512:9)
13:17:55             async Promise.all (index 4)
13:17:55             async Suite.run (node:internal/test_runner/test:739:7)
13:17:55           ...
13:17:55         # Subtest: should watch changes to dependencies - cjs
13:17:55         ok 6 - should watch changes to dependencies - cjs
13:17:55           ---
13:17:55           duration_ms: 2782.278082
13:17:55           ...
13:17:55         # Subtest: should watch changes to dependencies - esm
13:17:55         ok 7 - should watch changes to dependencies - esm
13:17:55           ---
13:17:55           duration_ms: 2598.258289
13:17:55           ...
13:17:55         # Subtest: should restart multiple times
13:17:55         ok 8 - should restart multiple times
13:17:55           ---
13:17:55           duration_ms: 4515.95009
13:17:55           ...
13:17:55         # Subtest: should pass arguments to file
13:17:55         ok 9 - should pass arguments to file
13:17:55           ---
13:17:55           duration_ms: 2784.227409
13:17:55           ...
13:17:55         # Subtest: should not load --require modules in main process
13:17:55         ok 10 - should not load --require modules in main process
13:17:55           ---
13:17:55           duration_ms: 2822.566444
13:17:55           ...
13:17:55         # Subtest: should not load --import modules in main process
13:17:55         ok 11 - should not load --import modules in main process
13:17:55           ---
13:17:55           duration_ms: 2968.502896
13:17:55           ...
13:17:55         1..11
13:17:55     not ok 1 - watch mode
13:17:55       ---
13:17:55       duration_ms: 4638.010352
13:17:55       failureType: 'subtestsFailed'
13:17:55       error: '1 subtest failed'
13:17:55       code: 'ERR_TEST_FAILURE'
13:17:55       ...
13:17:55     1..1
13:17:55     # tests 1
13:17:55     # pass 0
13:17:55     # fail 1
13:17:55     # cancelled 0
13:17:55     # skipped 0
13:17:55     # todo 0
13:17:55     # duration_ms 4649.416376
13:17:55   ...

Build links

Additional information

Refs: #44655
Refs: #44735
Refs: #44805

@tniessen tniessen added flaky-test Issues and PRs related to the tests with unstable failures on the CI. linux Issues and PRs related to the Linux platform. labels Oct 5, 2022
@tniessen
Copy link
Member Author

tniessen commented Oct 7, 2022

Similar output:

16:12:03 not ok 3793 sequential/test-watch-mode
16:12:06   ---
16:12:06   duration_ms: 3.677
16:12:06   severity: fail
16:12:06   exitcode: 1
16:12:06   stack: |-
16:12:06     TAP version 13
16:12:06     # Subtest: watch mode
16:12:06         # Subtest: should watch changes to a file - event loop ended
16:12:06         ok 1 - should watch changes to a file - event loop ended
16:12:06           ---
16:12:06           duration_ms: 1657.422205
16:12:06           ...
16:12:06         # Subtest: should watch changes to a failing file
16:12:06         ok 2 - should watch changes to a failing file
16:12:06           ---
16:12:06           duration_ms: 2144.675239
16:12:06           ...
16:12:06         # Subtest: should not watch when running an non-existing file
16:12:06         ok 3 - should not watch when running an non-existing file
16:12:06           ---
16:12:06           duration_ms: 848.075766
16:12:06           ...
16:12:06         # Subtest: should watch when running an non-existing file - when specified under --watch-path
16:12:06         ok 4 - should watch when running an non-existing file - when specified under --watch-path # SKIP
16:12:06           ---
16:12:06           duration_ms: 281.828198
16:12:06           ...
16:12:06         # Subtest: should watch changes to a file - event loop blocked
16:12:06         ok 5 - should watch changes to a file - event loop blocked
16:12:06           ---
16:12:06           duration_ms: 1094.820609
16:12:06           ...
16:12:06         # Subtest: should watch changes to dependencies - cjs
16:12:06         ok 6 - should watch changes to dependencies - cjs
16:12:06           ---
16:12:06           duration_ms: 2049.295256
16:12:06           ...
16:12:06         # Subtest: should watch changes to dependencies - esm
16:12:06         ok 7 - should watch changes to dependencies - esm
16:12:06           ---
16:12:06           duration_ms: 2216.116407
16:12:06           ...
16:12:06         # Subtest: should restart multiple times
16:12:06         ok 8 - should restart multiple times
16:12:06           ---
16:12:06           duration_ms: 3098.411375
16:12:06           ...
16:12:06         # Subtest: should pass arguments to file
16:12:06         ok 9 - should pass arguments to file
16:12:06           ---
16:12:06           duration_ms: 1908.309201
16:12:06           ...
16:12:06         # Subtest: should not load --require modules in main process
16:12:06         ok 10 - should not load --require modules in main process
16:12:06           ---
16:12:06           duration_ms: 1360.515837
16:12:06           ...
16:12:06         # Subtest: should not load --import modules in main process
16:12:06         not ok 11 - should not load --import modules in main process
16:12:06           ---
16:12:06           duration_ms: 2044.114142
16:12:06           failureType: 'testCodeFailure'
16:12:06           error: |-
16:12:06             Expected values to be strictly deep-equal:
16:12:06             + actual - expected
16:12:06             
16:12:06               [
16:12:06             +   "Completed running '/usr/home/iojs/node-tmp/.tmp.3792/3.js'",
16:12:06             +   "Restarting '/usr/home/iojs/node-tmp/.tmp.3792/3.js'"
16:12:06             -   "Restarting '/usr/home/iojs/node-tmp/.tmp.3792/3.js'",
16:12:06             -   "Completed running '/usr/home/iojs/node-tmp/.tmp.3792/3.js'"
16:12:06               ]
16:12:06           code: 'ERR_ASSERTION'
16:12:06           stack: |-
16:12:06             assertRestartedCorrectly (file:///usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd12-x64/test/sequential/test-watch-mode.mjs:74:10)
16:12:06             Object.<anonymous> (file:///usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd12-x64/test/sequential/test-watch-mode.mjs:218:5)
16:12:06             process.processTicksAndRejections (node:internal/process/task_queues:95:5)
16:12:06             async ItTest.run (node:internal/test_runner/test:512:9)
16:12:06             async Promise.all (index 10)
16:12:06             async Suite.run (node:internal/test_runner/test:739:7)
16:12:06           ...
16:12:06         1..11
16:12:06     not ok 1 - watch mode
16:12:06       ---
16:12:06       duration_ms: 3336.446593
16:12:06       failureType: 'subtestsFailed'
16:12:06       error: '1 subtest failed'
16:12:06       code: 'ERR_TEST_FAILURE'
16:12:06       ...
16:12:06     1..1
16:12:06     # tests 1
16:12:06     # pass 0
16:12:06     # fail 1
16:12:06     # cancelled 0
16:12:06     # skipped 0
16:12:06     # todo 0
16:12:06     # duration_ms 3344.389888
16:12:06   ...

@MoLow Looks like the order within some arrays is non-deterministic. Not sure if that's a bug in the implementation or in the test.

@tniessen tniessen added the freebsd Issues and PRs related to the FreeBSD platform. label Oct 7, 2022
@MoLow
Copy link
Member

MoLow commented Oct 10, 2022

On vacation, will have to wait unless someone else wants to give it a look

@pbo-linaro
Copy link
Contributor

This issue was observed on Windows on Arm platform too.

@tniessen @MoLow @richardlau Would you accept to mark those tests as flaky, waiting for a proper fix?

@kapouer
Copy link
Contributor

kapouer commented Oct 15, 2022

That test is failing in a variety of ways on our debian builds too (https://buildd.debian.org/status/package.php?p=nodejs&suite=experimental)

@pbo-linaro
Copy link
Contributor

Created a PR to mark those test-watch-mode as flaky, so it can unlock the current situation.

@kapouer
Copy link
Contributor

kapouer commented Oct 18, 2022

IMO they're more than flaky, several times they just hang for a long time.

@pbo-linaro
Copy link
Contributor

Is there a timeout of stuff like this we can set for nodejs tests? Or deletion is the only alternative?

@kapouer
Copy link
Contributor

kapouer commented Oct 18, 2022

In theory there is already a timeout for tests, but in the case of that test it didn't work (as usual, take what I say with precautions because it's with the debian build of nodejs) and the whole test suite just hang.

@pbo-linaro
Copy link
Contributor

@richardlau @MoLow Should we delete those tests for now, as they seem to create some difficult problems? I know it's not ideal, but hanging a testsuite and report random failures is really a huge difficulty.

@MoLow
Copy link
Member

MoLow commented Oct 18, 2022

the timeouts happen in the test-watch-mode-inspect tests, and flakyness in test-watch-mode,
I am planning to take a deeper look but I am with a temporary computer for a while - I think for the time it is ok to mark as flaky without deleting

@MoLow MoLow added the watch-mode Issues and PRs related to watch mode label Oct 18, 2022
@tniessen
Copy link
Member Author

Another occurrence on macOS:

=== release test-watch-mode-files_watcher ===
Path: parallel/test-watch-mode-files_watcher
Error: --- stdout ---
TAP version 13
# Subtest: watch mode file watcher
    # Subtest: should watch changed files
    ok 1 - should watch changed files
      ---
      duration_ms: 20.78446
      ...
    # Subtest: should throttle changes
    ok 2 - should throttle changes
      ---
      duration_ms: 230.579784
      ...
    # Subtest: should ignore files in watched directory if they are not filtered
    ok 3 - should ignore files in watched directory if they are not filtered
      ---
      duration_ms: 1004.011198
      ...
    # Subtest: should allow clearing filters
    ok 4 - should allow clearing filters
      ---
      duration_ms: 1219.288977
      ...
Command: out/Release/node --expose-internals /Users/runner/work/node/node/test/parallel/test-watch-mode-files_watcher.mjs
--- TIMEOUT ---

===
=== 1 tests failed
===

StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Nov 22, 2022
StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Nov 22, 2022
StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Nov 22, 2022
@StefanStojanovic
Copy link
Contributor

@tniessen @MoLow I've opened 2 PRs fixing both test-watch-mode-inspect and test-watch-mode. As it turned out the issue was not syncing process execution and its restart. However, even after fixing that I got rare repros of logs between starting node in watch mode and the first triggered restart missing (more details on that in the PR descriptions). Did you encounter something similar before and do you have some ideas on why that could happen?

StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Dec 1, 2022
StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Dec 1, 2022
nodejs-github-bot pushed a commit that referenced this issue Dec 1, 2022
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
nodejs-github-bot pushed a commit that referenced this issue Dec 1, 2022
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
targos pushed a commit that referenced this issue Dec 12, 2022
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
targos pushed a commit that referenced this issue Dec 12, 2022
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@jasnell
Copy link
Member

jasnell commented Dec 20, 2022

I'm assuming this is fixed since #45585 landed?

Please re-open if that proves not to be the case.

danielleadams pushed a commit that referenced this issue Dec 30, 2022
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Dec 30, 2022
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Dec 30, 2022
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Dec 30, 2022
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 3, 2023
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 3, 2023
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 4, 2023
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 4, 2023
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 5, 2023
Refs: #44898
PR-URL: #45586
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
danielleadams pushed a commit that referenced this issue Jan 5, 2023
Refs: #44898
PR-URL: #45585
Reviewed-By: Moshe Atlow <[email protected]>
Reviewed-By: Erick Wendel <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@anonrig
Copy link
Member

anonrig commented Apr 13, 2023

This test is inside the sequential.status file and marked as PASS, FLAKY but today one of the CI builds failed because of this. Do we have a bug?

Build: https://ci.nodejs.org/job/node-test-binary-armv7l/5443/

@anonrig anonrig reopened this Apr 13, 2023
@targos
Copy link
Member

targos commented Apr 13, 2023

I'm not sure why this CI run failed, but it's not because of the test timeout.

@richardlau
Copy link
Member

It failed because of leftover Node.js processes at the end of the test run:

16:59:49 All tests passed.
16:59:49 ps awwx | grep Release/node | grep -v grep | cat
16:59:49  643411 ?        Sl     0:00 /home/iojs/build/workspace/node-test-binary-armv7l/out/Release/node --inspect=0 --watch /home/iojs/build/workspace/node-test-binary-armv7l/test/fixtures/watch-mode/inspect.js
16:59:49  643417 ?        Sl     0:00 /home/iojs/build/workspace/node-test-binary-armv7l/out/Release/node --inspect=0 /home/iojs/build/workspace/node-test-binary-armv7l/test/fixtures/watch-mode/inspect.js
16:59:49 make: *** [Makefile:538: test-ci-js] Error 1

ps awwx | grep Release/node | grep -v grep | cat is expected to not return any processes.

node/Makefile

Lines 538 to 543 in 5fa84e8

$(info Clean up any leftover processes, error if found.)
ps awwx | grep Release/node | grep -v grep | cat
@PS_OUT=`ps awwx | grep Release/node | grep -v grep | awk '{print $$1}'`; \
if [ "$${PS_OUT}" ]; then \
echo $${PS_OUT} | xargs kill -9; exit 1; \
fi

@tniessen
Copy link
Member Author

Another failure of test-watch-mode-files_watcher:

01:20:12 not ok 3455 parallel/test-watch-mode-files_watcher
01:20:12   ---
01:20:12   duration_ms: 120068.19300
01:20:12   severity: fail
01:20:12   exitcode: -15
01:20:12   stack: |-
01:20:12     timeout
01:20:12     TAP version 13
01:20:12     # Subtest: watch mode file watcher
01:20:12         # Subtest: should watch changed files
01:20:12         ok 1 - should watch changed files
01:20:12           ---
01:20:12           duration_ms: 179.681248
01:20:12           ...
01:20:12   ...

nodejs-github-bot pushed a commit that referenced this issue Jun 8, 2023
PR-URL: #48147
Fixes: #44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
RafaelGSS pushed a commit that referenced this issue Jul 3, 2023
PR-URL: #48147
Fixes: #44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48147
Fixes: nodejs#44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48147
Fixes: nodejs#44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
MoLow added a commit to MoLow/node that referenced this issue Sep 3, 2023
PR-URL: nodejs#48147
Fixes: nodejs#44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
MoLow added a commit to MoLow/node that referenced this issue Sep 3, 2023
PR-URL: nodejs#48147
Fixes: nodejs#44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
ruyadorno pushed a commit to MoLow/node that referenced this issue Sep 7, 2023
PR-URL: nodejs#48147
Fixes: nodejs#44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
ruyadorno pushed a commit that referenced this issue Sep 7, 2023
PR-URL: #48147
Backport-PR-URL: #49468
Fixes: #44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
ruyadorno pushed a commit that referenced this issue Sep 13, 2023
PR-URL: #48147
Backport-PR-URL: #49468
Fixes: #44898
Reviewed-By: Benjamin Gruenbaum <[email protected]>
@juanarbol
Copy link
Member

This happened again in https://ci.nodejs.org/job/node-test-commit-smartos/55674/nodes=smartos20-64/

Re-opening this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. freebsd Issues and PRs related to the FreeBSD platform. linux Issues and PRs related to the Linux platform. watch-mode Issues and PRs related to watch mode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants