test: set test-fs-watch as flaky #50250
Conversation
Fast-track has been requested by @anonrig. Please 👍 to approve.
The referenced issue was opened 2 hours ago. I think we should at least ping @nodejs/platform-s390 before marking the test flaky.
The s390 team can always follow up and un-flake the test. The other way around, i.e. waiting for someone to respond, will only frustrate existing pull requests and increase our CI workload. I recommend marking them flaky first and following up when/if someone from the team is available. We should treat these flaky-test declarations as a todo list. Also, we should remember that there is always an open issue tracking the flakiness of this test.
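(For context, "marking a test flaky" in Node.js core means adding an entry to the relevant status file rather than changing the test itself. Below is a minimal sketch of what such an entry in test/parallel/parallel.status typically looks like; the exact section and platform condition are assumptions for illustration, not a quote of this PR's diff.)

```
# test/parallel/parallel.status (sketch; section and condition are assumed)
[$system==linux && $arch==s390x]
# https://github.com/nodejs/node/issues/50249
test-fs-watch: PASS,FLAKY
```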
I disagree; was this discussed in a TSC meeting? It is too easy to mark a test flaky the first time it fails and then forget about it. No one is ever going to look into it if there isn't a constant reminder/annoyance that the test keeps failing. The list of flaky tests will only grow over time instead of shrinking.
@lpinca It was not officially discussed. It was mostly a couple of folks saying "Yeah, let's do it". cc @nodejs/tsc for visibility
The branch was updated from 67ed76e to 3ed1053.
Ok, fwiw I think it is not a good idea.
I agree we should have a threshold for flakes versus a first-time failure. I'm going to object to the fast track so that we have some time for the discussion.
I've also added the tsc-agenda label so we can have a discussion and get feedback on what might be an appropriate threshold.
@mhdawson Is there any reason for not merging this pull request? Is the tsc-agenda label related to this PR specifically, or to a general discussion about the conversation above?
While it has had two reviews, I think that with people against this, and I'll include myself in that, it makes sense to pause the merging. Having said that, if we are struggling to get a response from the team maintaining that platform then we may have a wider issue (I think I may technically be on the @nodejs/platform-s390 team, but I'm not actively maintaining the port there). If we can't get a response in another week or so then I'd likely be in favour, so at this stage I'm +1 on having it as part of a TSC discussion before making the decision on merging.
Although I agree with where this is going, it means that all pull requests over the next week will face this flaky test and potentially overwhelm contributors, which is the main reason for fast-tracking these changes.
lgtm
So it's failing 100% of the time at the moment?
From the reliability reports, the last time this failed was Oct 10th (nodejs/reliability#690), so it's not every run. The reliability report shows that it failed a number of times on both macOS and s390 between Oct 8th and Oct 10th. It has not failed since. I think this is possibly an example of why we shouldn't be too hasty about excluding a test: it's not so flaky that it affects all builds, since we've not seen failures in the last 13 days. [EDIT: searching the reliability reports, I think some failures may not show up, as I can't always find an entry in the report when we have a failure in a CI run.] Since it was also failing on multiple platforms, I'm not sure it's a good candidate for wanting a response from a particular platform team either. In one of the failing runs on osx (https://ci.nodejs.org/job/node-test-commit-osx-arm/13767/), it looked like this:
With multiple tests failing like that, I think that's less often a flaky test and more often something that needs to be cleaned up on the machine. So unless I'm looking at the data wrong, I don't think we should disable this test at this point. It has not failed in 13 days, and some of the failures looked more like machine issues. Separately, I think it would be good to agree on and document something around marking tests as flaky: whether it's OK to mark a test flaky after seeing one failure, or if not, what threshold we think makes sense.
@mhdawson If you resume a build and make sure the flaky test succeeds before the daily reliability check is done, you can spoof the reliability report. The following test link from the original issue is an example of this: https://ci.nodejs.org/job/node-test-commit-linuxone/40398/ I think this is a fundamental issue with the reliability reports.
That's not how reliability reports work. The reports look back on the 100 most recent runs of node-test-pull-request.
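(As an illustration of that lookback model, here is a small hypothetical sketch of how a reliability-style report could tally per-test failures over the most recent 100 runs and apply a flakiness threshold. The run records, function names, and threshold are invented for the example; this is not the actual code behind nodejs/reliability.)

```js
// Hypothetical sketch: count per-test failures across the last `window` CI runs.
// Each record lists the tests that failed in one node-test-pull-request run.
const runs = [
  { id: 40400, failed: ['test-fs-watch', 'test-http-server'] },
  { id: 40399, failed: [] },
  { id: 40398, failed: ['test-fs-watch'] },
  // ...more run records, newest first
];

function tallyFailures(runs, window = 100) {
  const counts = new Map();
  for (const run of runs.slice(0, window)) {          // look back over the last `window` runs
    for (const test of run.failed) {
      counts.set(test, (counts.get(test) ?? 0) + 1);
    }
  }
  // Sort by failure count, most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Example threshold check: only flag tests that failed at least `minFailures` times.
function candidatesForFlaky(runs, minFailures = 3) {
  return tallyFailures(runs).filter(([, count]) => count >= minFailures);
}

console.log(tallyFailures(runs));
// => [ [ 'test-fs-watch', 2 ], [ 'test-http-server', 1 ] ]
console.log(candidatesForFlaky(runs));
// => [] with this sample data, since no test reached the threshold
```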
It looks like the last time this test was reported as failing was in nodejs/reliability#690, i.e. one month ago, and nothing has shown up since. Are we sure this is still relevant?
Adding a request for changes so it doesn't land until we have confirmation it is still flaky.
Ref: #50249