Ruthlessly mark tests that fail frequently as flaky #43955

mcollina · 2022-07-23T12:17:56Z

The status of our CI is deteriorating to the point where it is impossible to land anything.

Here is my proposal: declare bankruptcy and mark all tests detected in https://github.com/nodejs/reliability as flaky.

mcollina · 2022-07-23T12:18:14Z

cc @nodejs/tsc

tniessen · 2022-07-23T12:37:14Z

Context: #43754 (comment) and #43929 (comment)

lpinca · 2022-07-23T17:36:03Z

To be honest I don't think this is a good idea. More time should be spent on fixing flaky tests before adding new features. If this is not done the number of flaky tests will only grow over time defeating the whole point of testing.

devsnek · 2022-07-23T17:41:04Z

Do we have any context on whether the CI run was intended to pass? Sometimes I will be running on battery, so instead of building/testing/linting on my laptop I will let the CI do it, which generally comes up red.

@lpinca I agree with you in principle but that hinges on getting actual humans to spend actual time fixing the tests. Do we have the assurance that this can be done?

lpinca · 2022-07-23T18:02:49Z

> Do we have the assurance that this can be done?

I don't know, but we should at least try. I only remember a handful of contributors who took the time to do this. I also think that we should help new contributors avoid creating flaky tests, for example by discouraging the use of timers unless strictly needed.

tniessen · 2022-07-23T20:14:08Z

> More time should be spent on fixing flaky tests before adding new features. If this is not done the number of flaky tests will only grow over time defeating the whole point of testing.

@lpinca I agree with you. However, as you said, these tests are flaky, so marking them as flaky seems like the right approach. If it is not, then we should use different terms for tests that are flaky and tests that should be marked as flaky.

lpinca · 2022-07-24T05:28:54Z

> However, as you said these tests are flaky

Sometimes flakiness is caused by real bugs. If we simply mark tests flaky with no investigation, we might hide real issues and make it harder to discover and fix bugs. Here are two recent examples:

and the likely underlying bug

http: fix http server connection list when close #43949

tniessen · 2022-07-24T10:55:34Z

> Sometimes flakiness is caused by real bugs.

@lpinca I am not disagreeing. I'd even go as far as to say that almost all flaky tests are caused by bugs -- either in node, in the test itself, or in the test environment.

> If we simply mark tests flaky with no investigation, we might hide real issues and make it harder to discover and fix bugs.

That's not the goal; we should treat the list of flaky tests as an urgent TODO list. My point is: flaky tests should not make contributing a worse experience for everyone.

mcollina · 2022-07-24T14:48:54Z

Note that close to 50% of our failures are of an obscure nature: either Jenkins or build-related issues.

bnoordhuis · 2022-07-24T20:33:07Z

Looking at the latest reliability report, nearly half of the failures are infrastructural; marking things flaky won't help there.

Some of the real flakes, like parallel/test-heapsnapshot-near-heap-limit-worker, are probably impossible to make reliable except in a statistical sense ("on average didn't fail more than 1 out of 10 times during the last 100 runs.")

Tests like that we could either disable/remove/mark flaky, or build out the CI to track swings in statistical flakiness. That's what e.g. Mozilla does for their test suites.
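A minimal sketch of the statistical criterion quoted above ("didn't fail more than 1 out of 10 times during the last 100 runs"), written in Python purely for illustration. The class name `FlakinessTracker`, its parameters, and the thresholds are hypothetical; this is not how the Node.js CI or Mozilla's tooling is actually implemented.

```python
from collections import deque


class FlakinessTracker:
    """Track pass/fail results for one test over a sliding window of runs.

    Hypothetical helper: the defaults mirror the "1 out of 10 over the
    last 100 runs" example and are assumptions, not project policy.
    """

    def __init__(self, window=100, max_failure_rate=0.1):
        self.max_failure_rate = max_failure_rate
        # A bounded deque drops the oldest result once the window is full.
        self.results = deque(maxlen=window)  # True = pass, False = fail

    def record(self, passed):
        """Store the outcome of one CI run."""
        self.results.append(bool(passed))

    def failure_rate(self):
        """Fraction of failed runs in the current window (0.0 if empty)."""
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def is_within_budget(self):
        """True while the failure rate does not exceed the allowed rate."""
        return self.failure_rate() <= self.max_failure_rate


# Usage: a test that fails once every 20 runs stays within a 10% budget.
tracker = FlakinessTracker()
for run in range(100):
    tracker.record(run % 20 != 0)  # fails on runs 0, 20, 40, 60, 80
print(tracker.failure_rate())      # 0.05
print(tracker.is_within_budget())  # True
```

With a tracker like this, a CI could report a test only when it leaves its budget, rather than on every individual failure.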

mhdawson · 2022-07-26T18:02:29Z

I'd agree that we should remove tests that we cannot make reliable.

In the bigger picture, I don't think we should just automatically mark failed tests as flaky, but I am OK with us making people feel more comfortable about doing that when tests are flaky and time is needed to investigate.

I think we should also try something like a "flaky fix day/week". We tried some of those within RH and I think they helped us make some progress. If we did it as a project, I think we should make it an "only fix flaky tests day/week", i.e. nothing else should be tested/landed during that period of time.

mcollina · 2022-08-10T15:05:04Z

This can be closed; CI is mostly better now.

mcollina added the tsc-agenda label Jul 23, 2022

mhdawson mentioned this issue Jul 25, 2022

Node.js Technical Steering Committee (TSC) Meeting 2022-07-27 nodejs/TSC#1264

Closed

tniessen changed the title ~~Ruthlessly mark tests that fails frequently as flaky~~ Ruthlessly mark tests that fail frequently as flaky Jul 27, 2022

mhdawson mentioned this issue Aug 1, 2022

Node.js Technical Steering Committee (TSC) Meeting 2022-08-03 nodejs/TSC#1266

Closed

mhdawson mentioned this issue Aug 8, 2022

Node.js Technical Steering Committee (TSC) Meeting 2022-08-10 nodejs/TSC#1268

Closed

mcollina removed the tsc-agenda label Aug 10, 2022

mcollina closed this as completed Aug 10, 2022
