lookup: declare bankruptcy on flaky modules #959
Conversation
Pinging people who commented/reviewed the other PR that I messed up on. Please re-review. Thanks!
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main     #959   +/-   ##
=======================================
  Coverage   96.44%   96.44%
=======================================
  Files          28       28
  Lines        2139     2139
=======================================
  Hits         2063     2063
  Misses         76       76
```

☔ View full report in Codecov by Sentry.
@RafaelGSS suggested:
Sure. Pinging maintainers who have modules that are being removed from the Node.js release test suite. There is no doubt a simple workaround or fix for many of these cases (like "let's skip testing on Windows for now for this module") and we can certainly add things back where that's the case. @marijnh @aearly @megawac @sindresorhus @novemberborn @3rdeden @einaros @lpinca @mcollina @stefanpenner @rwjblue @Turbo87 @kellyselden @dougwilson @delvedor @TooTallNate @cpojer @scotthovestadt @SimenB @thymikee @jeysal @ralphtheninja @vweevers @wadey @broofa @linus @nodejs/addon-api @nodejs/npm @siimon @zbjornson @mafintosh @jhnns @isaacs @reconbot @substack @jashkenas @ronag @ctavan @indexzero @einaros @3rd-Eden
Keep both
I think we should remove all of them in this PR and then you open a new PR to add them back with a green CI. Otherwise this PR will never be merged.
Looks to me like Jest is just timing out and not actually failing. I assume the correct thing to do here (since the motivation seems to be "CITGM is slow") is to reduce the number of tests being run?
If you remove them, I have no way to see what is failing and I have no chance to fix the issues.
FWIW the failing test here https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/3213/nodes=ubuntu1804-64/testReport/junit/(root)/citgm/ws_v8_13_0/ was fixed in [...]. I mean... Node.js 20.6.0 broke [...]
This https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/3213/nodes=osx1015/testReport/junit/(root)/citgm/bufferutil_v4_0_7/ also does not seem to be an issue with the module, but with the machine the module is being tested on.
Additionally, I've never been pinged on any failure of either Jest or [...]. Speaking of [...]:
I'm -1. I have been trying to get Fastify and undici tests to pass and not be flaky since Node v20 was released. The problem is that v20 is more flaky and there are plenty of problems there.
This is not achieving anything.
On pino: I didn't even know that it was failing.
I see multiple modules failing with the same root issue ☝️ As for acorn, it doesn't seem to install: https://ci.nodejs.org/job/citgm-smoker/3213/nodes=osx1015/testReport/(root)/citgm/acorn_v8_10_0/
Most of those modules did not become "flaky" due to the maintainers' unresponsiveness. They became flaky due to bugs in Node. In fact, v18 is significantly better off. CITGM is actually doing its job: it tells us there are bugs to fix & investigate.
While I'm aware of the [...], I assume it's something to do with [...].
Not sure if that helps at all. If not, let me know what you're expecting.
No one said that maintainer unresponsiveness was the issue (as far as I've noticed, at least). The issue is that there are so many failing modules that no one (or almost no one?) actually successfully interprets the CITGM results.
What this achieves is a green CI, so we can actually treat red CITGM runs as failures going forward, rather than doing what we do now, which is trying to find the needle in the haystack of failing runs. We routinely have over 100 failures. That's not reasonable.
18.x has around 80 failures compared with 140 or so for 20.x. So it's true that 18.x does "better". But the reason 20.x does worse is that CITGM is difficult to use because of all those permanent failures. So 80 failures becomes 100, becomes 115, becomes 130, etc.
CITGM is what we use to test the impact of changes in Node.js on the ecosystem. In theory, it's supposed to show a bunch of modules with passing tests on a version of Node.js; then we make changes or prepare a new release, and run CITGM as a check to make sure we aren't introducing changes that will break the ecosystem. The problem is that there are so many failures that releasers find it hard to interpret the results, and therefore they get ignored. So it's like not having CITGM at all. (I'm exaggerating here, but it's basically what happened for a recent release, which broke a bunch of stuff.) For purposes here, it does not matter if the failing tests are due to Node.js changes, bugs in modules, incompetence on my part, or whatever. (Honestly, it's entirely likely a lot of them are due to specifics of our testing environment.) The point is to get CITGM to be green, and then we can add stuff back in. Since you are listed (correctly or otherwise; we don't update the lists that often) as a maintainer for one or more of the modules being removed, you got a ping. Hope that provides the needed context.
Exactly! Node.js 20.6.0 broke stuff because the CITGM results were basically ignored, because there are so many failures in there already. So tossing everything that's failing (regardless of whose "fault" it is!) gets us back to a CITGM that, while imperfect, is at least usable. That said, if your module was broken specifically by 20.6.0, there's a good chance it will be fixed in 20.6.1, which is being prepared right now. So yeah, if that's the case, it probably makes sense to keep that module in the lookup.json file.
I think we need a clean slate, and to gradually add these dependencies back with separate runs, rather than finding sides to blame.
Yes! I am 100% in favor of this, and everything @Trott says here is exactly correct and brutally honest. 👏 👏 👏 It would be better to not have a test than to have a test where failure is even slightly acceptable. Get CITGM green, then add things back in as they pass. Moving forward, when it goes red, treat it as an actual problem and do not proceed until it's green again. This is The Way. Go slow to move fast.
Here's a run to test the upcoming 20.6.1: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/3214 That's 133 failures. (And again, it's entirely possible that most of these are problems in our CI setup or whatever. But it's been like this basically since we launched CITGM however many years ago. And we're in that awful state where we ignore red CI. Better to test less and actually respond to failures than test more and ignore failures.)
```json
    "prefix": "v",
    "maintainers": "jhnns"
  },
  "rimraf": {
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rimraf: Failing because ts-node doesn't work on node 20.
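For context, this is roughly what a full entry in citgm's lookup.json looks like. This is a sketch from memory, not taken from this diff: the `flaky` and `skip` fields exist in citgm's lookup options as I recall, but the exact values here are illustrative.

```json
{
  "rimraf": {
    "npm": true,
    "prefix": "v",
    "maintainers": "isaacs",
    "flaky": false
  }
}
```

As I understand it, marking a module `"flaky": true` keeps it in the run but stops its failures from turning the overall run red, which is part of how so many real regressions have been getting lost in the noise.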
@broofa just ran uuid with citgm locally. I'm guessing citgm uses the release tarball, and this removed test uuidjs/uuid#709 is still part of it, hence it fails.

```
$ npx citgm uuid
info: starting | uuid
info: lookup | uuid
info: lookup-found | uuid
info: uuid lookup-replace | https://github.com/uuidjs/uuid/archive/4cf24c018cead5ebe48cb4da232b57a2345d9fb5.tar.gz
info: uuid npm: | Downloading project: https://github.com/uuidjs/uuid/archive/4cf24c018cead5ebe48cb4da232b57a2345d9fb5.tar.gz
info: uuid npm: | Project downloaded 4cf24c018cead5ebe48cb4da232b57a2345d9fb5.tar.gz
info: uuid npm: | npm install started
warn: uuid npm-install: | fatal: not a git repository (or any of the parent directories): .git
warn: uuid npm-install: | fatal: not a git repository (or any of the parent directories): .git
info: uuid npm: | npm install successfully completed
info: uuid npm: | test suite started
error: failure | The canary is dead:
error: failing module(s) |
error: module name: | uuid
error: version: | 9.0.0
error: error: | The canary is dead:
error: error: | undefined
error: | > [email protected] prepare
error: | > cd $( git rev-parse --show-toplevel ) && husky install
error: |
error: | added 1049 packages in 2s
error: |
error: | > [email protected] pretest
error: | > [ -n $CI ] || npm run build
error: |
error: | > [email protected] test
error: | > BABEL_ENV=commonjsNode node --throw-deprecation node_modules/.bin/jest test/unit/
error: |
error: | fatal: not a git repository (or any of the parent directories): .git
error: | fatal: not a git repository (or any of the parent directories): .git
error: | PASS test/unit/v1-random.test.js
error: | PASS test/unit/validate.test.js
error: | PASS test/unit/v1-rng.test.js
error: | PASS test/unit/version.test.js
error: | PASS test/unit/v1.test.js
error: | PASS test/unit/stringify.test.js
error: | PASS test/unit/v4.test.js
error: | PASS test/unit/parse.test.js
error: | FAIL test/unit/rng.test.js
error: | ● rng › Browser without crypto.getRandomValues()
error: |
error: | assert.throws(function)
error: |
error: | Expected the function to throw an error.
error: | But it didn't throw anything.
error: |
error: | Message:
error: | Missing expected exception.
error: |
error: | 14 |
error: | 15 | test('Browser without crypto.getRandomValues()', () => {
error: | > 16 | assert.throws(() => {
error: |      |        ^
error: | 17 | rngBrowser();
error: | 18 | });
error: | 19 | });
error: |
error: | at Object.throws (test/unit/rng.test.js:16:12)
error: |
error: | PASS test/unit/v35.test.js
error: |
error: | Test Suites: 1 failed, 9 passed, 10 total
error: | Tests: 1 failed, 49 passed, 50 total
error: | Snapshots: 0 total
error: | Time: 0.825 s
error: | Ran all test suites matching /test\/unit\//i.
error: done | The smoke test has failed.
info: duration | test duration: 5309ms
```

In short, if uuid would release a new version, citgm would pass.
I'm not sure how to interpret the failure on [...].
Adding the [...]. @nodejs/releasers @nodejs/citgm @nodejs/tsc
This is incredible! And in the grand scheme of things, I don't think 110 minutes versus 154 minutes makes much difference at all; either way it's way longer than I would wait for it, so it could take hours for all I care. Making it sequential for now seems like an easy win. We could maybe split the modules into two buckets, ones safe to run in parallel versus ones that can't, but that can be a future improvement.
+1. I've been consistently running in serial mode for testing releases; it really helps a lot.
I've removed [...].
Is it now ready to review? @Trott
@Trott this cleanup will be a godsend for the next v18 release! Do you believe it's in a good enough state to move ahead with? Personally, I'd love to be able to start relying on a green citgm for the backport PRs.
FWIW, Jest (on [...]) [...]
I feel like we can probably add a lot of things back in (like [...]).
Here's a current CITGM run (not this PR) against the main branch so we can have a new baseline: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/3254/
I also wonder if removing the [...].
Jest will need [...].
I notice bluebird is failing consistently on the Jenkins CITGM runs but it passes locally for me. Anyone know what's up with it?
Another baseline run against the main branch to see if there are things that we're removing that don't need to be removed: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/3259/
Remove all modules that failed the latest 20.x release run. We need to be able to trust that a failed CITGM run is something that needs to be investigated.
CITGM against this PR: https://ci.nodejs.org/job/citgm-smoker/3260/
On the one hand, 56 failures is a nice improvement on 100 failures. But (a) it's not nearly as much of an improvement as I was hoping for, and (b) there are a lot of new failures! I wonder if some of them are because, by default, I think we install CITGM from [...]. My inclination is to: [...]
Why not install it from the GitHub repo directly?
I don't know the reason for that decision. I would guess that it's so that releases aren't broken by changes to the CITGM main branch. If changes to the CITGM main branch are going to be live immediately, then we'd want a CITGM run with every change we make. (Which, actually, might not be a bad idea at all.)
You could also [...]. Though like you say, I don't know why we wouldn't just always want the latest [...].
I am unable to publish citgm to [...]. [...] Or don't make me a collaborator, but grab the 9.0.0 tag I just pushed and publish it?
Checklist
- npm test passes