feat: set up very expensive tests to run in CI by galargh · Pull Request #12939 · filecoin-project/lotus

galargh · 2025-03-06T17:21:47Z

Related Issues

Proposed Changes

In this PR I set up a new workflow responsible for running very expensive tests. I also enable running expensive tests in our regular test workflow.

Additional Info

This is an updated version of #12234

In this iteration, the very expensive tests workflow is separate from the test workflow. This is so that we don't have to rerun all the tests on a label change.

The way the new workflow works is that when one adds a need/very-expensive-tests label to a PR, the workflow will start, remove the label, and execute the tests.

The new workflow will only execute the tests that have very expensive components to them. The information about that is currently stored in the ci cmd tool. It will be moved to the test files themselves in the future once we move more towards the self-identification setup.

This is not ready to be merged because the only very expensive test we have at the moment is failing with signal: killed. Here's a link to the most recent failure: https://github.com/filecoin-project/lotus/actions/runs/13706584703

Question: should the workflow comment on a PR with the very expensive test results? I think it might be useful since if one modifies labels after the very expensive tests are run, then the link to the very expensive tests run would disappear from the PR.

Checklist

Before you mark the PR ready for review, please make sure that:

Commits have a clear commit message.
PR title conforms with contribution conventions
Update CHANGELOG.md or signal that this change does not need it per contribution conventions
New features have usage guidelines and / or documentation updates in
- Lotus Documentation
- Discussion Tutorials
Tests exist for new functionality or change in behavior
CI is green

rvagg · 2025-03-31T03:14:19Z

Looking at the bad test: LOTUS_RUN_VERY_EXPENSIVE_TESTS=1 go test ./itests/niporep_manual_test.go -run TestManualNISectorOnboarding/real_proofs,_1_miner_with_7_sectors,_1_bad -v
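For context, the command above opts in through the LOTUS_RUN_VERY_EXPENSIVE_TESTS environment variable. A minimal sketch of that kind of gate follows. The shouldRun function is a hypothetical name, and treating exactly "1" as the enabling value is an assumption based on the command line above; the helper Lotus actually uses may differ.

```go
package main

import (
	"fmt"
	"os"
)

// veryExpensiveEnv is the variable name taken from the command above.
const veryExpensiveEnv = "LOTUS_RUN_VERY_EXPENSIVE_TESTS"

// shouldRun reports whether very expensive tests are enabled.
// The lookup function is injected so the gate can be exercised
// without touching the real process environment.
// Assumption: only the value "1" enables the tests.
func shouldRun(getenv func(string) string) bool {
	return getenv(veryExpensiveEnv) == "1"
}

func main() {
	if !shouldRun(os.Getenv) {
		fmt.Println("skipping very expensive test")
		return
	}
	fmt.Println("running very expensive test")
}
```

In a real test function, the skip branch would typically call t.Skip instead of printing.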

At the peak, only about 700 epochs in, which is where the test is getting killed (quite early, so this isn't timeout related), we get to ~32GiB of RAM, so I assume this is the limit we're hitting on these runners?

3836338 rvagg     20   0 4226.9g  31.6g   2.7g S  3064  16.8   7:55.08 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+

It goes down from there; this is the initial part where it's generating the initial proof, so it's a lot of work. There's only this one case that's run here, and it's the real_proofs that make it "very expensive". What options do we have on the runner side? I'm not sure how much latitude we have on the software side, because I suspect this is almost all in the proofs code, but I can explore that further if we have no options. Could we add some small swap just for this case if we're strictly limited to 32GiB? We shouldn't need much at all; I think we're probably just pushing a little over 32GiB for the system in this run.

rvagg · 2025-03-31T03:17:13Z

btw once we get into it, it goes way down:

3836338 rvagg     20   0 4185.4g   3.7g   1.3g S  53.3   2.0 228:35.20 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+

but it does take a long time to run this test because of the amount of time needed to tick through all the epochs required. In my test run it's got up to 15,478 epochs at 556.76s, on a fairly fast system.

BigLep · 2025-04-04T21:54:17Z

@galargh : apologies for the delay here. A few things:

For getting the current VERY_EXPENSIVE_TEST to pass, it seems like we need more memory (slightly more than 32GB). Can this be accomplished with beefier runners, or can we enable some swap space on the runners we have (more info)?

In terms of the interaction model:

Periodic running where failures cut an issue - it seems like we're covered here ✅
Selectively running at certain times (a manual step in the Lotus release process, or during a PR that we think warrants an explicit change) - It sounds like the model we're looking at is triggering a test run by applying a label. I now understand the interaction better: applying the PR "run expensive tests" label queues up the test run, and when the test run completes, the results are linked as a comment in the PR and the PR "run expensive tests" label is removed. The PR label is removed so that there is a way to explicitly re-request "run expensive tests" on the PR again (e.g., after pushing additional commits). I think that works, and if that's a lot easier from an implementation standpoint, let's go with it. I don't want to block this PR further on best being the enemy of better.
- I was envisioning a world where, whenever we'd run our tests normally, there would be a step to check whether the PR "run expensive tests" label was present on the PR and, if so, also run the expensive tests, and if not, skip them. But per above, if that isn't easily feasible, let's skip that.

galargh · 2025-04-15T13:21:29Z

I moved the tests to a slightly bigger runner and it looks like we're through :) I also updated the very expensive tests workflow so that it behaves similarly to our regular one. The one difference is, it will also get triggered whenever a label is added to a PR.

rvagg

I wish it were easier to compare the old test.yml and reusable-test.yml. I tried checking old and new out locally and manually diffing them, but the changes are huge, so maybe formatting was changed? I ended up comparing them side-by-side in the GitHub UI to confirm they're mostly the same. I find these files really dense and hard to parse, so keeping diffs minimal where possible helps.

galargh · 2025-04-16T06:37:46Z

This is the diff of test.yml from master vs reusable-test.yml - 6cc0c97

rvagg · 2025-04-16T11:33:29Z

ACK, thanks, this is good to go I think

BigLep

Minor comments, but looks good to me. I'll let you merge in case you want to monitor anything afterwards.

galargh added 4 commits March 6, 2025 17:25

ci: run expensive tests in the CI

4dd2d3a

ci: make the test workflow reusable

bd92839

ci: run very expensive tests on label addition and on schedule

fb617de

ci: limit the number of tests executed by the very expensive test runner

2e9d10d

github-project-automation Bot added this to FilOz Mar 6, 2025

github-project-automation Bot moved this to 📌 Triage in FilOz Mar 6, 2025

galargh force-pushed the very-expensive-tests branch 2 times, most recently from 18a97fb to 8efd0e7 Compare March 6, 2025 17:39

ci: fix the test workflow setup

f204a6a

galargh force-pushed the very-expensive-tests branch from 8efd0e7 to f204a6a Compare March 6, 2025 17:43

galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

galargh added 2 commits March 6, 2025 19:21

ci: do not cache dependencies when running very expensive tests

5cddf58

ci: do not wait for very expensive tests to finish to remove the label

97a22d7

galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025

ci: fix the label reference

cf416da

galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025

github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

ci: ensure the very expensive tests get executed

78a75b9

galargh force-pushed the very-expensive-tests branch from ef83f34 to 78a75b9 Compare March 6, 2025 18:49

galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025

github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

ci: do cache very expensive tests after all

64725f3

galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025

galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Apr 9, 2025

galargh force-pushed the very-expensive-tests branch from a224569 to f956c13 Compare April 9, 2025 08:44

ci: update very expensive test trigger and add memory monitoring

44730b1

galargh force-pushed the very-expensive-tests branch from f956c13 to 44730b1 Compare April 9, 2025 08:46

ci: run very-expensive-tests on network optimized runners

81be5d8

galargh force-pushed the very-expensive-tests branch from d5fb5dd to 81be5d8 Compare April 9, 2025 09:30

galargh added 2 commits April 9, 2025 13:33

ci: monitor free memory only on debug reruns

b4d3019

Merge remote-tracking branch 'origin/master' into very-expensive-tests

2dc43c3

galargh changed the title ~~wip: set up very expensive tests to run in CI~~ feat: set up very expensive tests to run in CI Apr 15, 2025

galargh marked this pull request as ready for review April 15, 2025 13:21

galargh requested a review from BigLep April 15, 2025 13:21

rvagg reviewed Apr 15, 2025

Comment thread .github/workflows/very-expensive-test.yml

rvagg reviewed Apr 16, 2025

Comment thread .github/workflows/reusable-test.yml

rvagg approved these changes Apr 16, 2025

github-project-automation Bot moved this from ⌨️ In Progress to ✔️ Approved by reviewer in FilOz Apr 16, 2025

galargh added 2 commits April 16, 2025 08:28

Merge remote-tracking branch 'origin/master' into very-expensive-tests

9642883

wip

e393c7b

BigLep approved these changes Apr 21, 2025

Comment thread .github/workflows/very-expensive-test.yml

Comment thread .github/workflows/very-expensive-test.yml

Comment thread .github/workflows/very-expensive-test.yml

feat: do not create new issues if one already exists

80a9a80

galargh enabled auto-merge (squash) April 27, 2025 17:34

galargh merged commit 2f9c021 into master Apr 27, 2025
99 checks passed

galargh deleted the very-expensive-tests branch April 27, 2025 17:34

github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FilOz Apr 27, 2025

rjan90 mentioned this pull request May 1, 2025

build: release Lotus Node v1.33.0-rc1 #13088

Merged

8 tasks

galargh merged 19 commits into master from very-expensive-tests
