
feat: set up very expensive tests to run in CI#12939

Merged
galargh merged 19 commits into master from very-expensive-tests
Apr 27, 2025

Conversation

@galargh
Contributor

@galargh galargh commented Mar 6, 2025

Related Issues

Closes #12136

Proposed Changes

In this PR I set up a new workflow responsible for running very expensive tests. I also enable running expensive tests in our regular test workflow.

Additional Info

This is an updated version of #12234

In this iteration, the very expensive tests workflow is separate to the test workflow. This is so that we don't have to rerun all the tests on a label change.

The way the new workflow works is that when one adds a need/very-expensive-tests label to a PR, the workflow will start, remove the label, and execute the tests.
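
The trigger condition described above can be sketched as a small decision function. This is only an illustration in Go, not the workflow added in this PR (which lives in a GitHub Actions YAML file). The `labelEvent` type and `shouldRunVeryExpensive` function are hypothetical names, and the payload shape assumes the `action` and `label.name` fields of GitHub's `pull_request` webhook event.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// triggerLabel is the label named in this PR description.
const triggerLabel = "need/very-expensive-tests"

// labelEvent holds the two payload fields this sketch cares about.
// Hypothetical type, for illustration only.
type labelEvent struct {
	Action string `json:"action"`
	Label  struct {
		Name string `json:"name"`
	} `json:"label"`
}

// shouldRunVeryExpensive reports whether a raw event payload represents
// the trigger label being added to a PR.
func shouldRunVeryExpensive(payload []byte) (bool, error) {
	var ev labelEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return false, err
	}
	return ev.Action == "labeled" && ev.Label.Name == triggerLabel, nil
}

func main() {
	payload := []byte(`{"action":"labeled","label":{"name":"need/very-expensive-tests"}}`)
	run, err := shouldRunVeryExpensive(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(run)
}
```

In this sketch an `unlabeled` event does not match, so removing the label after the run starts would not queue a second run; adding the label again would.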

The new workflow will only execute the tests that have very expensive components to them. The information about that is currently stored in the ci cmd tool. It will be moved to the test files themselves in the future once we move more towards the self-identification setup.
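
For context, a test can opt into the very expensive tier by checking an environment variable before doing any work. The sketch below is an assumption about how such gating might look in Go. It uses the `LOTUS_RUN_VERY_EXPENSIVE_TESTS` variable that appears in the test command later in this thread, but the helper name `veryExpensiveEnabled` and the rule that only the value `1` enables the tests are hypothetical, not taken from the Lotus codebase.

```go
package main

import (
	"fmt"
	"os"
)

// veryExpensiveEnv is the variable used in the test command quoted in this thread.
const veryExpensiveEnv = "LOTUS_RUN_VERY_EXPENSIVE_TESTS"

// veryExpensiveEnabled reports whether very expensive tests should run.
// The lookup function is injected so the check can be exercised without
// touching the real process environment.
func veryExpensiveEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup(veryExpensiveEnv)
	return ok && v == "1"
}

func main() {
	if !veryExpensiveEnabled(os.LookupEnv) {
		fmt.Println("skipping very expensive tests")
		return
	}
	fmt.Println("running very expensive tests")
}
```

Inside a real test, the same check would typically be followed by a call to `t.Skip` when it returns false.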

This is not ready to be merged because the only very expensive test we have at the moment is failing with signal: killed. Here's a link to the most recent failure: https://github.com/filecoin-project/lotus/actions/runs/13706584703

Question: should the workflow comment on a PR with the very expensive test results? I think it might be useful since if one modifies labels after the very expensive tests are run, then the link to the very expensive tests run would disappear from the PR.

Checklist

Before you mark the PR ready for review, please make sure that:

@github-project-automation github-project-automation Bot moved this to 📌 Triage in FilOz Mar 6, 2025
@galargh galargh force-pushed the very-expensive-tests branch 2 times, most recently from 18a97fb to 8efd0e7 Compare March 6, 2025 17:39
@galargh galargh force-pushed the very-expensive-tests branch from 8efd0e7 to f204a6a Compare March 6, 2025 17:43
@galargh galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@galargh galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025
@galargh galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025
@github-actions github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@galargh galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@galargh galargh force-pushed the very-expensive-tests branch from ef83f34 to 78a75b9 Compare March 6, 2025 18:49
@galargh galargh added need/very-expensive-tests This label triggers the Very Expensive Test workflow run and removed need/very-expensive-tests This label triggers the Very Expensive Test workflow run labels Mar 6, 2025
@github-actions github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@galargh galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@github-actions github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@galargh galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@github-actions github-actions Bot removed the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Mar 6, 2025
@rvagg
Member

rvagg commented Mar 31, 2025

Looking at the bad test: LOTUS_RUN_VERY_EXPENSIVE_TESTS=1 go test ./itests/niporep_manual_test.go -run TestManualNISectorOnboarding/real_proofs,_1_miner_with_7_sectors,_1_bad -v

At the peak, only about 700 epochs in, which is where the test is getting killed (quite early, so this isn't timeout related), we get to ~32GiB of RAM, so I assume this is the limitation we're dealing with for these runners?

3836338 rvagg     20   0 4226.9g  31.6g   2.7g S  3064  16.8   7:55.08 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+

It goes down from there; this is the initial part where it's generating the initial proof, so it's a lot of work. There's only this one case that's run here and it's the real_proofs that make it "very expensive". What options do we have on the runner side? I'm not sure how much latitude we have on the software side because I suspect this is almost all in the proofs code, but I can explore that further if we have no options. Could we add some small swap just for this case if we're strictly limited to 32GiB? We shouldn't need much at all; I think we're probably just pushing a little over 32GiB for the system in this run.

@rvagg
Member

rvagg commented Mar 31, 2025

btw once we get into it, it goes way down:

3836338 rvagg     20   0 4185.4g   3.7g   1.3g S  53.3   2.0 228:35.20 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+

but it does take a long time to run this test because of the amount of time needed to tick through all the epochs required. In my test run it got up to 15,478 epochs in 556.76s on a fairly fast system.
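
As a rough back-of-the-envelope check, the figures quoted above can be turned into an epoch rate and a host memory estimate. This is a sketch, assuming the `%MEM` column of the `top` output (16.8) is the resident size (31.6g) expressed as a share of total physical memory; the function names are made up for illustration.

```go
package main

import "fmt"

// epochsPerSecond returns the average rate at which the test ticked through epochs.
func epochsPerSecond(epochs int, seconds float64) float64 {
	return float64(epochs) / seconds
}

// totalMemGiB estimates total host memory from a process's resident size
// and its %MEM value as reported by top.
func totalMemGiB(resGiB, memPercent float64) float64 {
	return resGiB / (memPercent / 100)
}

func main() {
	fmt.Printf("%.1f epochs/s\n", epochsPerSecond(15478, 556.76))
	fmt.Printf("~%.0f GiB total memory on the test machine\n", totalMemGiB(31.6, 16.8))
}
```

Under that assumption, the local machine has far more than 32GiB of memory, which would explain why the test completes locally but is killed on the CI runner.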

@BigLep
Member

BigLep commented Apr 4, 2025

@galargh : apologies for the delay here. A few things:

For getting the current VERY_EXPENSIVE_TEST to pass, it seems like we need more memory (slightly more than 32GB). Can this be accomplished with beefier runners or can we enable some swap space on the runners we have (more info)?

In terms of the interaction model:

  1. Periodic running where failures cut an issue - it seems like we're covered here ✅
  2. Selectively running at times, such as a manual step in the Lotus release process or during a PR that we think warrants an explicit check - It sounds like the model we're looking at is triggering a test run by applying a label. I am understanding the interaction better now: applying the PR "run expensive tests" label queues up the test run, and when the test run completes, the results are linked as a comment in the PR and the PR "run expensive tests" label is removed. The PR label is removed so that there is a way to explicitly re-request "run expensive tests" on the PR again (e.g., after pushing additional commits). I think that works, and if that's a lot easier from an implementation standpoint, let's go with it. I don't want to block this PR further on best being the enemy of better.
    • I was envisioning a world where whenever we'd run our tests normally there would be a step to check if the PR "run expensive tests" label was present on the PR and, if so, also run expensive tests, and if not, skip the expensive tests. But per above, if that isn't easily feasible, let's skip that.

@galargh galargh added the need/very-expensive-tests This label triggers the Very Expensive Test workflow run label Apr 9, 2025
@galargh galargh force-pushed the very-expensive-tests branch from a224569 to f956c13 Compare April 9, 2025 08:44
@galargh galargh force-pushed the very-expensive-tests branch from f956c13 to 44730b1 Compare April 9, 2025 08:46
@galargh galargh force-pushed the very-expensive-tests branch from d5fb5dd to 81be5d8 Compare April 9, 2025 09:30
@galargh galargh changed the title wip: set up very expensive tests to run in CI feat: set up very expensive tests to run in CI Apr 15, 2025
@galargh
Contributor Author

galargh commented Apr 15, 2025

I moved the tests to a slightly bigger runner and it looks like we're through :) I also updated the very expensive tests workflow so that it behaves similarly to our regular one. The one difference is, it will also get triggered whenever a label is added to a PR.

@galargh galargh marked this pull request as ready for review April 15, 2025 13:21
@galargh galargh requested a review from BigLep April 15, 2025 13:21
Comment thread .github/workflows/very-expensive-test.yml
Comment thread .github/workflows/reusable-test.yml
Member

@rvagg rvagg left a comment

I wish it were easier to compare the old test.yml and reusable-test.yml. I tried checking old and new out locally and manually diffing them, but the changes are huge, so maybe formatting was changed? I ended up comparing them side-by-side in the GitHub UI to confirm they're mostly the same. I find these files really dense and hard to parse, so keeping diffs minimal where possible helps.

@github-project-automation github-project-automation Bot moved this from ⌨️ In Progress to ✔️ Approved by reviewer in FilOz Apr 16, 2025
@galargh
Contributor Author

galargh commented Apr 16, 2025

This is the diff of test.yml from master vs reusable-test.yml - 6cc0c97

@rvagg
Member

rvagg commented Apr 16, 2025

ACK, thanks, this is good to go I think

Member

@BigLep BigLep left a comment

Minor comments, but looks good to me. I'll let you merge in case you want to monitor anything afterwards.

Comment thread .github/workflows/very-expensive-test.yml
Comment thread .github/workflows/very-expensive-test.yml
Comment thread .github/workflows/very-expensive-test.yml
@galargh galargh enabled auto-merge (squash) April 27, 2025 17:34
@galargh galargh merged commit 2f9c021 into master Apr 27, 2025
99 checks passed
@galargh galargh deleted the very-expensive-tests branch April 27, 2025 17:34
@github-project-automation github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FilOz Apr 27, 2025

Labels

need/very-expensive-tests This label triggers the Very Expensive Test workflow run
skip/changelog This change does not require CHANGELOG.md update

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

[DX Streamline] LOTUS_RUN_EXPENSIVE_TESTS periodically outside of PR workflow

3 participants