ci: add cleanup step to nightly release self-hosted runner jobs #2510
yongwww merged 1 commit into flashinfer-ai:main
No actionable comments were generated in the recent review.

Walkthrough: Added cleanup steps to two GitHub Actions jobs in the nightly release workflow. The new steps stop and remove Docker containers, clear the workspace (including hidden dotfiles), and prune Docker resources before displaying machine information.

Estimated code review effort: 1 (Trivial), ~3 minutes
Pre-merge checks: 3 passed
I canceled the PR test because it is unrelated. I will send a follow-up PR to disable the auto-run for all contributors.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

`.github/workflows/pr-test.yml`, lines 175-186: ⚠️ Potential issue | 🔴 Critical
`secrets` context is not available in step-level `if` conditions; the workflow will fail.

GitHub Actions only allows `secrets` inside `env`, `with`, and `run` blocks, not in step `if` expressions. Both line 175 and line 186 will cause a workflow evaluation error. This is confirmed by actionlint.

A common workaround is to expose the secret's presence as an env var or a prior step output, then reference that in the `if`.

Proposed fix
Add an env mapping at the job level and reference it in step conditions:
```diff
 orchestrator:
   name: Orchestrate Tests
   needs: [gate, setup]
   if: |
     needs.gate.outputs.authorized == 'true' &&
     needs.setup.outputs.skip_build != 'true'
   runs-on: ubuntu-latest
+  env:
+    HAS_GH_APP: ${{ secrets.GH_APP_ID != '' }}
   steps:
     - name: Generate Token (flashinfer)
-      if: secrets.GH_APP_ID != ''
+      if: env.HAS_GH_APP == 'true'
       id: flashinfer-token
       uses: actions/create-github-app-token@v1
       with:
         app-id: ${{ secrets.GH_APP_ID }}
         private-key: ${{ secrets.GH_APP_KEY }}
         owner: flashinfer-ai
         repositories: flashinfer

     - name: Create Check Runs (PR only)
       id: create-checks
-      if: github.event_name == 'pull_request' && secrets.GH_APP_ID != ''
+      if: github.event_name == 'pull_request' && env.HAS_GH_APP == 'true'
       env:
         GH_TOKEN: ${{ steps.flashinfer-token.outputs.token }}
```
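The review also mentions exposing the secret's presence through a prior step output instead of a job-level env var. A minimal sketch of that variant, assuming the same `GH_APP_ID` secret; the `app-check` step id and `has_app` output name are hypothetical:

```yaml
steps:
  # Hypothetical check step: secrets are allowed in step-level env, so read the
  # secret there and publish only a true/false flag as a step output.
  - name: Check for GitHub App credentials
    id: app-check
    env:
      GH_APP_ID: ${{ secrets.GH_APP_ID }}
    run: |
      if [ -n "$GH_APP_ID" ]; then
        echo "has_app=true" >> "$GITHUB_OUTPUT"
      else
        echo "has_app=false" >> "$GITHUB_OUTPUT"
      fi

  # The if condition now reads the step output instead of the secrets context.
  - name: Generate Token (flashinfer)
    if: steps.app-check.outputs.has_app == 'true'
    id: flashinfer-token
    uses: actions/create-github-app-token@v1
    with:
      app-id: ${{ secrets.GH_APP_ID }}
      private-key: ${{ secrets.GH_APP_KEY }}
      owner: flashinfer-ai
      repositories: flashinfer
```

Either variant keeps `secrets` out of the `if` expression. The job-level env var is simpler when several steps share the check; the step output keeps the check next to the steps that use it.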
btw, it's time for us to move to a standalone nightly repo. |
@yongwww The H100 unit test terminates without retry; can you take a look?
H100 doesn't currently have Spot capacity available, so we didn't add retry logic for it. We can ignore the H100 failure in this PR since it's unrelated; main is also failing on H100 at the moment (tests/gdn/test_decode_delta_rule.py: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/270137557).
Yeah, it would be great to get your help to create the repo (likely https://github.com/flashinfer-ai/nightly), @yzh119. |
📌 Description
Fix intermittent `EACCES: permission denied` error in the nightly release workflow when `actions/checkout@v4` tries to clean the workspace on reused self-hosted runners.

The root cause: Docker containers run as root and create root-owned files (e.g., `.pytest_cache/.gitignore`). On the next run, the runner process (non-root) cannot delete these files, causing checkout to fail.

Fix: Add a cleanup step (with `sudo rm`) before checkout in the two self-hosted runner jobs (`build-flashinfer-jit-cache` and `test-nightly-build`). This is the same pattern used in `pr-test-runner.yml`.

Example error: https://github.com/flashinfer-ai/flashinfer/actions/runs/21736799287/job/62713877213
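For reference, a cleanup step along these lines might look like the following. This is a sketch assuming a Linux self-hosted runner where the runner user has passwordless `sudo`, can run `docker` without `sudo`, and hosts one job at a time; the step name and exact commands are assumptions, not the merged workflow or `pr-test-runner.yml` verbatim:

```yaml
steps:
  # Hypothetical cleanup step; it runs before checkout so that root-owned files
  # left by earlier Docker runs cannot make actions/checkout fail with EACCES.
  - name: Cleanup
    run: |
      # Stop and remove all containers left over from earlier runs (no-op if none).
      docker ps -aq | xargs -r docker stop
      docker ps -aq | xargs -r docker rm
      # Delete workspace contents as root, including dotfiles such as .pytest_cache;
      # the .[!.]* pattern matches hidden entries without matching "." or "..".
      sudo rm -rf "${GITHUB_WORKSPACE:?}"/* "${GITHUB_WORKSPACE:?}"/.[!.]*
      # Remove stopped containers, unused networks, dangling images, and build cache.
      docker system prune -f

  - name: Checkout
    uses: actions/checkout@v4
```

The `${GITHUB_WORKSPACE:?}` expansion aborts the step if the variable is unset, so the `rm` can never expand to `/*`. Because the step stops every container on the host, it only suits runners that are not shared between concurrent jobs.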
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have run `pre-commit install`.
- I have run `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- … (`unittest`, etc.).

Reviewer Notes
cc: @yzh119 @bkryu @dierksen