ci: add cleanup step to nightly release self-hosted runner jobs #2510

Merged
yongwww merged 1 commit into flashinfer-ai:main from yongwww:fix_nightly
Feb 20, 2026

@yongwww
Member

@yongwww yongwww commented Feb 6, 2026

📌 Description

Fix an intermittent EACCES: permission denied error in the nightly release workflow that occurs when actions/checkout@v4 tries to clean the workspace on reused self-hosted runners.

The root cause: Docker containers run as root and create root-owned files (e.g., .pytest_cache/.gitignore). On the next run, the runner process (non-root) cannot delete these files, causing checkout to fail.

Fix: Add a cleanup step (with sudo rm) before checkout in the two self-hosted runner jobs (build-flashinfer-jit-cache and test-nightly-build). This is the same pattern used in pr-test-runner.yml.
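
For readers without the diff at hand, the cleanup-before-checkout pattern described above might look roughly like the following. This is a minimal sketch, not the merged change: the workflow name, the `self-hosted` runner label, the step names, and the exact commands are assumptions chosen for illustration, and the real step in pr-test-runner.yml may differ.

```yaml
# Hypothetical sketch of a cleanup step placed before checkout.
# Names, labels, and commands are illustrative assumptions, not the merged diff.
name: nightly-release-cleanup-sketch
on: workflow_dispatch

jobs:
  build-flashinfer-jit-cache:
    runs-on: self-hosted  # assumed label; the real job may use more specific labels
    steps:
      - name: Cleanup
        run: |
          # Stop and remove containers left over from earlier runs (no-op if none exist).
          docker ps -q | xargs -r docker stop
          docker ps -aq | xargs -r docker rm
          # Delete everything in the workspace, including dotfiles such as
          # .pytest_cache, using sudo because earlier containers wrote them as root.
          sudo find "${GITHUB_WORKSPACE:?}" -mindepth 1 -delete
          # Reclaim disk space from unused Docker resources.
          docker system prune -f

      - name: Checkout
        uses: actions/checkout@v4
```

The `${GITHUB_WORKSPACE:?}` form makes the step fail rather than run `find` against an empty path if the variable is ever unset, which seems a reasonable guard for a command run with sudo.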

Example error: https://github.com/flashinfer-ai/flashinfer/actions/runs/21736799287/job/62713877213

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

cc: @yzh119 @bkryu @dierksen

Summary by CodeRabbit

  • Chores
    • Improved CI/CD pipeline efficiency by adding cleanup procedures to automated workflows. This includes clearing Docker containers and workspace resources to optimize build environment management.

@gemini-code-assist
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@coderabbitai
Contributor

coderabbitai bot commented Feb 6, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Added cleanup steps to two GitHub Actions jobs in the nightly release workflow. The new steps stop and remove Docker containers, clear the workspace including hidden dotfiles, and prune Docker resources before displaying machine information.

Changes

Cohort / File(s) | Summary
CI Workflow Cleanup (.github/workflows/nightly-release.yml) | Added Cleanup step to build-flashinfer-jit-cache and test-nightly-build jobs that executes Docker container removal, workspace clearing, and Docker system pruning.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • bkryu
  • dierksen

Poem

🐰✨ Hop along, the containers must go,
Docker dust cleared in the CI's soft glow,
Workspace scrubbed clean, dotfiles erased,
Nightly builds run at a faster pace!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a cleanup step to nightly release self-hosted runner jobs in CI configuration.
Description check | ✅ Passed | The description is comprehensive and follows the template structure with a detailed explanation of the problem, root cause, and solution, though the 'Related Issues' section is empty.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@yongwww
Member Author

yongwww commented Feb 6, 2026

I canceled the PR test because it is unrelated. I will send a follow-up PR to disable the auto-run for all contributors.

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/pr-test.yml (1)

175-186: ⚠️ Potential issue | 🔴 Critical

secrets context is not available in step-level if conditions — workflow will fail.

GitHub Actions only allows secrets inside env, with, and run blocks, not in step if expressions. Both Line 175 and Line 186 will cause a workflow evaluation error. This is confirmed by actionlint.

A common workaround is to expose the secret's presence as an env var or a prior step output, then reference that in the if.

Proposed fix

Add an env mapping at the job level and reference it in step conditions:

   orchestrator:
     name: Orchestrate Tests
     needs: [gate, setup]
     if: |
       needs.gate.outputs.authorized == 'true' &&
       needs.setup.outputs.skip_build != 'true'
     runs-on: ubuntu-latest
+    env:
+      HAS_GH_APP: ${{ secrets.GH_APP_ID != '' }}
     steps:
       - name: Generate Token (flashinfer)
-        if: secrets.GH_APP_ID != ''
+        if: env.HAS_GH_APP == 'true'
         id: flashinfer-token
         uses: actions/create-github-app-token@v1
         with:
           app-id: ${{ secrets.GH_APP_ID }}
           private-key: ${{ secrets.GH_APP_KEY }}
           owner: flashinfer-ai
           repositories: flashinfer

       - name: Create Check Runs (PR only)
         id: create-checks
-        if: github.event_name == 'pull_request' && secrets.GH_APP_ID != ''
+        if: github.event_name == 'pull_request' && env.HAS_GH_APP == 'true'
         env:
           GH_TOKEN: ${{ steps.flashinfer-token.outputs.token }}

@yzh119
Collaborator

yzh119 commented Feb 20, 2026

btw, it's time for us to move to a standalone nightly repo.

@yzh119
Collaborator

yzh119 commented Feb 20, 2026

@yongwww H100 unittest terminates without retry, can you take a look?

@yongwww
Member Author

yongwww commented Feb 20, 2026

> @yongwww H100 unittest terminates without retry, can you take a look?

H100 doesn’t currently have Spot capacity available, so we didn’t add retry logic for it. We can ignore the H100 failure in this PR since it’s unrelated—main is failing on H100 as well at the moment (tests/gdn/test_decode_delta_rule.py: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/270137557).

@yongwww
Member Author

yongwww commented Feb 20, 2026

> btw, it's time for us to move to a standalone nightly repo.

Yeah, it would be great to get your help to create the repo (likely https://github.com/flashinfer-ai/nightly), @yzh119.

@yongwww yongwww merged commit fc23370 into flashinfer-ai:main Feb 20, 2026
27 of 30 checks passed
@yongwww yongwww deleted the fix_nightly branch February 20, 2026 18:53