diff --git a/.github/MAINTAINER.md b/.github/MAINTAINER.md
index b7018e8c0de7..58b71196c948 100644
--- a/.github/MAINTAINER.md
+++ b/.github/MAINTAINER.md
@@ -143,5 +143,12 @@ This section lists the oncalls for each hardware platform. The format is @github
 
 This list is based on the current situation. If you or someone you know would like to donate machines for CI, they can serve as the CI oncalls for their machines. Please ping [Lianmin Zheng](https://github.com/merrymercy) and [Ying Sheng](https://github.com/Ying1123) in the Slack channel. They will start a nomination and internal review process.
 
+## CI Maintenance Mode
+When the CI is unhealthy (e.g., the scheduled pr-test on `main` is broken for consecutive runs), the project enters **CI Maintenance Mode** by opening [issue #21065](https://github.com/sgl-project/sglang/issues/21065). While active:
+- All PR CI runs are paused. Resources are allocated to PRs that fix the CI.
+- **Merging non-CI-fix PRs is prohibited.** Only PRs that fix the CI may be merged. In severe cases, merge permissions may be revoked.
+
+Maintenance mode ends when `pr-test.yml` is all green on `main` and the issue is closed.
+
 ## Suspending Permissions
-If a Merge Oncall bypasses checks to merge a PR that breaks the `main` branch, or if they repeatedly break the CI due to various reasons, their privileges will be suspended for at least two days, depending on the severity of the incident.
+If a Merge Oncall bypasses checks to merge a PR that breaks the `main` branch, merges a non-CI-fix PR during CI Maintenance Mode, or repeatedly breaks the CI due to various reasons, their privileges will be suspended for at least two days, depending on the severity of the incident.
diff --git a/.github/actions/check-maintenance/action.yml b/.github/actions/check-maintenance/action.yml
index 94a0b20d5606..595283dcdfae 100644
--- a/.github/actions/check-maintenance/action.yml
+++ b/.github/actions/check-maintenance/action.yml
@@ -1,5 +1,5 @@
 name: Check Maintenance Mode
-description: Blocks CI when maintenance mode is active (issue #21065 is open), unless the PR has the bypass-maintenance label, or env SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN=true (PR Test workflow on main only).
+description: Blocks CI when maintenance mode is active (issue #21065 is open), unless the PR has the bypass-maintenance label, or env SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN=true (PR Test workflow on main only). Merging non-CI-fix PRs is prohibited during maintenance mode; in severe cases, merge permissions may be revoked.
 
 inputs:
   github-token:
@@ -46,10 +46,12 @@ runs:
           "## ⚠️ CI Maintenance Mode is Active" \
           "The CI infrastructure is currently under maintenance." \
           "All PR CI runs are paused until maintenance is complete." \
+          "**Merging non-CI-fix PRs is prohibited during maintenance mode.** In severe cases, merge permissions may be revoked." \
           "You might also experience unexpected failures during this period." \
           "The team is working on the issue and will update the status as soon as possible." \
           "" \
           "What should you do?" \
+          "- **Do NOT merge non-CI-fix PRs** until maintenance mode is lifted" \
           "- Check back later (~12 hours)" \
           "- Follow CI Maintenance Mode issue: https://github.com/$REPO/issues/$MAINTENANCE_ISSUE for status updates")
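
The gating rule stated in the `description` field of `action.yml` can be sketched as a small Python predicate. This is only an illustration of the rule as described, not the action's actual implementation, which this diff does not show. The function name `should_block_ci`, its parameters, and the exact-match comparison against the string `"true"` are assumptions; the label name and environment variable name are taken from the description.

```python
from typing import Iterable, Mapping

# Names taken from the action's description field.
BYPASS_LABEL = "bypass-maintenance"
BYPASS_ENV = "SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN"


def should_block_ci(
    issue_is_open: bool,
    pr_labels: Iterable[str],
    env: Mapping[str, str],
) -> bool:
    """Return True when CI should be blocked (hypothetical helper).

    issue_is_open: whether the maintenance issue (#21065) is open.
    pr_labels:     labels on the PR that triggered the run.
    env:           environment variables visible to the workflow.
    """
    # Maintenance mode is active only while the tracking issue is open.
    if not issue_is_open:
        return False
    # A PR carrying the bypass label (e.g. a CI fix) is allowed to run.
    if BYPASS_LABEL in set(pr_labels):
        return False
    # The PR Test workflow on main sets this variable to keep running.
    # Assumption: the value must be exactly "true".
    if env.get(BYPASS_ENV, "") == "true":
        return False
    return True


if __name__ == "__main__":
    print(should_block_ci(True, [], {}))                      # ordinary PR
    print(should_block_ci(True, [BYPASS_LABEL], {}))          # CI-fix PR
    print(should_block_ci(True, [], {BYPASS_ENV: "true"}))    # scheduled run on main
    print(should_block_ci(False, [], {}))                     # maintenance over
```

Passing the environment as an explicit mapping, rather than reading `os.environ` inside the function, keeps the rule easy to test in isolation.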