Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .github/MAINTAINER.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,9 @@ __Note__: Difference between Merge Oncall and Codeowner
If you meet any issues during the merge, you can discuss in [slack channels](https://slack.sglang.ai/): #dev, #pull-request, and #ci-cd-build-release.

## The List of Merge Oncalls and Reviewers
TODO

Now we have many Merge Oncalls mainly because the CI is flaky and the CODEOWNERS is too coarse-grained.
In the future, we hope the CI can be improved and we only need bypass rarely. After that, most Merge Oncalls can be converted back to Write and CODEOWNERS.

This list is based on the current situation. If you or someone you know would like to take on more responsibility and are qualified, please ping @Lianmin Zheng and @Ying Sheng in the Slack channel. They will start a nomination and internal review process.

Expand Down
46 changes: 46 additions & 0 deletions .github/workflows/slash_command_handler.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
name: Slash Command Handler

on:
issue_comment:
types: [created]

permissions:
contents: read
pull-requests: write # Required to add labels and reactions
actions: write # Required to rerun workflows
issues: write # Required for comment reactions in some contexts

jobs:
slash_command:
# Only run if it is a PR and the comment starts with a recognized command
if: >
github.event.issue.pull_request &&
(startsWith(github.event.comment.body, '/tag-run-ci-label') ||
startsWith(github.event.comment.body, '/rerun-failed-ci'))
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v4
# We checkout the current context to get the script,
# but the script will fetch permissions from main as requested.

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Install dependencies
run: |
pip install requests PyGithub

- name: Handle Slash Command
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_FULL_NAME: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
COMMENT_ID: ${{ github.event.comment.id }}
COMMENT_BODY: ${{ github.event.comment.body }}
USER_LOGIN: ${{ github.event.comment.user.login }}
run: |
python scripts/ci/slash_command_handler.py
129 changes: 129 additions & 0 deletions scripts/ci/slash_command_handler.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
import json
import os
import sys

import requests
from github import Github

# Configuration
PERMISSIONS_FILE_URL = "https://raw.githubusercontent.com/sgl-project/sglang/main/.github/CI_PERMISSIONS.json"


def get_env_var(name):
val = os.getenv(name)
if not val:
print(f"Error: Environment variable {name} not set.")
sys.exit(1)
Comment on lines +15 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling sys.exit(1) directly from this utility function makes it less reusable and harder to test. It's better practice for utility functions to signal errors by raising an exception. The calling function (main) can then catch this exception, print an appropriate error message, and exit.

Suggested change
print(f"Error: Environment variable {name} not set.")
sys.exit(1)
raise ValueError(f"Environment variable {name} not set.")

return val


def load_permissions(user_login):
"""
Downloads the permissions JSON from the main branch and returns
the permissions dict for the specific user.
"""
try:
print(f"Fetching permissions from {PERMISSIONS_FILE_URL}...")
response = requests.get(PERMISSIONS_FILE_URL)
response.raise_for_status()

data = response.json()
user_perms = data.get(user_login)

if not user_perms:
print(f"User '{user_login}' not found in permissions file.")
return None

return user_perms

except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching the broad Exception class is generally discouraged as it can swallow unexpected errors and make debugging harder. It's better to catch more specific exceptions that you expect to handle, such as network (requests.exceptions.RequestException) or JSON parsing (json.JSONDecodeError) errors.

Suggested change
except Exception as e:
except (requests.exceptions.RequestException, json.JSONDecodeError) as e:

print(f"Failed to load or parse permissions file: {e}")
sys.exit(1)


def handle_tag_run_ci(gh_repo, pr, comment, user_perms):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parameter gh_repo is defined in the function signature but is not used within the function body. Unused parameters should be removed to improve code clarity. Remember to update the call to this function in main accordingly.

Suggested change
def handle_tag_run_ci(gh_repo, pr, comment, user_perms):
def handle_tag_run_ci(pr, comment, user_perms):

"""
Handles the /tag-run-ci-label command.
"""
if not user_perms.get("can_tag_run_ci_label", False):
print("Permission denied: can_tag_run_ci_label is false.")
return

print("Permission granted. Adding 'run-ci' label.")
pr.add_to_labels("run-ci")

# React to the comment with +1
comment.create_reaction("+1")
print("Label added and comment reacted.")


def handle_rerun_failed_ci(gh_repo, pr, comment, user_perms):
"""
Handles the /rerun-failed-ci command.
"""
if not user_perms.get("can_rerun_failed_ci", False):
print("Permission denied: can_rerun_failed_ci is false.")
return

print("Permission granted. Triggering rerun of failed workflows.")

# Get the SHA of the latest commit in the PR
head_sha = pr.head.sha
print(f"Checking workflows for commit: {head_sha}")

# List all workflow runs for this commit
runs = gh_repo.get_workflow_runs(head_sha=head_sha)

rerun_count = 0
for run in runs:
# We only care about completed runs that failed
if run.status == "completed" and run.conclusion == "failure":
print(f"Rerunning workflow: {run.name} (ID: {run.id})")
run.rerun_failed()
rerun_count += 1

if rerun_count > 0:
comment.create_reaction("+1")
print(f"Triggered rerun for {rerun_count} failed workflows.")
else:
print("No failed workflows found to rerun.")


def main():
# 1. Load Environment Variables
token = get_env_var("GITHUB_TOKEN")
repo_name = get_env_var("REPO_FULL_NAME")
pr_number = int(get_env_var("PR_NUMBER"))
comment_id = int(get_env_var("COMMENT_ID"))
comment_body = get_env_var("COMMENT_BODY").strip()
user_login = get_env_var("USER_LOGIN")

# 2. Initialize GitHub API
g = Github(token)
repo = g.get_repo(repo_name)
pr = repo.get_pull(pr_number)
comment = repo.get_issue(pr_number).get_comment(comment_id)

# 3. Load Permissions (Remote Check)
user_perms = load_permissions(user_login)

if not user_perms:
print(f"User {user_login} does not have any configured permissions. Exiting.")
return

# 4. Parse Command and Execute
# split lines to handle cases where there might be text after the command
first_line = comment_body.split("\n")[0].strip()

if first_line.startswith("/tag-run-ci-label"):
handle_tag_run_ci(repo, pr, comment, user_perms)

elif first_line.startswith("/rerun-failed-ci"):
handle_rerun_failed_ci(repo, pr, comment, user_perms)
Comment on lines +118 to +122
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve maintainability and avoid "magic strings," it's good practice to define command strings as constants. You can define them in the configuration section at the top of the file and then reference them here.

For example, you could add:

# In configuration section
CMD_TAG_RUN_CI = "/tag-run-ci-label"
CMD_RERUN_FAILED_CI = "/rerun-failed-ci"

Then, this block can be updated to use these constants.

Suggested change
if first_line.startswith("/tag-run-ci-label"):
handle_tag_run_ci(repo, pr, comment, user_perms)
elif first_line.startswith("/rerun-failed-ci"):
handle_rerun_failed_ci(repo, pr, comment, user_perms)
if first_line.startswith(CMD_TAG_RUN_CI):
handle_tag_run_ci(repo, pr, comment, user_perms)
elif first_line.startswith(CMD_RERUN_FAILED_CI):
handle_rerun_failed_ci(repo, pr, comment, user_perms)


else:
print(f"Unknown or ignored command: {first_line}")


if __name__ == "__main__":
main()
Loading