fix(ci): rate limit and permission errors in trace publishing by alisonshao · Pull Request #17238 · sgl-project/sglang

alisonshao · 2026-01-16T23:32:30Z

Fix nightly trace publishing to correctly identify permission errors vs rate limit errors.

Previously all 403 errors were silently skipped as "rate limits", masking the actual issue: the GH_PAT_FOR_NIGHTLY_CI_DATA token lacks write permission to sglang-bot/sglang-ci-data.

Example failure: https://github.com/sgl-project/sglang/actions/runs/21051323020/job/60538104515#step:5:85

After this fix, permission errors will fail with a clear message instead of being silently ignored. The token secret will need to be updated with proper contents: write permission.

gemini-code-assist · 2026-01-16T23:32:48Z

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of nightly CI trace publishing by accurately identifying and reporting GitHub API permission errors, which were previously masked as rate limit issues. This ensures that critical permission misconfigurations are no longer silently ignored, leading to clearer debugging. Additionally, a new tool has been integrated to provide insights into GitHub Actions runner utilization, aiding in resource management and optimization. A minor adjustment to a CUDA CI test suite is also included.

Highlights

Enhanced Error Handling for GitHub API: Improved the publish_traces.py script to accurately distinguish between GitHub API rate limit errors and permission errors (both previously treated as 403s). Permission errors will now cause the CI job to fail loudly with a clear message, instead of being silently skipped.
New Runner Utilization Report Script: Added a new Python script, runner_utilization_report.py, which analyzes GitHub Actions job data to calculate and report runner utilization metrics (active time, idle time, utilization percentage) per runner label. This report can be output in Markdown format.
CI Test Suite Update: Modified test_eagle_infer_beta.py to update the CUDA CI test suite from 'stage-b-test-small-1-gpu' to 'stage-b-test-large-1-gpu'.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/runner-utilization.yml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request effectively addresses the issue of distinguishing between rate limit and permission errors in the trace publishing script. The changes in publish_traces.py are clear and correctly implement the desired behavior.

However, this PR also includes two unrelated changes: the addition of a new runner_utilization_report.py script and a modification to a test suite name in test_eagle_infer_beta.py. For better repository history and more focused reviews, it would be ideal to submit these changes in separate pull requests. The new runner utilization script, in particular, is a significant feature that deserves its own PR with a detailed description.

Despite this, I have reviewed all the changes. My feedback includes a suggestion to refactor some duplicated code in publish_traces.py and several improvements for the new runner_utilization_report.py script related to type hinting and exception handling.

gemini-code-assist · 2026-01-16T23:33:45Z

scripts/ci/publish_traces.py

+        if is_permission_error(e):
+            print(
+                f"ERROR: Token does not have write permission to {repo_owner}/{repo_name}. "
+                "Please update the GH_PAT_FOR_NIGHTLY_CI_DATA secret with a token that has "
+                "'contents: write' permission for the repository."
+            )
+            sys.exit(1)


This error handling logic for permission errors is duplicated at lines 453-459. To avoid code duplication and improve maintainability, consider extracting this block into a dedicated helper function. You could create a function like _handle_permission_error(repo_owner, repo_name) that prints the message and exits, then call it from both locations.

gemini-code-assist · 2026-01-16T23:33:45Z

scripts/ci/runner_utilization_report.py

+        text=True,
+    )
+    if result.returncode != 0:
+        raise Exception(f"gh api failed: {result.stderr}")


Raising a generic Exception is generally discouraged as it can catch unexpected errors (e.g., KeyboardInterrupt). It's better to define and raise a more specific custom exception, like RuntimeError or a custom GhApiError, to allow for more precise error handling by the calling functions.

Suggested change

raise Exception(f"gh api failed: {result.stderr}")

raise RuntimeError(f"gh api failed: {result.stderr}")

gemini-code-assist · 2026-01-16T23:33:45Z

scripts/ci/runner_utilization_report.py

+        return []
+
+
+def parse_time(time_str: str) -> datetime:


This function can return None, but the type hint only specifies datetime. To make it accurate, you should indicate that None is a possible return value. Using datetime | None is a good way to do this in modern Python.

Suggested change

def parse_time(time_str: str) -> datetime:

def parse_time(time_str: str) -> datetime | None:

gemini-code-assist · 2026-01-16T23:33:45Z

scripts/ci/runner_utilization_report.py

+}
+
+
+def calculate_utilization(repo: str, hours: int = 24, runner_filter: str = None):


This function returns a list of dictionaries containing the utilization results, but it's missing a return type hint. Adding a type hint improves code clarity and helps static analysis tools.

Suggested change

def calculate_utilization(repo: str, hours: int = 24, runner_filter: str = None):

def calculate_utilization(repo: str, hours: int = 24, runner_filter: str = None) -> list[dict]:

…e publishing Previously, all 403 errors were treated as rate limit errors and silently skipped. This masked permission errors when the GH_PAT_FOR_NIGHTLY_CI_DATA token lacks write access to the sglang-ci-data repository. Changes: - Update is_rate_limit_error() to check error message for rate limit phrases - Add is_permission_error() to detect permission-related 403 errors - Fail with clear error message when permission errors are detected Example error that was being silently skipped: {"message":"Resource not accessible by personal access token","status":"403"}

* fix(ci): recover from corrupted MMMU parquet cache (sgl-project#17256) * [diffusion] feat: support default 4-step inference for Flux2-Klein distilled models (sgl-project#17225) Signed-off-by: Lancer <maruixiang6688@gmail.com> * Add runner utilization report workflow (sgl-project#17234) * cli: support sglang version (sgl-project#17250) * Use swa radix cache and memory pool for gpt-oss model (sgl-project#17261) * [VLM][Reland] Refactor load_mm_data to improve performance (sgl-project#16152) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> * [Tiny] Improve docs (sgl-project#17264) * [diffusion] fix: set guidance_scale default to None (sgl-project#17182) * Tiny fix comment typo (sgl-project#17287) * [SPEC_V2] Enable cudagraph draft_extend for trtllm_mla_backend and Acclen Fix for DP under cudagraph mode (sgl-project#16974) * Add kl test for swa radix cache (sgl-project#17281) * fix: Handle multiple named chat templates in HuggingFace tokenizers (sgl-project#17236) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> * Move radix cache related tests (sgl-project#17295) * [Refactor] Add `-fp4-gemm-backend` to replace `SGLANG_FLASHINFER_FP4_GEMM_BACKEND` (sgl-project#16534) Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com> * [Bugfix] Fix PD accuracy when MTP is not configured on the prefill node (sgl-project#17212) Co-authored-by: Shangming Cai <csmthu@gmail.com> * [Diffusion] Apply jit qk_norm to flux1 (sgl-project#17296) * [Refactor] Split out deepseek v2 weight loader function into mixin (sgl-project#16649) * [NPU]Support GPT-OSS for NPU (sgl-project#14197) * [jit-kernel] Add CuTe DSL GDN Decode Kernel (sgl-project#15631) Co-authored-by: Jinyan Chen <jinyanc@nvidia.com> * [GLM 4.7] Add RTX 6000 Pro aka sm120 (sgl-project#17235) Co-authored-by: root <root@ubuntu-nvidia.localdomain> * Update CODEOWNERS for multimodal_gen (sgl-project#17308) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> * [Feature] overlap LoRA weight loading with compute (sgl-project#15512) * [PD] Optimize MHA models pp util calculation logic (sgl-project#17306) * [Minor] Correct sglang version when installing from source (sgl-project#17315) * Use dsv3 optimized routing `fused_topk_deepseek` instead of `moe_fused_gate` (sgl-project#15347) * [DeepSeek v3.2] Opt MTP decode cuda batch sizes and nsa implementation (sgl-project#16961) * Update code sync scripts (sgl-project#17319) * [Auto Sync] Update tokenizer_manager.py (20260119) (sgl-project#17317) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * support new qwen3_coder_detector (sgl-project#16744) Co-authored-by: liugaoji.lgj <liugaoji.lgj@alibaba-inc.com> * Fix kernel selection in biased_grouped_topk_gpu (sgl-project#17325) * KV Cache Events with Attention DP bug fix (sgl-project#16030) (sgl-project#16412) * [Perf] fuse q, k norm for Flux2Attention (sgl-project#17241) Co-authored-by: Minglei Zhu <zminglei@linkedin.com> * [CI] Add partition to stage-b-test-large-1-gpu (11->12) (sgl-project#17245) * fix(ci): rate limit and permission errors in trace publishing (sgl-project#17238) * Revert "[Perf] fuse q, k norm for Flux2Attention (sgl-project#17241)" (sgl-project#17332) * Migrate performance, accuracy, and quantization tests to CI registry (sgl-project#17177) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> * Inclusion of nvfp4 blockscale in EPLB Rebalance (sgl-project#17158) * [Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-backend` with `flashinfer_` prefix (sgl-project#17309) * [Diffusion] Apply qknorm to flux2 and apply lightx2v rms_norm_one_pass kernel(without residual) (sgl-project#17305) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Fix v32 continue_final_message not work (sgl-project#16567) * Evict swa kv cache during decoding (sgl-project#17220) * [RadixTree][1/N Refactor]: Support unified match_prefix params (sgl-project#17142) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> * [AMD CI] Migrate and Add More Testcases (sgl-project#17116) Co-authored-by: yctseng0211 <yctseng@amd.com> * [AMD] CI - add partitions for stage-b-test-small-1-gpu-amd (sgl-project#17345) * Restore deepseek_v2.py to main's code, except the utils * Ran `pre-commit` --------- Signed-off-by: Lancer <maruixiang6688@gmail.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Hudson Xing <1277646412@qq.com> Co-authored-by: Lancer <402430575@qq.com> Co-authored-by: Alison Shao <54658187+alisonshao@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Ke Bao <ispobaoke@gmail.com> Co-authored-by: Yuan Luo <yuan.luo@hotmail.com> Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu> Co-authored-by: Changyi Yang <112288487+ChangyiYang@users.noreply.github.com> Co-authored-by: YAMY <74099316+YAMY1234@users.noreply.github.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca> Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com> Co-authored-by: Ch3ngY1 <91232537+Ch3ngY1@users.noreply.github.com> Co-authored-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Jerry Ji <jerryjilol@gmail.com> Co-authored-by: Todobe <43903496+Todobe@users.noreply.github.com> Co-authored-by: Jinyan Chen <93358689+liz-badada@users.noreply.github.com> Co-authored-by: Jinyan Chen <jinyanc@nvidia.com> Co-authored-by: Koushik Dutta <koush@koushikdutta.com> Co-authored-by: root <root@ubuntu-nvidia.localdomain> Co-authored-by: Glen Liu <62917497+glenliu21@users.noreply.github.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Lee Nau <lnau@nvidia.com> Co-authored-by: Yongfei Xu <xuyongfei.xyf@antgroup.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Gaoji Liu <34803073+attack204@users.noreply.github.com> Co-authored-by: liugaoji.lgj <liugaoji.lgj@alibaba-inc.com> Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com> Co-authored-by: Kartik Ramesh <kartikx2000@gmail.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Minglei Zhu <zminglei@linkedin.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Shu Wang <shuw@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com> Co-authored-by: zhangheng <hzh0425@apache.org> Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com> Co-authored-by: yctseng0211 <yctseng@amd.com>

alisonshao requested review from Fridge003, Kangyan-Zhou, ispobock and merrymercy as code owners January 16, 2026 23:32

gemini-code-assist bot reviewed Jan 16, 2026

View reviewed changes

alisonshao force-pushed the fix/nightly-trace-publish-permission-error branch from 614e5af to eb14171 Compare January 16, 2026 23:47

alisonshao force-pushed the fix/nightly-trace-publish-permission-error branch from eb14171 to ed4d236 Compare January 16, 2026 23:48

alisonshao changed the title ~~fix(ci): distinguish between rate limit and permission errors in trace publishing~~ fix(ci): rate limit and permission errors in trace publishing Jan 17, 2026

Kangyan-Zhou merged commit fb88fb6 into main Jan 19, 2026
72 of 80 checks passed

Kangyan-Zhou deleted the fix/nightly-trace-publish-permission-error branch January 19, 2026 07:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ci): rate limit and permission errors in trace publishing#17238

fix(ci): rate limit and permission errors in trace publishing#17238
Kangyan-Zhou merged 1 commit intomainfrom
fix/nightly-trace-publish-permission-error

alisonshao commented Jan 16, 2026

Uh oh!

gemini-code-assist bot commented Jan 16, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 16, 2026

Uh oh!

gemini-code-assist bot Jan 16, 2026

Uh oh!

gemini-code-assist bot Jan 16, 2026

Uh oh!

gemini-code-assist bot Jan 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	raise Exception(f"gh api failed: {result.stderr}")
	raise RuntimeError(f"gh api failed: {result.stderr}")

	def parse_time(time_str: str) -> datetime:
	def parse_time(time_str: str) -> datetime \| None:

		}


		def calculate_utilization(repo: str, hours: int = 24, runner_filter: str = None):

Conversation

alisonshao commented Jan 16, 2026

Uh oh!

gemini-code-assist bot commented Jan 16, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants