
Add runner utilization report workflow #17234

Merged

Kangyan-Zhou merged 10 commits into main from ci/runner-utilization-report on Jan 18, 2026

Conversation

@alisonshao
Collaborator

Tracks idle/active time per runner label to understand machine utilization. Helps identify when to add more runners (high utilization) or when runners are over-provisioned (low utilization).

Runs daily at 8 AM UTC. It can also be triggered manually with a custom time window and label filter.

Example output:

Label       Runners  Jobs  Active (hrs)  Idle (hrs)  Utilization
1-gpu-5090       16    64          12.5       371.5   3.3%  ░░░░░░░░░░
1-gpu-h200        8   156          89.2       102.8  46.5%  ████░░░░░░
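The Utilization column and the ten-cell bar follow directly from the active and idle hours; a minimal sketch of that arithmetic (the helper name is illustrative, not taken from the actual script):

```python
def utilization_cell(active_hrs: float, idle_hrs: float, width: int = 10) -> str:
    """Render 'PP.P% <bar>' where the bar has `width` cells, filled in proportion."""
    total = active_hrs + idle_hrs
    pct = 100.0 * active_hrs / total if total else 0.0
    filled = int(pct / 100 * width)  # truncate, so 46.5% fills 4 of 10 cells
    return f"{pct:.1f}% " + "█" * filled + "░" * (width - filled)

print(utilization_cell(12.5, 371.5))   # the 1-gpu-5090 row above
print(utilization_cell(89.2, 102.8))   # the 1-gpu-h200 row above
```

With truncation rather than rounding, a pool has to cross each full 10% step before another bar cell fills, which keeps near-idle pools visibly empty.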

Tracks idle/active time per runner label to help understand machine
utilization. Runs daily and can be triggered manually with custom
time window and label filter.

Reports utilization percentage, active hours, and idle hours per
runner label (e.g., 1-gpu-5090, 1-gpu-h200).
@gemini-code-assist
Contributor

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly!

This pull request introduces a new tool to provide critical insights into the performance and cost-efficiency of GitHub Actions self-hosted runners. By automatically generating detailed utilization reports, it empowers teams to make informed decisions about scaling their runner infrastructure, ensuring optimal resource allocation and reducing unnecessary operational costs.

Highlights

  • New Runner Utilization Script: Introduces a new Python script, runner_utilization_report.py, designed to analyze GitHub Actions job data and calculate self-hosted runner utilization metrics.
  • Utilization Metrics: The script reports active time, idle time, and utilization percentage for runners, grouped by their custom labels. It helps identify over-provisioned or under-provisioned runner pools.
  • GitHub API Integration: Leverages the gh CLI to interact with the GitHub API, fetching workflow runs, jobs, and runner details within a specified time window.
  • Markdown Report Generation: Generates a formatted Markdown report that includes a summary table and interpretation guidelines for high and low utilization, and can output to stdout, a file, or GITHUB_STEP_SUMMARY.
  • Configurable Parameters: Supports command-line arguments for specifying the GitHub repository, time window (in hours), and filtering runner labels.
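The configurable-parameters bullet can be pictured with a small argparse sketch; the flag names here are assumptions for illustration, not necessarily the ones runner_utilization_report.py actually uses:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI surface matching the described parameters:
    # repository, time window in hours, and an optional label filter.
    parser = argparse.ArgumentParser(description="Self-hosted runner utilization report")
    parser.add_argument("--repo", required=True, help="repository as owner/name")
    parser.add_argument("--hours", type=int, default=24, help="look-back window in hours")
    parser.add_argument("--labels", default=None, help="comma-separated runner labels to include")
    return parser.parse_args(argv)

args = parse_args(["--repo", "sgl-project/sglang", "--hours", "48"])
print(args.repo, args.hours, args.labels)
```

The scheduled daily run would rely on the defaults, while a manual workflow_dispatch trigger would pass explicit values through.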


Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/runner-utilization.yml

Footnotes

[1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a helpful script for monitoring runner utilization. The implementation is solid, but I've identified several areas for improvement, primarily concerning the handling of API pagination, which could lead to incorrect reports. I've also noted some opportunities for performance optimization and code cleanup. Addressing these points will make the script more robust and reliable.

Comment on lines +73 to +74
    data = run_gh_command([f"repos/{repo}/actions/runners", "-f", "per_page=100"])
    return data.get("runners", [])

critical

The get_runners function does not handle pagination. It only fetches the first 100 runners because of the per_page=100 parameter. If the repository has more than 100 self-hosted runners, the report will be incomplete and inaccurate because it will not account for all runners. You should implement a pagination loop, similar to the one in get_workflow_runs, to fetch all runners.

Suggested change:

    all_runners = []
    page = 1
    while True:
        data = run_gh_command([
            f"repos/{repo}/actions/runners",
            "-f", "per_page=100",
            "-f", f"page={page}",
        ])
        runners = data.get("runners", [])
        all_runners.extend(runners)
        if len(runners) < 100:
            break
        page += 1
    return all_runners

Comment on lines +47 to +48
    if page > 10:  # Safety limit
        break

high

The while loop for fetching workflow runs has a hardcoded safety limit of 10 pages. For a very active repository, there could be more than 1000 workflow runs (10 pages * 100 per page) in the specified time window. If this limit is reached, the script will silently stop fetching runs, leading to incomplete data and underestimated utilization metrics. It's safer to remove this limit and rely on the API's pagination mechanism to determine when to stop.
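The reviewer's point, letting a short page terminate pagination rather than a hardcoded cap, can be sketched generically (fetch_page here is a stand-in for a per-page run_gh_command call, not code from the PR):

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every item by paging until a page comes back short.

    fetch_page(page) returns the list of items for 1-indexed `page`;
    there is no arbitrary page cap, so large result sets are never truncated.
    """
    items, page = [], 1
    while True:
        batch = fetch_page(page)
        items.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return items

# In-memory stand-in for the GitHub API: 250 items spread across 3 pages.
data = list(range(250))
def fake_page(page):
    return data[(page - 1) * 100 : page * 100]

print(len(fetch_all(fake_page)))  # 250
```

If the time window itself bounds the query (as it does for workflow runs), the loop can additionally stop once a page's oldest run falls outside the window.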

Comment on lines +66 to +67
    if page > 5:  # Safety limit
        break

high

Similar to get_workflow_runs, this function has a hardcoded safety limit of 5 pages for fetching jobs. A workflow run with a large matrix strategy could easily exceed 500 jobs (5 pages * 100 per page). Reaching this limit will cause the script to under-report active time for that run. It's better to remove this arbitrary limit and let the pagination loop complete naturally based on the number of results returned by the API.

Comment on lines +141 to +143
    for label, runner_list in label_runners.items():
        if runner_name in runner_list:
            label_jobs[label].append(job_info)

high

The current approach to map a job's runner to a label involves iterating through all labels for every job. This has a complexity of O(Jobs * Labels) and can be inefficient for a large number of jobs and labels. A more performant approach is to create a reverse mapping from runner name to label once, before processing the jobs. This changes the lookup to an O(1) operation inside the loop.

First, create this map after label_runners is finalized (around line 108):

    runner_to_label_map = {
        runner_name: label
        for label, runner_list in label_runners.items()
        for runner_name in runner_list
    }

Then, you can replace this loop with a more efficient dictionary lookup as suggested.

    if label := runner_to_label_map.get(runner_name):
        label_jobs[label].append(job_info)

    for runner in runners:
        for label in runner.get("labels", []):
            label_name = label.get("name", "")
            if label_name not in ["self-hosted", "Linux", "X64"]:  # Skip default labels

medium

The list of default labels to ignore ("self-hosted", "Linux", "X64") is hardcoded. This could be brittle if GitHub changes these default labels in the future. For better maintainability, consider moving this list to a constant defined at the top of the file.

For example, you could add this at the top of the script:

    DEFAULT_LABELS_TO_IGNORE = {"self-hosted", "Linux", "X64"}

And then use the constant here.

Suggested change:

    if label_name not in DEFAULT_LABELS_TO_IGNORE:  # Skip default labels

    print(f"Tracking {len(label_runners)} runner labels: {list(label_runners.keys())}")

    # Collect job data per runner
    runner_jobs = defaultdict(list)  # runner_name -> list of (start, end, job_name)

medium

The runner_jobs dictionary is initialized here and populated on line 138, but its value is never read or used. This is dead code and should be removed to improve clarity. You should also remove line 138 where it is populated.

    duration = (completed_at - started_at).total_seconds()
    job_info = (started_at, completed_at, duration, job["name"], runner_name)

    runner_jobs[runner_name].append(job_info)

medium

This line populates the runner_jobs dictionary, which is unused. This line should be removed along with the dictionary's initialization on line 110.


    # Calculate metrics per label
    now = datetime.now(timezone.utc)
    window_start = now - timedelta(hours=hours)

medium

The window_start variable is calculated but never used elsewhere in the code. This unused variable should be removed to improve code clarity.

- Add pagination to get_runners() to handle >100 runners
- Filter offline runners with online_only parameter
- Use job labels from API when available, fall back to observed
- Move default labels to constants
- Remove unused variables (runner_jobs, now, runners_source)
- Remove unused infer_runner_label function
- Fix lint issues (f-strings, ambiguous variable names)
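The online_only filter mentioned in the change list could look like the following; the GitHub runners API does report a per-runner status field, but the function shape here is an assumption, not the PR's actual code:

```python
def filter_runners(runners, online_only=True):
    # Keep only runners the API reports as "online"; offline runners
    # would otherwise inflate the idle time attributed to their labels.
    if not online_only:
        return list(runners)
    return [r for r in runners if r.get("status") == "online"]

sample = [
    {"name": "gpu-01", "status": "online"},
    {"name": "gpu-02", "status": "offline"},
]
print([r["name"] for r in filter_runners(sample)])  # ['gpu-01']
```
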
@alisonshao (Collaborator, Author) commented Jan 16, 2026

@Kangyan-Zhou Kangyan-Zhou merged commit 7edb061 into main Jan 18, 2026
63 of 69 checks passed
@Kangyan-Zhou Kangyan-Zhou deleted the ci/runner-utilization-report branch January 18, 2026 03:28
DotSlash-A pushed a commit to DotSlash-A/sglang that referenced this pull request on Jan 19, 2026, including "Add runner utilization report workflow (sgl-project#17234)" among a batch of upstream changes.
