[CI] Lower mem-fraction-static for GLM-5.1 FP8 8-GPU test to 0.85 #25453
Merged
Kangyan-Zhou merged 1 commit into May 16, 2026

Conversation
The TP8+DP8 variant has been OOMing at scheduler init on the B200 nightly runner since the test was added (sgl-project#22399, 2026-04-09), failing every nightly partition for ~37 days. At mem-fraction-static=0.9, the B200's actual peak residency during DP-attention workspace allocation hits ~170 GiB of the 178 GiB total, leaving only ~3 GiB free; a subsequent 6.38 GiB cuda-graph workspace allocation then OOMs. The same test passes on H200 because its baseline residency leaves more headroom.

Reproduced locally on 8x B200 (b200-novita-1 class):
0.90 -> Model 2 (TP8+DP8) crashes with the exact CI OOM signature (6.38 GiB requested, ~3.3 GiB free, 170 GiB resident)
0.85 -> all three variants (TP8, TP8+DP8, TP8+DP8+MTP) pass

Note: 0.85 is already the value used by test_minimax_m25.py for the same 8-GPU + DP-attention shape; every other 8-GPU + DP-attention test in this directory uses 0.85 or lower. GLM-5.1 FP8 at 0.9 was the outlier. The GB300 variants (test_glm5_fp8.py, test_glm5_nvfp4.py) keep 0.9; that runner has 288 GiB/GPU, so the same fraction yields more absolute headroom.
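The headroom argument above can be sketched with some simple arithmetic. This is illustrative only: mem-fraction-static reserves a fraction of total GPU memory for the static pool (roughly weights plus KV cache), so cuda-graph workspaces and DP-attention buffers must fit in the remainder; actual residency also includes fragmentation and allocator overhead not modeled here. The per-GPU totals are the figures from this PR.

```python
# Illustrative sketch, not SGLang's actual accounting: memory left outside
# the static pool is total * (1 - mem_fraction_static).
def free_outside_static_pool(total_gib: float, mem_fraction_static: float) -> float:
    """GiB left outside the static pool at a given fraction (rounded to 0.1)."""
    return round(total_gib * (1.0 - mem_fraction_static), 1)

B200_GIB = 178.0   # per-GPU total on the B200 runner (from the PR)
GB300_GIB = 288.0  # per-GPU total on the GB300 runner (from the PR)

print(free_outside_static_pool(B200_GIB, 0.90))   # 17.8 GiB -- B200 at 0.90
print(free_outside_static_pool(B200_GIB, 0.85))   # 26.7 GiB -- B200 at 0.85 (this PR)
print(free_outside_static_pool(GB300_GIB, 0.90))  # 28.8 GiB -- GB300 keeps 0.90
```

Even at 0.85, the B200 has less absolute non-static headroom (26.7 GiB) than the GB300 keeps at 0.90 (28.8 GiB), which is consistent with leaving the GB300 variants untouched.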
Collaborator (Author)
/tag-and-rerun-ci
b8zhong approved these changes on May 16, 2026
Fridge003 pushed a commit that referenced this pull request on May 16, 2026
Motivation
test/registered/8-gpu-models/test_glm_51_fp8.py::test_glm51_fp8 has been failing on every B200 nightly since the test was added in #22399 (2026-04-09, ~37 days, 0 passes on B200). The TP8+DP8 variant OOMs at scheduler init; TP8 and TP8+DP8+MTP are fine. The same test passes on H200 in the same scheduled runs. Example failure: run 25835354140 / job 75909128349.
--mem-fraction-static=0.9 on B200 (178 GiB/GPU) leaves only ~3 GiB free after the DP-attention workspaces and cuda-graph capture intermediates land; the next 6.38 GiB cuda-graph allocation OOMs. Not a code regression: the B200 runner has ~5 GiB higher baseline residency than H200 for this model.

Modifications
0.85 is already used by test_minimax_m25.py for the same 8-GPU + DP-attention shape; every other 8-GPU + DP-attention test in this dir uses 0.85 or lower. GB300 tests keep 0.9 (different runner, 288 GiB/GPU).

Verification (8x B200, local repro)
All three variants pass at 0.85:
TP8
TP8+DP8
TP8+DP8+MTP

Local OOM signature at 0.9 matches CI exactly (6.38 GiB request, ~3.3 GiB free, 170.07 GiB resident).
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci