[AMD] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x by michaelzhang-ai · Pull Request #21710 · sgl-project/sglang

michaelzhang-ai · 2026-03-30T21:47:24Z

Summary

Add GLM-5-FP8 nightly perf benchmarks (bench_one_batch) for MI30x and MI35x. Both accuracy and perf use zai-org/GLM-5-FP8 with NSA tilelang backend, TP=8, FP8 KV cache, and --reasoning-parser=glm45 --tool-call-parser=glm47 (matching NV/InferenceX configs).

Changes

New: test/registered/amd/perf/mi30x/test_glm5_perf_amd.py (suite: nightly-perf-8-gpu-glm5)
New: test/registered/amd/perf/mi35x/test_glm5_perf_mi35x.py (suite: nightly-perf-8-gpu-mi35x-glm5)
Modified: accuracy tests in mi30x/ and mi35x/ — switch model to zai-org/GLM-5-FP8, add parser flags
Modified: nightly-test-amd.yml and nightly-test-amd-rocm720.yml — add perf step after accuracy in each GLM-5 job

Workflow behavior

Accuracy step has no continue-on-error — if it fails, perf is skipped and the job fails
Perf step has continue-on-error: true — perf failures don't block CI
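
The two rules above combine into a simple job result. A minimal sketch in Python that models only the step semantics described here; `job_outcome` and its return shape are hypothetical and do not appear in the workflow files.

```python
def job_outcome(accuracy_passed: bool, perf_passed: bool) -> dict:
    """Model one GLM-5 nightly job: accuracy gates perf, perf never fails the job."""
    if not accuracy_passed:
        # Accuracy has no continue-on-error: the job fails and perf is skipped.
        return {"perf_ran": False, "job": "failure"}
    # Perf has continue-on-error: true, so its result does not change the job status.
    return {
        "perf_ran": True,
        "perf": "success" if perf_passed else "failure",
        "job": "success",
    }
```

For example, `job_outcome(True, False)` reports a failed perf step inside a successful job, which is the "perf failures don't block CI" case.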

Dependencies

[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 #22314: Fix MI300 FP8 KV quant path dispatch (enables --kv-cache-dtype fp8_e4m3 on MI30x)
Reduce unnecessary kernels and copies in the NSA indexer #22232: Fix NSA indexer GPU memory fault at batch_size=64 (fixes MI35x perf crash)

Server config

| Setting | MI30x | MI35x |
|---|---|---|
| `--kv-cache-dtype` | fp8_e4m3 | fp8_e4m3 |
| `--mem-fraction-static` | 0.85 | 0.85 |
| `--model-loader-extra-config` | `{"enable_multithread_load": true}` | `{"enable_multithread_load": true, "num_threads": 8}` |
| Env | `SGLANG_USE_AITER=1` | `SGLANG_ROCM_FUSED_DECODE_MLA=0`, `ROCM_QUICK_REDUCE_QUANTIZATION=INT4`, `SAFETENSORS_FAST_GPU=1` |
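
One way to express this table in a test file is a shared argument list plus per-platform overrides. This is a sketch only: the names `COMMON_ARGS`, `PLATFORMS`, and `build_server_args` are hypothetical, the actual test files may be structured differently, and the tensor-parallel and NSA backend flags are left out because their exact spelling is not shown in this PR.

```python
import json

# Flags shared by both platforms, taken from the summary and the table above.
COMMON_ARGS = [
    "--kv-cache-dtype", "fp8_e4m3",
    "--mem-fraction-static", "0.85",
    "--reasoning-parser", "glm45",
    "--tool-call-parser", "glm47",
]

# Per-platform differences (hypothetical layout; values copied from the table).
PLATFORMS = {
    "mi30x": {
        "loader_config": {"enable_multithread_load": True},
        "env_vars": {"SGLANG_USE_AITER": "1"},
    },
    "mi35x": {
        "loader_config": {"enable_multithread_load": True, "num_threads": 8},
        "env_vars": {
            "SGLANG_ROCM_FUSED_DECODE_MLA": "0",
            "ROCM_QUICK_REDUCE_QUANTIZATION": "INT4",
            "SAFETENSORS_FAST_GPU": "1",
        },
    },
}


def build_server_args(platform: str) -> list:
    """Return the server argument list for one platform."""
    cfg = PLATFORMS[platform]
    # json.dumps renders Python True as JSON true, matching the table entries.
    return COMMON_ARGS + ["--model-loader-extra-config", json.dumps(cfg["loader_config"])]
```

Keeping the shared flags in one list makes it harder for the two platforms to drift apart on settings that are meant to be identical.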

CI validation

Nightly Test (AMD): https://github.com/sgl-project/sglang/actions/runs/24118470025
Nightly Test (AMD ROCm 7.2): https://github.com/sgl-project/sglang/actions/runs/24118470693

MI30x perf results (earlier run, without FP8 KV cache)

From AMD run / ROCm 7.2 run:

| batch | ISL | latency (s) | input tok/s | output tok/s | ITL (ms) |
|---|---|---|---|---|---|
| 1 | 4096 | 25.50 | 1735 | 22.1 | 45.2 |
| 8 | 4096 | 33.76 | 5172 | 149.4 | 53.6 |
| 16 | 4096 | 40.23 | 6313 | 274.5 | 58.3 |
| 64 | 4096 | 85.64 | 6351 | 738.7 | 86.6 |
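
As a sanity check on the table above, the output throughput and ITL columns can be compared against each other. The sketch below assumes that output tok/s is roughly batch size divided by the inter-token latency; that relation is an assumption made here for illustration, not something this PR states about how bench_one_batch computes its figures.

```python
# (batch, ISL, latency_s, input_tok_s, output_tok_s, itl_ms), copied from the table above.
ROWS = [
    (1, 4096, 25.50, 1735, 22.1, 45.2),
    (8, 4096, 33.76, 5172, 149.4, 53.6),
    (16, 4096, 40.23, 6313, 274.5, 58.3),
    (64, 4096, 85.64, 6351, 738.7, 86.6),
]


def decode_throughput(batch: int, itl_ms: float) -> float:
    """Estimate output tok/s assuming each decode step emits one token per sequence."""
    return batch * 1000.0 / itl_ms


def relative_errors(rows):
    """Relative gap between the estimated and the reported output tok/s, per row."""
    return [
        abs(decode_throughput(batch, itl_ms) - out_tps) / out_tps
        for batch, _isl, _latency, _in_tps, out_tps, itl_ms in rows
    ]


for row, err in zip(ROWS, relative_errors(ROWS)):
    print(f"batch={row[0]:>2} reported={row[4]:7.1f} rel_err={err:.2%}")
```

A large gap between the two columns in a future run would suggest a reporting problem rather than a real performance change.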

gemini-code-assist

Code Review

This pull request introduces nightly performance benchmarks for the GLM-5 model on AMD MI30x and MI35x platforms. The reviewer identified several areas for improvement, including refactoring duplicated report generation logic into a shared utility, fixing potential division-by-zero errors in throughput calculations, and enhancing test portability by avoiding hardcoded local paths. Additionally, it was noted that certain environment variables should be consistently applied across different GPU configurations to ensure optimal performance.
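
The division-by-zero concern can be handled with a small guard around the throughput calculation. The reviewed code is not shown on this page, so the following is a hypothetical helper (`safe_throughput` is not a name from the PR) that illustrates the kind of fix the review asks for.

```python
from typing import Optional


def safe_throughput(num_tokens: int, latency_s: Optional[float]) -> float:
    """Return tokens per second, or 0.0 when the latency is missing or non-positive."""
    if latency_s is None or latency_s <= 0:
        # A failed or skipped benchmark run may report no usable latency.
        return 0.0
    return num_tokens / latency_s
```

Returning 0.0 keeps report generation running when a single batch size fails; raising an explicit error would be an equally valid choice if silent zeros are undesirable.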

test/registered/amd/perf/mi30x/test_glm5_perf_amd.py

test/registered/amd/perf/mi35x/test_glm5_perf_mi35x.py

gemini-code-assist · 2026-03-30T21:48:52Z

test/registered/amd/perf/mi35x/test_glm5_perf_mi35x.py

+                "--watchdog-timeout",
+                "1200",
+            ],
+            "env_vars": {},


SGLANG_USE_AITER is missing from the environment variables for MI35x, whereas it is enabled for MI30x. Since both use the same model and attention backend (tilelang), this might be an oversight that could lead to suboptimal performance results on MI35x.

            "env_vars": {
                "SGLANG_USE_AITER": "1",
            },

1am9trash · 2026-04-01T10:59:30Z

Maybe we can add --reasoning-parser=glm45 --tool-call-parser=glm47 to the GLM-5-FP8 AMD test configs for consistency.
These parsers are already used in NV unit tests and InferenceX tests, so aligning AMD test settings would reduce cross-platform behavior drift.

NV cmd
InferenceX cmd
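
A check along these lines could catch configs that drift from the parser settings. It is a sketch under assumptions: `missing_parser_flags` is a hypothetical helper, not part of the test suite, and it only inspects a plain argument list.

```python
REQUIRED_PARSER_FLAGS = {
    "--reasoning-parser": "glm45",
    "--tool-call-parser": "glm47",
}


def missing_parser_flags(args: list) -> list:
    """Return required flags that are absent or set to a different value.

    Accepts both "--flag value" and "--flag=value" forms.
    """
    seen = {}
    for i, token in enumerate(args):
        if not token.startswith("--"):
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            seen[key] = value
        elif i + 1 < len(args) and not args[i + 1].startswith("--"):
            seen[token] = args[i + 1]
    return [flag for flag, value in REQUIRED_PARSER_FLAGS.items() if seen.get(flag) != value]
```

Calling it on a config that only sets `--kv-cache-dtype` returns both parser flags, while a config using either flag form with the expected values returns an empty list.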

…MI35x

Add bench_one_batch perf tests for GLM-5-FP8 with NSA attention backend, running after accuracy tests in the same CI job. Perf failures do not block CI when accuracy passes (continue-on-error: true).

- Use zai-org/GLM-5-FP8 for both accuracy and perf tests
- Add --reasoning-parser=glm45 --tool-call-parser=glm47 for consistency with NV tests and InferenceX benchmarks
- Enable --kv-cache-dtype fp8_e4m3 in perf tests for FP8 KV cache
- MI35x perf uses env tuning from InferenceX and PR #21511


michaelzhang-ai requested review from Fridge003, Kangyan-Zhou, bingxche, ispobock and merrymercy as code owners March 30, 2026 21:47

github-actions bot added the amd label Mar 30, 2026

gemini-code-assist bot reviewed Mar 30, 2026


michaelzhang-ai requested review from 1am9trash, hubertlu-tw, kkHuang-amd and yichiche as code owners March 31, 2026 18:52

michaelzhang-ai changed the title ~~[AMD][CI] Add GLM-5 nightly performance benchmarks for MI30x and MI35x~~ [AMD][CI] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x Mar 31, 2026

michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 4db8b49 to 9141485 Compare March 31, 2026 23:03

michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 50c89f2 to f64c7af Compare April 1, 2026 19:03

1am9trash approved these changes Apr 2, 2026


michaelzhang-ai requested a review from HaiShaw April 2, 2026 06:35

michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch 7 times, most recently from 8a360a8 to b9ec6b9 Compare April 7, 2026 20:40

1am9trash mentioned this pull request Apr 8, 2026

[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 #22314

Merged

5 tasks

michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 2a78aa5 to 3815bea Compare April 8, 2026 04:52

[AMD][CI] Enable --kv-cache-dtype fp8_e4m3 for GLM-5 perf tests

e1cb96b

Now that #22314 (MI300 FP8 KV quant dispatch fix) and #22232 (NSA indexer clone fix) are merged, re-enable FP8 KV cache for both MI30x and MI35x perf tests.

michaelzhang-ai changed the title ~~[AMD][CI] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x~~ [AMD] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x Apr 8, 2026

HaiShaw approved these changes Apr 8, 2026


HaiShaw merged commit db60a62 into main Apr 8, 2026
126 of 136 checks passed

HaiShaw deleted the add-glm5-nightly-perf-test branch April 8, 2026 05:43

michaelzhang-ai mentioned this pull request Apr 8, 2026

[AMD] Add GLM-5.1-FP8 nightly accuracy and performance benchmarks for MI30x and MI35x #22336

Merged

4 tasks

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 8, 2026

[AMD] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x (sgl-project#21710)

c33b333

HaiShaw merged 2 commits into main from add-glm5-nightly-perf-test