
[AMD] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x#21710

Merged
HaiShaw merged 2 commits into main from add-glm5-nightly-perf-test on Apr 8, 2026

Conversation

@michaelzhang-ai (Collaborator) commented Mar 30, 2026

Summary

Add GLM-5-FP8 nightly perf benchmarks (bench_one_batch) for MI30x and MI35x. Both accuracy and perf use zai-org/GLM-5-FP8 with NSA tilelang backend, TP=8, FP8 KV cache, and --reasoning-parser=glm45 --tool-call-parser=glm47 (matching NV/InferenceX configs).

Changes

  • New: test/registered/amd/perf/mi30x/test_glm5_perf_amd.py (suite: nightly-perf-8-gpu-glm5)
  • New: test/registered/amd/perf/mi35x/test_glm5_perf_mi35x.py (suite: nightly-perf-8-gpu-mi35x-glm5)
  • Modified: accuracy tests in mi30x/ and mi35x/ — switch model to zai-org/GLM-5-FP8, add parser flags
  • Modified: nightly-test-amd.yml and nightly-test-amd-rocm720.yml — add perf step after accuracy in each GLM-5 job

Workflow behavior

  • Accuracy step has no continue-on-error — if it fails, perf is skipped and the job fails
  • Perf step has continue-on-error: true — perf failures don't block CI
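The step ordering described above can be sketched as a GitHub Actions fragment. This is illustrative only: the step names and `run` commands are assumptions, not copied from the actual `nightly-test-amd.yml`; only the `continue-on-error` placement reflects the PR description.

```yaml
# Sketch of the GLM-5 job step ordering (hypothetical step names and commands)
- name: GLM-5-FP8 accuracy (MI30x)
  # No continue-on-error: a failure here fails the job and skips the perf step
  run: python3 -m pytest test/registered/amd/accuracy/mi30x

- name: GLM-5-FP8 perf (MI30x)
  continue-on-error: true  # perf regressions are reported but do not block CI
  run: python3 -m pytest test/registered/amd/perf/mi30x/test_glm5_perf_amd.py
```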

Dependencies

Server config

|                               | MI30x                               | MI35x                                                 |
| ----------------------------- | ----------------------------------- | ----------------------------------------------------- |
| `--kv-cache-dtype`            | fp8_e4m3                            | fp8_e4m3                                              |
| `--mem-fraction-static`       | 0.85                                | 0.85                                                  |
| `--model-loader-extra-config` | `{"enable_multithread_load": true}` | `{"enable_multithread_load": true, "num_threads": 8}` |
| Env                           | `SGLANG_USE_AITER=1`                | `SGLANG_ROCM_FUSED_DECODE_MLA=0`, `ROCM_QUICK_REDUCE_QUANTIZATION=INT4`, `SAFETENSORS_FAST_GPU=1` |
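For illustration, the MI35x column of the table can be assembled into a server launch command. The `sglang.launch_server` entry point and flag spellings follow common sglang conventions, but this is a sketch, not the exact command built by the test harness:

```python
# Sketch: assemble the MI35x server launch from the config table above.
# The entry point and argument order are assumptions; the test harness
# may construct this differently.
env = {
    "SGLANG_ROCM_FUSED_DECODE_MLA": "0",
    "ROCM_QUICK_REDUCE_QUANTIZATION": "INT4",
    "SAFETENSORS_FAST_GPU": "1",
}

cmd = [
    "python3", "-m", "sglang.launch_server",
    "--model-path", "zai-org/GLM-5-FP8",
    "--tp", "8",
    "--kv-cache-dtype", "fp8_e4m3",
    "--mem-fraction-static", "0.85",
    "--model-loader-extra-config",
    '{"enable_multithread_load": true, "num_threads": 8}',
    "--reasoning-parser", "glm45",
    "--tool-call-parser", "glm47",
]

# Print the env prefix and the command as one line, for inspection.
print(" ".join(f"{k}={v}" for k, v in env.items()), " ".join(cmd))
```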

CI validation

MI30x perf results (earlier run, without FP8 KV cache)

From AMD run / ROCm 7.2 run:

| batch | ISL  | latency (s) | input tok/s | output tok/s | ITL (ms) |
| ----- | ---- | ----------- | ----------- | ------------ | -------- |
| 1     | 4096 | 25.50       | 1735        | 22.1         | 45.2     |
| 8     | 4096 | 33.76       | 5172        | 149.4        | 53.6     |
| 16    | 4096 | 40.23       | 6313        | 274.5        | 58.3     |
| 64    | 4096 | 85.64       | 6351        | 738.7        | 86.6     |
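As a sanity check, the four metric columns are mutually consistent if one assumes a 512-token decode length per request (an assumption; the output length is not stated in the table): decode time is roughly (512 - 1) × ITL, prefill time is the remainder of the end-to-end latency, and the two throughput columns follow from those.

```python
# Cross-check the batch=8 row above, assuming 512 decode tokens per request
# (hypothetical: the output length is not given in the PR).
batch, isl, latency_s, itl_ms = 8, 4096, 33.76, 53.6
out_len = 512

decode_s = (out_len - 1) * itl_ms / 1000  # ~27.4 s spent decoding
prefill_s = latency_s - decode_s          # remainder is prefill
input_tps = batch * isl / prefill_s       # ~5.1k input tok/s (table: 5172)
output_tps = batch * out_len / decode_s   # ~150 output tok/s (table: 149.4)

print(round(input_tps), round(output_tps, 1))
```

The same arithmetic reproduces the other rows to within about 1%, which suggests the reported numbers were derived this way by bench_one_batch.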

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces nightly performance benchmarks for the GLM-5 model on AMD MI30x and MI35x platforms. The reviewer identified several areas for improvement, including refactoring duplicated report generation logic into a shared utility, fixing potential division-by-zero errors in throughput calculations, and enhancing test portability by avoiding hardcoded local paths. Additionally, it was noted that certain environment variables should be consistently applied across different GPU configurations to ensure optimal performance.

"--watchdog-timeout",
"1200",
],
"env_vars": {},
@gemini-code-assist bot (Contributor), severity: medium

SGLANG_USE_AITER is missing from the environment variables for MI35x, whereas it is enabled for MI30x. Since both use the same model and attention backend (tilelang), this might be an oversight that could lead to suboptimal performance results on MI35x.

            "env_vars": {
                "SGLANG_USE_AITER": "1",
            },

@michaelzhang-ai michaelzhang-ai changed the title [AMD][CI] Add GLM-5 nightly performance benchmarks for MI30x and MI35x [AMD][CI] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x Mar 31, 2026
@michaelzhang-ai michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 4db8b49 to 9141485 Compare March 31, 2026 23:03
@1am9trash (Collaborator) commented:
Collaborator

Maybe we can add --reasoning-parser=glm45 --tool-call-parser=glm47 to the GLM-5-FP8 AMD test configs for consistency.
These parsers are already used in NV unit tests and InferenceX tests, so aligning AMD test settings would reduce cross-platform behavior drift.

NV cmd
InferenceX cmd

@michaelzhang-ai michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 50c89f2 to f64c7af Compare April 1, 2026 19:03
@michaelzhang-ai michaelzhang-ai requested a review from HaiShaw April 2, 2026 06:35
@michaelzhang-ai michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch 7 times, most recently from 8a360a8 to b9ec6b9 Compare April 7, 2026 20:40
…MI35x

Add bench_one_batch perf tests for GLM-5-FP8 with NSA attention backend,
running after accuracy tests in the same CI job. Perf failures do not
block CI when accuracy passes (continue-on-error: true).

- Use zai-org/GLM-5-FP8 for both accuracy and perf tests
- Add --reasoning-parser=glm45 --tool-call-parser=glm47 for consistency
  with NV tests and InferenceX benchmarks
- Enable --kv-cache-dtype fp8_e4m3 in perf tests for FP8 KV cache
- MI35x perf uses env tuning from InferenceX and PR #21511
@michaelzhang-ai michaelzhang-ai force-pushed the add-glm5-nightly-perf-test branch from 2a78aa5 to 3815bea Compare April 8, 2026 04:52
Now that #22314 (MI300 FP8 KV quant dispatch fix) and #22232 (NSA
indexer clone fix) are merged, re-enable FP8 KV cache for both
MI30x and MI35x perf tests.
@michaelzhang-ai michaelzhang-ai changed the title [AMD][CI] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x [AMD] Add GLM-5-FP8 nightly performance benchmarks for MI30x and MI35x Apr 8, 2026
@HaiShaw HaiShaw merged commit db60a62 into main Apr 8, 2026
126 of 136 checks passed
@HaiShaw HaiShaw deleted the add-glm5-nightly-perf-test branch April 8, 2026 05:43

3 participants