[AMD] Add Qwen3.5-397B FP8 nightly perf benchmarks for MI30x and MI35x by michaelzhang-ai · Pull Request #21669 · sgl-project/sglang

michaelzhang-ai · 2026-03-30T07:02:31Z

Summary

Add bench_one_batch performance tests for Qwen3.5-397B-A17B-FP8 on both MI325/MI300X and MI35x GPUs
Perf steps run after existing Qwen3.5 accuracy tests in the same CI job, with continue-on-error: true so perf failures don't block CI when accuracy passes
Update all 4 workflow locations: {default ROCm, ROCm 7.2} × {MI30x, MI35x}
Write Qwen3.5 lm-eval accuracy results to GitHub step summary (same pattern as MXFP4 tests)

Changes

New test files

| GPU | File | Suite |
| --- | --- | --- |
| MI30x | `test/registered/amd/perf/mi30x/test_qwen35_fp8_perf_amd.py` | `nightly-perf-8-gpu-qwen35-fp8` |
| MI35x | `test/registered/amd/perf/mi35x/test_qwen35_fp8_perf_mi35x.py` | `nightly-perf-8-gpu-mi35x-qwen35-fp8` |

Server configuration (matches InferenceX benchmarks)

Model: Qwen/Qwen3.5-397B-A17B-FP8 (pre-quantized FP8 checkpoint)
--attention-backend aiter
--tp 8, --mem-fraction-static 0.8
--model-loader-extra-config '{"enable_multithread_load": true}'
--watchdog-timeout 1200
SGLANG_USE_AITER=1
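
As a rough illustration of how the configuration above might be assembled in a Python test, here is a minimal sketch. The helper names `build_server_args` and `build_server_env` are hypothetical and do not come from the PR; only the flag values, the model path, and the environment variable are taken from the list above.

```python
import json
import os

# Model path as listed in the PR description.
MODEL = "Qwen/Qwen3.5-397B-A17B-FP8"


def build_server_args():
    """Return the extra server flags from the PR description as an argv-style list.

    Hypothetical helper: the actual test files may pass these differently.
    """
    return [
        "--attention-backend", "aiter",
        "--tp", "8",
        "--mem-fraction-static", "0.8",
        # json.dumps yields '{"enable_multithread_load": true}'
        "--model-loader-extra-config", json.dumps({"enable_multithread_load": True}),
        "--watchdog-timeout", "1200",
    ]


def build_server_env(base=None):
    """Return a copy of the environment with SGLANG_USE_AITER=1 set."""
    env = dict(os.environ if base is None else base)
    env["SGLANG_USE_AITER"] = "1"
    return env


if __name__ == "__main__":
    print(MODEL, " ".join(build_server_args()))
```

Building the JSON value with `json.dumps` avoids shell-quoting mistakes when the argument list is passed directly to a subprocess rather than through a shell.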

Workflow updates

.github/workflows/nightly-test-amd.yml: Added perf steps to nightly-8-gpu-qwen35 and nightly-8-gpu-mi35x-qwen35 jobs
.github/workflows/nightly-test-amd-rocm720.yml: Added perf steps to nightly-8-gpu-qwen35-rocm720 and nightly-8-gpu-mi35x-qwen35-rocm720 jobs

Accuracy summary fix

Override test_lm_eval in test_qwen35_eval_amd.py and test_qwen35_eval_mi35x.py to write lm-eval results table to GITHUB_STEP_SUMMARY (same pattern as test_qwen3_instruct_mxfp4.py). No common code changed.
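
The general pattern of writing a results table to the step summary can be sketched as follows. This is not the code from the PR: `format_results_table` and `write_step_summary` are hypothetical names, and the shape of `results` (task name mapped to a score) is an assumption. The only thing relied on is that GitHub Actions exposes the summary file path in the `GITHUB_STEP_SUMMARY` environment variable and renders Markdown appended to that file.

```python
import os


def format_results_table(results):
    """Render a {task: score} mapping (assumed shape) as a Markdown table."""
    lines = ["| Task | Score |", "| --- | --- |"]
    for task, score in results.items():
        lines.append(f"| {task} | {score:.4f} |")
    return "\n".join(lines) + "\n"


def write_step_summary(markdown, env=None):
    """Append Markdown to the GitHub step summary file, if one is configured.

    Returns False when GITHUB_STEP_SUMMARY is unset (e.g. a local run),
    so the caller can run outside CI without failing.
    """
    env = os.environ if env is None else env
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as f:
        f.write(markdown)
    return True


if __name__ == "__main__":
    # Placeholder task name and score, for illustration only.
    write_step_summary(format_results_table({"example_task": 0.5}))
```

Appending rather than overwriting matters here, because several steps in the same job may contribute to the summary.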

Behavior

If accuracy fails → perf step is skipped (no continue-on-error on accuracy step)
If accuracy passes but perf fails → job still passes (continue-on-error: true on perf step)
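
The two rules above can be modelled as a small truth table. The function below is only an illustrative model of the step gating (the name `job_outcome` is hypothetical); the real behavior comes from the GitHub Actions workflow, where a failed step causes later steps to be skipped by default and `continue-on-error: true` keeps a failing step from failing the job.

```python
def job_outcome(accuracy_passed, perf_passed):
    """Model the job result for the accuracy step followed by the perf step."""
    if not accuracy_passed:
        # The accuracy step has no continue-on-error, so the job fails
        # and the perf step never runs.
        return {"perf_step": "skipped", "job": "failed"}
    if perf_passed:
        return {"perf_step": "passed", "job": "passed"}
    # The perf step has continue-on-error: true, so its failure is ignored.
    return {"perf_step": "failed (ignored)", "job": "passed"}


if __name__ == "__main__":
    for acc in (True, False):
        for perf in (True, False):
            print(acc, perf, job_outcome(acc, perf))
```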

CI validation

Run 3 — aiter attention backend (latest)

Default ROCm: Run #24066129104 — in progress
ROCm 7.2: Run #24066129577 — in progress

Run 2 — triton attention backend + accuracy summary fix ✅

Default ROCm: Run #24051381084
ROCm 7.2: Run #24051381558

| Job | Duration | Accuracy | Performance |
| --- | --- | --- | --- |
| MI30x default ROCm | 59m | ✅ | ✅ |
| MI35x default ROCm | 47m | ✅ | ✅ |
| MI30x ROCm 7.2 | 60m | ✅ | ✅ |
| MI35x ROCm 7.2 | 47m | ✅ | ✅ |

Run 1 — initial (perf only, triton backend)

Run #23761029529: MI30x 1h12m ✅, MI35x 33m ✅

Test plan

Verify YAML syntax is valid (done locally via yaml.safe_load)
Verify black, ruff, isort checks pass on all new/modified test files
Suite names match between register_amd_ci() calls and run_suite.py invocations
Run nightly on MI325 and MI35x — default ROCm ✅
Run nightly on MI325 and MI35x — ROCm 7.2 ✅
Verify accuracy results appear in step summary ✅
Verify aiter attention backend passes (Run 3)

…I35x

Add bench_one_batch performance tests for Qwen3.5-397B-A17B-FP8 on both MI325/MI300X and MI35x GPUs. Perf steps run after existing accuracy tests with continue-on-error so perf failures don't block CI when accuracy passes.

- New test files using triton attention backend, TP=8, mem-fraction 0.8
- Perf steps added to both default ROCm and ROCm 7.2 nightly workflows
- Suite names: nightly-perf-8-gpu-qwen35-fp8, nightly-perf-8-gpu-mi35x-qwen35-fp8

gemini-code-assist

Code Review

This pull request introduces nightly performance benchmarks for the Qwen3.5-397B-A17B FP8 model on AMD MI30x and MI35x platforms. The reviewer identified significant code duplication between the benchmark scripts and suggested refactoring shared logic into a common module. Other feedback includes fixing a potential division-by-zero error in the ITL calculation and replacing hardcoded model paths with more portable environment-based configurations.

gemini-code-assist · 2026-03-30T07:04:17Z

+    )
+
+    for result in report_results:
+        itl = 1 / (result.output_throughput / result.batch_size) * 1000


The calculation for itl could lead to a ZeroDivisionError if result.output_throughput is zero. It's safer to check for this case to prevent the test from crashing during report generation. Rewriting the expression also improves readability.

Suggested change

- itl = 1 / (result.output_throughput / result.batch_size) * 1000
+ itl = (result.batch_size / result.output_throughput) * 1000 if result.output_throughput > 0 else 0
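
For reference, the guarded calculation can be pulled into a standalone function. This is a minimal sketch of the reviewer's suggestion, not code from the PR; the function name `inter_token_latency_ms` is hypothetical. The units are assumed from the `* 1000` factor in the expression: throughput in tokens per second across the whole batch, result in milliseconds per token per request.

```python
def inter_token_latency_ms(output_throughput, batch_size):
    """Per-request inter-token latency in milliseconds.

    Algebraically the same as 1 / (output_throughput / batch_size) * 1000,
    but returns 0.0 instead of raising ZeroDivisionError when the
    throughput is zero (or negative).
    """
    if output_throughput <= 0:
        return 0.0
    return (batch_size / output_throughput) * 1000


if __name__ == "__main__":
    # 8 concurrent requests at 1000 tokens/s total: roughly 8 ms per token each.
    print(inter_token_latency_ms(1000.0, 8))
    print(inter_token_latency_ms(0.0, 8))
```

Returning 0 keeps report generation alive, but a zero ITL in the report then means "no data" rather than "infinitely fast", which readers of the report need to know.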

gemini-code-assist · 2026-03-30T07:04:17Z

+os.environ.setdefault("HF_HOME", "/data2/models/huggingface")
+os.environ.setdefault("HF_HUB_CACHE", "/data2/models/huggingface/hub")


Hardcoding paths like /data2/models/huggingface makes the test less portable and dependent on a specific machine's setup. It's better to rely on the CI environment to set these environment variables, or use a more generic default that works in different environments (e.g., a path relative to the user's home directory).
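
One way to follow this suggestion is to take the cache locations from the environment and fall back to a home-relative default. The sketch below is an assumption about how that could look, not the change made in the PR; `resolve_hf_paths` is a hypothetical helper. The fallbacks mirror the Hugging Face defaults of `~/.cache/huggingface` for `HF_HOME` and `$HF_HOME/hub` for `HF_HUB_CACHE`.

```python
import os
from pathlib import Path


def resolve_hf_paths(env=None):
    """Return (HF_HOME, HF_HUB_CACHE), preferring values set by the CI environment."""
    env = os.environ if env is None else env
    default_home = Path.home() / ".cache" / "huggingface"
    hf_home = Path(env.get("HF_HOME", default_home))
    # The hub cache defaults to a "hub" directory under HF_HOME.
    hf_hub_cache = Path(env.get("HF_HUB_CACHE", hf_home / "hub"))
    return hf_home, hf_hub_cache


if __name__ == "__main__":
    home, hub = resolve_hf_paths()
    print(home, hub)
```

With this approach a CI runner that stores models under a machine-specific directory sets `HF_HOME` in its own configuration, and the test file stays free of host-specific paths.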

Jackycheng0808

We should use --attention-backend aiter instead.

yichiche

LGTM

michaelzhang-ai requested review from Fridge003, Kangyan-Zhou, bingxche, ispobock and merrymercy as code owners March 30, 2026 07:02

github-actions Bot added the amd label Mar 30, 2026

gemini-code-assist Bot reviewed Mar 30, 2026

michaelzhang-ai requested review from HaiShaw and yctseng0211 March 30, 2026 23:33

michaelzhang-ai added 2 commits April 6, 2026 12:28

Merge branch 'main' into amd/qwen35-fp8-perf-test

016f96e

[AMD CI] Write Qwen3.5 accuracy results to GitHub step summary

9bef99f

Override test_lm_eval in the Qwen3.5 accuracy tests to write a markdown results table to GITHUB_STEP_SUMMARY, matching the pattern used by the MXFP4 combined tests. No common code changed.

michaelzhang-ai requested a review from yichiche April 7, 2026 02:56

michaelzhang-ai changed the title ~~[AMD CI] Add Qwen3.5-397B FP8 nightly perf benchmarks for MI30x and MI35x~~ [AMD] Add Qwen3.5-397B FP8 nightly perf benchmarks for MI30x and MI35x Apr 7, 2026

Jackycheng0808 reviewed Apr 7, 2026

michaelzhang-ai added 2 commits April 7, 2026 00:34

[AMD CI] Switch Qwen3.5 FP8 perf tests to aiter attention backend

43f2e94

[AMD CI] Switch Qwen3.5 accuracy tests to aiter attention backend

c000b9d

yichiche approved these changes Apr 7, 2026

HaiShaw approved these changes Apr 7, 2026

HaiShaw merged commit ba78f6e into main Apr 7, 2026
58 of 64 checks passed

HaiShaw deleted the amd/qwen35-fp8-perf-test branch April 7, 2026 06:46

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

[AMD] Add Qwen3.5-397B FP8 nightly perf benchmarks for MI30x and MI35x (sgl-project#21669)

c5d2ab4

caitengwei pushed a commit to caitengwei/sglang that referenced this pull request Jun 1, 2026

[AMD] Add Qwen3.5-397B FP8 nightly perf benchmarks for MI30x and MI35x (sgl-project#21669)

fe6430d

HaiShaw merged 5 commits into main from amd/qwen35-fp8-perf-test
