
[AMD] CI - Add MI35x nightly/PR tests for kv-cache-fp8 and allreduce-fusion (DeepSeek)#19834

Merged
HaiShaw merged 12 commits into main from amd_ci_dpsk_test on Mar 5, 2026

Conversation

@yctseng0211
Collaborator

@yctseng0211 yctseng0211 commented Mar 4, 2026

Motivation

Track accuracy and performance regression for two new DeepSeek-R1-MXFP4 server configurations on MI35x:

  • --kv-cache-dtype fp8_e4m3
  • --enable-aiter-allreduce-fusion

Update the workflow_dispatch mechanism to enable multi-job triggering.
Add a PD/D test to pr-test-amd-rocm720.yml.

Modifications

  • Add 6 test files under test/registered/amd/
  • Add corresponding nightly jobs in nightly-test-amd.yml and nightly-test-amd-rocm720.yml

Accuracy Tests

Nightly Test:

PR Test:

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@yctseng0211 yctseng0211 changed the title [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek-R1-MXFP4) [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) Mar 4, 2026
@yctseng0211 yctseng0211 changed the title [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) [AMD] CI - Add MI35x nightly/PR tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) Mar 4, 2026
@yctseng0211 yctseng0211 marked this pull request as ready for review March 4, 2026 13:38

@yctseng0211
Collaborator Author

@bingxche @michaelzhang-ai please help review, thanks!
Nightly Test:

PR Test:

@michaelzhang-ai
Collaborator

michaelzhang-ai commented Mar 4, 2026

Please double-check the test args.
Inconsistent server args between the accuracy and perf tests:

The accuracy tests launch the server with --attention-backend aiter and SGLANG_USE_AITER=1, but the corresponding perf tests omit both.

Accuracy test (test_deepseek_r1_mxfp4_kv_fp8_eval_mi35x.py):

other_args=[
    "--attention-backend", "aiter",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--trust-remote-code",
    "--kv-cache-dtype", "fp8_e4m3",
],
env_vars={"SGLANG_USE_AITER": "1"},

Perf test (test_deepseek_r1_mxfp4_kv_fp8_perf_mi35x.py):

"other_args": [
    "--trust-remote-code",
    "--tp", "8",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--kv-cache-dtype", "fp8_e4m3",
],

This means the perf tests benchmark a different server configuration than what the accuracy tests validate. If the aiter backend is needed for the MXFP4 model on MI35x, the perf numbers won't reflect production behavior. The same issue exists for the allreduce-fusion variant.
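One way to keep the two configurations from drifting (a hypothetical sketch only; the actual sglang test suite may structure its configs differently, and the names below are illustrative) is to define the server args once and have the perf test extend the accuracy test's list:

```python
# Hypothetical sketch: share one base config between accuracy and perf tests
# so the benchmarked server matches the validated one. BASE_SERVER_ARGS,
# BASE_ENV_VARS, and perf_server_args are illustrative names, not the
# actual sglang test-suite API.

BASE_SERVER_ARGS = [
    "--attention-backend", "aiter",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--trust-remote-code",
    "--kv-cache-dtype", "fp8_e4m3",
]
BASE_ENV_VARS = {"SGLANG_USE_AITER": "1"}


def perf_server_args(tp: int = 8) -> list[str]:
    """Perf tests reuse the accuracy args, adding only perf-specific flags."""
    return [*BASE_SERVER_ARGS, "--tp", str(tp)]
```

With this layout, adding or removing a flag in one place updates both tests, so the perf numbers always benchmark the same server configuration the accuracy tests validate.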

@michaelzhang-ai michaelzhang-ai self-requested a review March 4, 2026 23:15
@yctseng0211
Collaborator Author

> Please double check with test args. Inconsistent server args between accuracy and perf tests: the accuracy tests launch the server with --attention-backend aiter and SGLANG_USE_AITER=1, but the corresponding perf tests omit both. [...] The same issue exists for the allreduce-fusion variant.

@michaelzhang-ai thanks for your review! The configs are based on the existing AMD nightly tests.

Collaborator

@bingxche bingxche left a comment


LGTM

@HaiShaw HaiShaw merged commit b5edab5 into main Mar 5, 2026
44 of 87 checks passed
@HaiShaw HaiShaw deleted the amd_ci_dpsk_test branch March 5, 2026 15:10
qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>


4 participants