
[AMD] CI - Add MI35x nightly/PR tests for kv-cache-fp8 and allreduce-fusion (DeepSeek)#19834

Merged
HaiShaw merged 12 commits into main from amd_ci_dpsk_test on Mar 5, 2026

Conversation

@yctseng0211
Collaborator

@yctseng0211 yctseng0211 commented Mar 4, 2026

Motivation

Track accuracy and performance regression for two new DeepSeek-R1-MXFP4 server configurations on MI35x:

  • --kv-cache-dtype fp8_e4m3
  • --enable-aiter-allreduce-fusion

Update the workflow_dispatch mechanism to enable multi-job triggering.
Add a PD/D test to pr-test-amd-rocm720.yml.

Modifications

  • Add 6 test files under test/registered/amd/
  • Add corresponding nightly jobs in nightly-test-amd.yml and nightly-test-amd-rocm720.yml

Accuracy Tests

Nightly Test:

PR Test:

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@yctseng0211 yctseng0211 changed the title [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek-R1-MXFP4) [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) Mar 4, 2026
@yctseng0211 yctseng0211 changed the title [AMD] CI - Add MI35x nightly tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) [AMD] CI - Add MI35x nightly/PR tests for kv-cache-fp8 and allreduce-fusion (DeepSeek) Mar 4, 2026
@yctseng0211 yctseng0211 marked this pull request as ready for review March 4, 2026 13:38

@yctseng0211
Collaborator Author

@bingxche @michaelzhang-ai please help review, thanks!
Nightly Test:

PR Test:

@michaelzhang-ai
Collaborator

michaelzhang-ai commented Mar 4, 2026

Please double-check the test args.
Inconsistent server args between the accuracy and perf tests:

The accuracy tests launch the server with --attention-backend aiter and SGLANG_USE_AITER=1, but the corresponding perf tests omit both.

Accuracy test (test_deepseek_r1_mxfp4_kv_fp8_eval_mi35x.py):

other_args=[
    "--attention-backend", "aiter",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--trust-remote-code",
    "--kv-cache-dtype", "fp8_e4m3",
],
env_vars={"SGLANG_USE_AITER": "1"},

Perf test (test_deepseek_r1_mxfp4_kv_fp8_perf_mi35x.py):

"other_args": [
    "--trust-remote-code",
    "--tp", "8",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--kv-cache-dtype", "fp8_e4m3",
],

This means the perf tests benchmark a different server configuration than what the accuracy tests validate. If the aiter backend is needed for the MXFP4 model on MI35x, the perf numbers won't reflect production behavior. The same issue exists for the allreduce-fusion variant.
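One way to keep the two configurations from drifting (a hypothetical sketch only; the actual sglang test suite may structure its configs differently, and the names below are illustrative) is to define the server args once and have the perf test extend the accuracy test's list:

```python
# Hypothetical sketch: share one base config between accuracy and perf tests
# so the benchmarked server matches the validated one. BASE_SERVER_ARGS,
# BASE_ENV_VARS, and perf_server_args are illustrative names, not the
# actual sglang test-suite API.

BASE_SERVER_ARGS = [
    "--attention-backend", "aiter",
    "--chunked-prefill-size", "131072",
    "--disable-radix-cache",
    "--mem-fraction-static", "0.85",
    "--trust-remote-code",
    "--kv-cache-dtype", "fp8_e4m3",
]
BASE_ENV_VARS = {"SGLANG_USE_AITER": "1"}


def perf_server_args(tp: int = 8) -> list[str]:
    """Perf tests reuse the accuracy args, adding only perf-specific flags."""
    return [*BASE_SERVER_ARGS, "--tp", str(tp)]
```

With this layout, adding or removing a flag in one place updates both tests, so the perf numbers always benchmark the same server configuration the accuracy tests validate.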

@michaelzhang-ai michaelzhang-ai self-requested a review March 4, 2026 23:15
@yctseng0211
Collaborator Author

> Please double check with test args. Inconsistent server args between accuracy and perf tests: the accuracy tests launch the server with --attention-backend aiter and SGLANG_USE_AITER=1, but the corresponding perf tests omit both. [...] The same issue exists for the allreduce-fusion variant.

@michaelzhang-ai thanks for your review! The configs are based on the existing AMD nightly tests.

Collaborator

@bingxche bingxche left a comment


LGTM

@HaiShaw HaiShaw merged commit b5edab5 into main Mar 5, 2026
44 of 87 checks passed
@HaiShaw HaiShaw deleted the amd_ci_dpsk_test branch March 5, 2026 15:10
qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…fusion (DeepSeek) (sgl-project#19834)

Co-authored-by: bingxche <Bingxu.Chen@amd.com>


4 participants