[AMD] Add MI35x nightly CI tests #16588

Merged
HaiShaw merged 20 commits into sgl-project:main from michaelzhang-ai:add-mi35x-nightly-tests
Jan 9, 2026

@michaelzhang-ai
Collaborator

@michaelzhang-ai michaelzhang-ai commented Jan 6, 2026

Motivation

Add nightly CI tests for the new MI35x cluster, enabling comprehensive testing of SGLang on AMD's MI35x architecture alongside existing MI300X tests.

MI35x coverage adds 32 models/tests in total: 17 TP1/TP2 models, 5 VLMs, 2 GPT-OSS, 3 GROK, 2 DeepSeek-R1-MXFP4 variants (basic, MTP), and 3 perf benchmarks (2 GROK + 1 DeepSeek).

CI all green: https://github.com/sgl-project/sglang/actions/runs/20770209451
Please help to review. @yctseng0211 @bingxche @HaiShaw

Modifications

New MI35x test jobs (linux-mi35x-gpu-2, linux-mi35x-gpu-8 runners):

  • nightly-test-2-gpu-mi35x - 2-GPU evaluation tests
  • nightly-test-2-gpu-vlm-mi35x - 2-GPU VLM MMMU tests
  • nightly-test-8-gpu-mi35x-gpt-oss - GPT-OSS models (openai/* paths)
  • nightly-test-8-gpu-mi35x-grok - GROK models
  • nightly-test-8-gpu-mi35x-deepseek-r1 - DeepSeek-R1 (basic + MTP)
  • nightly-perf-8-gpu-mi35x-grok - GROK performance benchmarks
  • nightly-perf-8-gpu-mi35x-deepseek-r1-mxfp4 - DeepSeek-R1-MXFP4 performance

MI300X improvements:

  • Combined deepseek-r1 job to run basic + MTP variants + DP attention + torch compile tests
  • Migrated tests from test/srt/nightly/ to test/registered/amd/nightly/ per CI reorg roadmap

Accuracy Tests

TestNightlyGsm8KEval (TP=2)

| Model | TP | Score | Threshold | Startup | Eval | Total | Status |
|---|---|---|---|---|---|---|---|
| meta-llama/Llama-3.1-8B-Instruct | 1 | 0.852 | 0.82 | 111s | 16s | 127s | ✅ PASS |
| mistralai/Mistral-7B-Instruct-v0.3 | 1 | 0.604 | 0.58 | 81s | 9s | 90s | ✅ PASS |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | 1 | 0.864 | 0.85 | 422s | 23s | 445s | ✅ PASS |
| google/gemma-2-27b-it | 1 | 0.920 | 0.91 | 191s | 11s | 202s | ✅ PASS |
| meta-llama/Llama-3.1-70B-Instruct | 2 | 0.964 | 0.95 | 332s | 17s | 349s | ✅ PASS |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 2 | 0.668 | 0.61 | 141s | 10s | 151s | ✅ PASS |
| Qwen/Qwen2-57B-A14B-Instruct | 2 | 0.884 | 0.86 | 422s | 18s | 440s | ✅ PASS |
| neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 | 1 | 0.860 | 0.80 | 71s | 13s | 84s | ✅ PASS |
| neuralmagic/Mistral-7B-Instruct-v0.3-FP8 | 1 | 0.580 | 0.54 | 41s | 10s | 51s | ✅ PASS |
| neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 | 2 | 0.964 | 0.94 | 111s | 37s | 148s | ✅ PASS |
| neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 | 2 | 0.716 | 0.62 | 91s | 9s | 101s | ✅ PASS |
| neuralmagic/Qwen2-72B-Instruct-FP8 | 2 | 0.956 | 0.94 | 131s | 14s | 145s | ✅ PASS |
| neuralmagic/Qwen2-57B-A14B-Instruct-FP8 | 2 | 0.868 | 0.86 | 311s | 8s | 319s | ✅ PASS |
| meta-llama/Llama-3.2-3B-Instruct | 1 | 0.796 | 0.55 | 51s | 11s | 62s | ✅ PASS |
| Qwen/Qwen2.5-7B-Instruct | 1 | 0.908 | 0.85 | 51s | 8s | 59s | ✅ PASS |
| Qwen/Qwen3-8B | 1 | 0.832 | 0.77 | 51s | 24s | 75s | ✅ PASS |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | 2 | 0.956 | 0.84 | 261s | 30s | 291s | ✅ PASS |
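
The Status column above can be read as a simple threshold check. The sketch below is illustrative only: it assumes a run passes when its score is at least the threshold, and the `EvalResult` class and its fields are hypothetical names, not the test suite's actual code. The three sample rows are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for one row of the accuracy table."""

    model: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: a run passes when the score meets or exceeds the threshold.
        return self.score >= self.threshold

    @property
    def margin(self) -> float:
        # Headroom above the threshold, rounded to the table's precision.
        return round(self.score - self.threshold, 3)


# Sample rows copied from the GSM8K table.
results = [
    EvalResult("meta-llama/Llama-3.1-8B-Instruct", 0.852, 0.82),
    EvalResult("neuralmagic/Qwen2-57B-A14B-Instruct-FP8", 0.868, 0.86),
    EvalResult("meta-llama/Llama-3.2-3B-Instruct", 0.796, 0.55),
]

# The row with the least headroom is the one most likely to flake in nightly runs.
tightest = min(results, key=lambda r: r.margin)

for r in results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{r.model}: {status} (margin {r.margin:+.3f})")
print(f"tightest margin: {tightest.model}")
```

Sorting by margin in this way is one option for spotting thresholds that sit close to the measured score.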

Model Group: grok

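| Model | TP | Accuracy | Threshold | Startup | Bench | Total | Status |
|---|---|---|---|---|---|---|---|
| lmzheng/grok-1 | 8 | 0.870 | 0.8 | 1442s | 17s | 1460s | ✅ PASS |
| amd/grok-1-W4A8KV8 | 8 | 0.835 | 0.8 | 952s | 15s | 968s | ✅ PASS |
| xai-org/grok-2 | 8 | 0.920 | 0.915 | 421s | 38s | 460s | ✅ PASS |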

MI35x Model Group: deepseek-r1

| Model | TP | Accuracy | Threshold | Startup | Bench | Total | Status |
|---|---|---|---|---|---|---|---|
| deepseek-ai/DeepSeek-R1-0528 (basic) | 8 | 0.975 | 0.93 | 2714s | 32s | 2746s | ✅ PASS |
| deepseek-ai/DeepSeek-R1-0528 (MTP) | 8 | 0.970 | 0.93 | 1353s | 18s | 1370s | ✅ PASS |

Details are in the CI run: https://github.com/sgl-project/sglang/actions/runs/20770209451

Benchmarking and Profiling

TestNightlyGrokPerformance

amd/grok-1-W4A8KV8 (grok1)

| batch size | input len | latency (s) | input throughput (tok/s) | output throughput (tok/s) | ITL (ms) |
|---|---|---|---|---|---|
| 1 | 1024 | 5.07 | 12316.97 | 102.74 | 9.73 |
| 1 | 1024 | 5.08 | 12115.73 | 102.56 | 9.75 |
| 8 | 1024 | 6.08 | 34696.79 | 700.75 | 11.42 |
| 16 | 1024 | 6.32 | 35436.39 | 1398.64 | 11.44 |
| 64 | 1024 | 8.21 | 38049.96 | 5049.24 | 12.68 |

xai-org/grok-2 (grok2)

| batch size | input len | latency (s) | input throughput (tok/s) | output throughput (tok/s) | ITL (ms) |
|---|---|---|---|---|---|
| 1 | 1024 | 6.99 | 10000.78 | 74.31 | 13.46 |
| 1 | 1024 | 7.01 | 9983.34 | 74.08 | 13.50 |
| 8 | 1024 | 8.73 | 25857.56 | 486.99 | 16.43 |
| 16 | 1024 | 9.24 | 26058.66 | 950.95 | 16.83 |
| 64 | 1024 | 11.83 | 28147.08 | 3447.59 | 18.56 |

TestNightlyDeepseekR1MXFP4Performance

amd/DeepSeek-R1-MXFP4-Preview (basic)

| batch size | input len | latency (s) | input throughput (tok/s) | output throughput (tok/s) | ITL (ms) |
|---|---|---|---|---|---|
| 1 | 4096 | 4.83 | 9372.53 | 116.61 | 8.58 |
| 1 | 4096 | 4.52 | 30569.49 | 116.63 | 8.57 |
| 8 | 4096 | 5.90 | 43786.90 | 795.80 | 10.05 |
| 16 | 4096 | 7.52 | 44497.10 | 1355.54 | 11.80 |
| 64 | 4096 | 15.78 | 44878.30 | 3297.15 | 19.41 |
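
As a sanity check on the benchmark tables, the reported output throughput is close to batch size divided by inter-token latency (ITL). The sketch below is an observation about the reported numbers, not a description of how the benchmark script computes them; the helper name `estimated_output_throughput` is hypothetical, and the rows are copied from the amd/DeepSeek-R1-MXFP4-Preview table.

```python
def estimated_output_throughput(batch_size: int, itl_ms: float) -> float:
    """Tokens per second if each of `batch_size` requests emits one token per ITL."""
    return batch_size * 1000.0 / itl_ms


# (batch size, ITL in ms, reported output throughput in tok/s),
# copied from the amd/DeepSeek-R1-MXFP4-Preview table.
rows = [
    (1, 8.57, 116.63),
    (8, 10.05, 795.80),
    (16, 11.80, 1355.54),
    (64, 19.41, 3297.15),
]

relative_errors = []
for batch_size, itl_ms, reported in rows:
    estimate = estimated_output_throughput(batch_size, itl_ms)
    rel_err = abs(estimate - reported) / reported
    relative_errors.append(rel_err)
    print(f"bs={batch_size:>2}: estimate {estimate:8.2f} tok/s, "
          f"reported {reported:8.2f} tok/s, rel. error {rel_err:.4%}")
```

For these rows the estimate stays within 1% of the reported value, which suggests the two columns are mutually consistent.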

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

- test_mi35x_basic_1gpu.py: 1-GPU basic model tests
- test_mi35x_eval_2gpu.py: 2-GPU evaluation tests (TP=2)
- test_mi35x_large_8gpu.py: 8-GPU large model tests (TP=8)

Uses runners: linux-mi35x-gpu-1, linux-mi35x-gpu-2, linux-mi35x-gpu-8

- Added new nightly test jobs for MI35x GPUs, including 2-GPU and VLM evaluation tests.
- Updated existing nightly test configurations to include MI35x jobs.
- Introduced new test files for GSM8K completion and evaluation, along with VLM MMMU evaluation tests.
- Removed outdated 1-GPU and 2-GPU MI35x test files.

This update improves coverage for AMD's MI35x architecture in the nightly CI pipeline.
@github-actions github-actions bot added the amd and Multi-modal labels Jan 6, 2026
- Introduced new nightly test jobs for MI35x 8-GPU configurations, including tests for GPT-OSS, GROK, and DeepSeek models.
- Updated the run suite to include the new MI35x 8-GPU suite.
- Added a new test file for GSM8K completion evaluation specific to MI35x models.

This enhances the testing framework for AMD's MI35x architecture, ensuring comprehensive coverage in the nightly CI pipeline.
- Consolidated DeepSeek-R1 tests into a single job with combined DP and TC configurations.
- Introduced new performance benchmark for DeepSeek-R1-MXFP4 model on MI35x.
- Updated model configurations to include basic and MTP variants for MI35x.
- Enhanced test descriptions for clarity and accuracy in nightly evaluation.

This update streamlines the testing process and improves coverage for DeepSeek-R1 models in the nightly CI pipeline.
michaelzhang-ai and others added 5 commits January 6, 2026 17:42
- Refactored model path configuration to prioritize environment variable, local path, and HuggingFace model ID.
- Introduced a new function to determine the effective model path based on availability.
- Updated test classes to utilize the new model path logic, improving flexibility and clarity in model sourcing.

This update streamlines the model path management in the nightly performance benchmarks for DeepSeek-R1-MXFP4 on MI35x.
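
The priority order described in this commit (environment variable, then local path, then HuggingFace model ID) could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's code: the function name `resolve_model_path`, the environment variable `HYPOTHETICAL_MODEL_PATH`, and the local directory are all hypothetical. Only the model ID `amd/DeepSeek-R1-MXFP4-Preview` comes from this PR.

```python
import os
from pathlib import Path


def resolve_model_path(env_var: str, local_path: str, hf_model_id: str) -> str:
    """Pick a model source by priority: env var, then local dir, then HF model ID."""
    env_value = os.environ.get(env_var)
    if env_value:
        # 1. An explicit override always wins.
        return env_value
    if Path(local_path).is_dir():
        # 2. Use a pre-downloaded copy when it exists on the runner.
        return local_path
    # 3. Otherwise fall back to the HuggingFace model ID.
    return hf_model_id


# Hypothetical names and paths, for illustration only.
model_path = resolve_model_path(
    env_var="HYPOTHETICAL_MODEL_PATH",
    local_path="/hypothetical/models/DeepSeek-R1-MXFP4-Preview",
    hf_model_id="amd/DeepSeek-R1-MXFP4-Preview",
)
print(f"Using model source: {model_path}")
```

Falling back to the model ID keeps the test runnable on machines without a local copy, at the cost of a download at startup.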
- Removed the pull_request trigger from the nightly test workflow for AMD.
- Enhanced code readability by formatting multi-line function calls and string concatenations in the DeepSeek-R1-MXFP4 performance test.
- Cleaned up trailing whitespace in several test files.

This update streamlines the nightly testing process and improves code clarity across the AMD test suite.
… and MI35x

- Added pull_request trigger to the nightly test workflow for AMD.
- Consolidated DeepSeek-R1 tests into a single job with all variants (basic, MTP, DP, TC) for MI35x.
- Updated model configurations to reflect the new naming and structure, ensuring consistency across tests.
- Enhanced logging to include variant names in test summaries for better clarity.

This update improves the nightly testing process and ensures comprehensive coverage for DeepSeek-R1 models.
- Changed MI35x accuracy tests from deepseek-r1-all to deepseek-r1
- Only runs basic and MTP variants (DP/TC cause timeout with full model)
- DeepSeek-R1-0528 (~91GB/GPU) too large for DP initialization on MI35x
- MXFP4 still used for perf tests
- Reduced timeout from 300 to 180 minutes
@michaelzhang-ai
Collaborator Author

michaelzhang-ai commented Jan 7, 2026

@michaelzhang-ai michaelzhang-ai marked this pull request as ready for review January 7, 2026 06:26
@michaelzhang-ai michaelzhang-ai changed the title [AMD] Add MI35x nightly tests [AMD] Add MI35x nightly CI tests Jan 7, 2026
These suites were migrated to test/registered/amd/nightly/ and are now
managed by test/run_suite.py using the registry system.
@bingxche bingxche added the run-ci label Jan 7, 2026
Collaborator

@yctseng0211 yctseng0211 left a comment

LGTM

@github-actions github-actions bot added the documentation, quant, npu, diffusion, and model-gateway labels Jan 7, 2026
@michaelzhang-ai michaelzhang-ai force-pushed the add-mi35x-nightly-tests branch from 003e220 to e62e96b Compare January 7, 2026 17:22
@michaelzhang-ai
Collaborator Author

michaelzhang-ai commented Jan 7, 2026

@yctseng0211
Collaborator

These three failures will be fixed by #16675

Collaborator

@bingxche bingxche left a comment

LGTM

@michaelzhang-ai
Collaborator Author

michaelzhang-ai commented Jan 9, 2026

All AMD PR tests pass: https://github.com/sgl-project/sglang/actions/runs/20837677036. Ready to merge, @HaiShaw. Thanks!

Collaborator

@yctseng0211 yctseng0211 left a comment

LGTM, changed amd only

@HaiShaw HaiShaw merged commit fcec35d into sgl-project:main Jan 9, 2026
76 of 101 checks passed
Labels

amd, deepseek, diffusion, documentation, model-gateway, Multi-modal, npu, quant, run-ci

4 participants