[AMD] Add MI35x nightly CI tests by michaelzhang-ai · Pull Request #16588 · sgl-project/sglang

michaelzhang-ai · 2026-01-06T20:30:59Z

Motivation

Add nightly CI tests for the new MI35x cluster, enabling comprehensive testing of SGLang on AMD's MI35x architecture alongside existing MI300X tests.

MI35x coverage: +32 models/tests in total: 17 TP1/TP2 models, 5 VLMs, 2 GPT-OSS, 3 GROK, 2 DeepSeek-R1-MXFP4 variants (basic, MTP), and 3 perf benchmarks (2 GROK + 1 DeepSeek).

CI all green: https://github.com/sgl-project/sglang/actions/runs/20770209451
Please help review. @yctseng0211 @bingxche @HaiShaw

Modifications

New MI35x test jobs (linux-mi35x-gpu-2, linux-mi35x-gpu-8 runners):

nightly-test-2-gpu-mi35x - 2-GPU evaluation tests
nightly-test-2-gpu-vlm-mi35x - 2-GPU VLM MMMU tests
nightly-test-8-gpu-mi35x-gpt-oss - GPT-OSS models (openai/* paths)
nightly-test-8-gpu-mi35x-grok - GROK models
nightly-test-8-gpu-mi35x-deepseek-r1 - DeepSeek-R1 (basic + MTP)
nightly-perf-8-gpu-mi35x-grok - GROK performance benchmarks
nightly-perf-8-gpu-mi35x-deepseek-r1-mxfp4 - DeepSeek-R1-MXFP4 performance

MI300X improvements:

Combined deepseek-r1 job to run basic + MTP variants + DP attention + torch compile tests
Migrated tests from test/srt/nightly/ to test/registered/amd/nightly/ per CI reorg roadmap

Accuracy Tests

TestNightlyGsm8KEval (TP=2)

Model	TP	Score	Threshold	Startup	Eval	Total	Status
meta-llama/Llama-3.1-8B-Instruct	1	0.852	0.82	111s	16s	127s	✅ PASS
mistralai/Mistral-7B-Instruct-v0.3	1	0.604	0.58	81s	9s	90s	✅ PASS
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	1	0.864	0.85	422s	23s	445s	✅ PASS
google/gemma-2-27b-it	1	0.920	0.91	191s	11s	202s	✅ PASS
meta-llama/Llama-3.1-70B-Instruct	2	0.964	0.95	332s	17s	349s	✅ PASS
mistralai/Mixtral-8x7B-Instruct-v0.1	2	0.668	0.61	141s	10s	151s	✅ PASS
Qwen/Qwen2-57B-A14B-Instruct	2	0.884	0.86	422s	18s	440s	✅ PASS
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8	1	0.860	0.80	71s	13s	84s	✅ PASS
neuralmagic/Mistral-7B-Instruct-v0.3-FP8	1	0.580	0.54	41s	10s	51s	✅ PASS
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8	2	0.964	0.94	111s	37s	148s	✅ PASS
neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8	2	0.716	0.62	91s	9s	101s	✅ PASS
neuralmagic/Qwen2-72B-Instruct-FP8	2	0.956	0.94	131s	14s	145s	✅ PASS
neuralmagic/Qwen2-57B-A14B-Instruct-FP8	2	0.868	0.86	311s	8s	319s	✅ PASS
meta-llama/Llama-3.2-3B-Instruct	1	0.796	0.55	51s	11s	62s	✅ PASS
Qwen/Qwen2.5-7B-Instruct	1	0.908	0.85	51s	8s	59s	✅ PASS
Qwen/Qwen3-8B	1	0.832	0.77	51s	24s	75s	✅ PASS
Qwen/Qwen3-30B-A3B-Thinking-2507	2	0.956	0.84	261s	30s	291s	✅ PASS
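
The Status column above follows from comparing each score with its threshold. A minimal sketch of that check, assuming a model passes when score >= threshold; the helper names `check_thresholds` and `smallest_margin` are hypothetical, and the rows are a subset copied from the table:

```python
# Subset of (model, score, threshold) rows copied from the table above.
RESULTS = [
    ("meta-llama/Llama-3.1-8B-Instruct", 0.852, 0.82),
    ("mistralai/Mistral-7B-Instruct-v0.3", 0.604, 0.58),
    ("neuralmagic/Qwen2-57B-A14B-Instruct-FP8", 0.868, 0.86),
    ("Qwen/Qwen3-30B-A3B-Thinking-2507", 0.956, 0.84),
]


def check_thresholds(results):
    """Return the (model, score, threshold) rows that fall below threshold.

    Assumption: a model passes when score >= threshold.
    """
    return [(m, s, t) for m, s, t in results if s < t]


def smallest_margin(results):
    """Return (model, margin) for the row closest to its threshold."""
    model, score, threshold = min(results, key=lambda r: r[1] - r[2])
    return model, round(score - threshold, 3)


if __name__ == "__main__":
    print("failures:", check_thresholds(RESULTS))
    print("tightest margin:", smallest_margin(RESULTS))
```

Tracking the tightest margin alongside pass/fail can help spot models that are likely to flake if scores drift slightly between nightly runs.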

Model Group: grok

Model	TP	Accuracy	Threshold	Startup	Bench	Total	Status
lmzheng/grok-1	8	0.870	0.8	1442s	17s	1460s	✅ PASS
amd/grok-1-W4A8KV8	8	0.835	0.8	952s	15s	968s	✅ PASS
xai-org/grok-2	8	0.920	0.915	421s	38s	460s	✅ PASS

MI35x Model Group: deepseek-r1

Model	TP	Accuracy	Threshold	Startup	Bench	Total	Status
deepseek-ai/DeepSeek-R1-0528 (basic)	8	0.975	0.93	2714s	32s	2746s	✅ PASS
deepseek-ai/DeepSeek-R1-0528 (MTP)	8	0.970	0.93	1353s	18s	1370s	✅ PASS

Details in CI: https://github.com/sgl-project/sglang/actions/runs/20770209451

Benchmarking and Profiling

TestNightlyGrokPerformance

amd/grok-1-W4A8KV8 (grok1)

batch size	input len	latency (s)	input throughput (tok/s)	output throughput (tok/s)	ITL (ms)
1	1024	5.07	12316.97	102.74	9.73
1	1024	5.08	12115.73	102.56	9.75
8	1024	6.08	34696.79	700.75	11.42
16	1024	6.32	35436.39	1398.64	11.44
64	1024	8.21	38049.96	5049.24	12.68

xai-org/grok-2 (grok2)

batch size	input len	latency (s)	input throughput (tok/s)	output throughput (tok/s)	ITL (ms)
1	1024	6.99	10000.78	74.31	13.46
1	1024	7.01	9983.34	74.08	13.50
8	1024	8.73	25857.56	486.99	16.43
16	1024	9.24	26058.66	950.95	16.83
64	1024	11.83	28147.08	3447.59	18.56

TestNightlyDeepseekR1MXFP4Performance

amd/DeepSeek-R1-MXFP4-Preview (basic)

batch size	input len	latency (s)	input throughput (tok/s)	output throughput (tok/s)	ITL (ms)
1	4096	4.83	9372.53	116.61	8.58
1	4096	4.52	30569.49	116.63	8.57
8	4096	5.90	43786.90	795.80	10.05
16	4096	7.52	44497.10	1355.54	11.80
64	4096	15.78	44878.30	3297.15	19.41
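
The reported columns look mutually consistent: in the tables above, output throughput is close to batch size divided by ITL. The sketch below cross-checks a few rows under that assumption. The relation is an observation from the numbers, not something the benchmark is stated to compute, and the function name is hypothetical.

```python
def estimated_output_throughput(batch_size: int, itl_ms: float) -> float:
    """Estimate decode throughput (tok/s) from batch size and ITL in ms.

    Assumption: every sequence in the batch emits one token per ITL interval.
    """
    return batch_size * 1000.0 / itl_ms


# (label, batch size, ITL in ms, reported output throughput in tok/s),
# copied from the tables above.
ROWS = [
    ("grok-1-W4A8KV8", 1, 9.73, 102.74),
    ("grok-1-W4A8KV8", 64, 12.68, 5049.24),
    ("grok-2", 16, 16.83, 950.95),
    ("DeepSeek-R1-MXFP4-Preview", 64, 19.41, 3297.15),
]

if __name__ == "__main__":
    for label, bs, itl, reported in ROWS:
        est = estimated_output_throughput(bs, itl)
        rel_err = abs(est - reported) / reported
        print(f"{label} bs={bs}: estimated {est:.1f}, reported {reported}, "
              f"relative error {rel_err:.2%}")
```

A check like this is a cheap sanity test for a benchmark report: a row whose estimate and reported value diverge widely would be worth a second look.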

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
After green CI and required approvals, ask Merge Oncalls to merge.

- test_mi35x_basic_1gpu.py: 1-GPU basic model tests
- test_mi35x_eval_2gpu.py: 2-GPU evaluation tests (TP=2)
- test_mi35x_large_8gpu.py: 8-GPU large model tests (TP=8)

Uses runners: linux-mi35x-gpu-1, linux-mi35x-gpu-2, linux-mi35x-gpu-8

- Added new nightly test jobs for MI35x GPUs, including 2-GPU and VLM evaluation tests.
- Updated existing nightly test configurations to include MI35x jobs.
- Introduced new test files for GSM8K completion and evaluation, along with VLM MMMU evaluation tests.
- Removed outdated 1-GPU and 2-GPU MI35x test files.

This update improves coverage for AMD's MI35x architecture in the nightly CI pipeline.

- Introduced new nightly test jobs for MI35x 8-GPU configurations, including tests for GPT-OSS, GROK, and DeepSeek models.
- Updated the run suite to include the new MI35x 8-GPU suite.
- Added a new test file for GSM8K completion evaluation specific to MI35x models.

This enhances the testing framework for AMD's MI35x architecture, ensuring comprehensive coverage in the nightly CI pipeline.

- Consolidated DeepSeek-R1 tests into a single job with combined DP and TC configurations.
- Introduced new performance benchmark for DeepSeek-R1-MXFP4 model on MI35x.
- Updated model configurations to include basic and MTP variants for MI35x.
- Enhanced test descriptions for clarity and accuracy in nightly evaluation.

This update streamlines the testing process and improves coverage for DeepSeek-R1 models in the nightly CI pipeline.

- Refactored model path configuration to prioritize environment variable, local path, and HuggingFace model ID.
- Introduced a new function to determine the effective model path based on availability.
- Updated test classes to utilize the new model path logic, improving flexibility and clarity in model sourcing.

This update streamlines the model path management in the nightly performance benchmarks for DeepSeek-R1-MXFP4 on MI35x.
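
The lookup order described above could be sketched as follows. This is an illustration only: the environment variable name `MODEL_PATH_OVERRIDE`, the function name `resolve_model_path`, and the example local directory are hypothetical and are not taken from the PR's code.

```python
import os


def resolve_model_path(hf_model_id, local_path=None, env_var="MODEL_PATH_OVERRIDE"):
    """Pick a model path by priority: env variable, local path, HF model ID.

    All names here are placeholders, not identifiers from the PR.
    """
    # 1. An explicit override from the environment wins.
    override = os.environ.get(env_var)
    if override:
        return override
    # 2. Otherwise use a local checkpoint directory if it exists on disk.
    if local_path and os.path.isdir(local_path):
        return local_path
    # 3. Fall back to the HuggingFace model ID.
    return hf_model_id


if __name__ == "__main__":
    # The local directory below is a made-up example path.
    print(resolve_model_path(
        "amd/DeepSeek-R1-MXFP4-Preview",
        local_path="/data/models/DeepSeek-R1-MXFP4-Preview",
    ))
```

Checking the directory with `os.path.isdir` before returning it means a runner without a pre-staged checkpoint falls through to the model ID instead of failing on a missing path.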

- Removed the pull_request trigger from the nightly test workflow for AMD.
- Enhanced code readability by formatting multi-line function calls and string concatenations in the DeepSeek-R1-MXFP4 performance test.
- Cleaned up trailing whitespace in several test files.

This update streamlines the nightly testing process and improves code clarity across the AMD test suite.

… and MI35x

- Added pull_request trigger to the nightly test workflow for AMD.
- Consolidated DeepSeek-R1 tests into a single job with all variants (basic, MTP, DP, TC) for MI35x.
- Updated model configurations to reflect the new naming and structure, ensuring consistency across tests.
- Enhanced logging to include variant names in test summaries for better clarity.

This update improves the nightly testing process and ensures comprehensive coverage for DeepSeek-R1 models.

- Changed MI35x accuracy tests from deepseek-r1-all to deepseek-r1
- Only runs basic and MTP variants (DP/TC cause timeout with full model)
- DeepSeek-R1-0528 (~91GB/GPU) too large for DP initialization on MI35x
- MXFP4 still used for perf tests
- Reduced timeout from 300 to 180 minutes

michaelzhang-ai · 2026-01-07T05:47:50Z

CI all green: https://github.com/sgl-project/sglang/actions/runs/20770209451. Ready for review and merge. @bingxche @yctseng0211 @HaiShaw

yctseng0211

LGTM

michaelzhang-ai · 2026-01-07T17:29:55Z

PR tests passed before merging upstream: https://github.com/sgl-project/sglang/actions/runs/20773365925. cc: @HaiShaw @yctseng0211 @bingxche

The merge of #15712 caused AMD CI to fail in stage-a-test-1-amd (linux-mi325-gpu-1). @Fridge003

yctseng0211 · 2026-01-08T15:20:10Z

These three failures will be fixed by #16675

bingxche

LGTM

michaelzhang-ai · 2026-01-09T03:07:38Z

All AMD PR tests pass: https://github.com/sgl-project/sglang/actions/runs/20837677036. Ready to merge, @HaiShaw. Thanks!

yctseng0211

LGTM, the changes are AMD-only.

Add MI35x nightly test files

eb6ab38

github-actions bot added the amd and Multi-modal labels Jan 6, 2026

michaelzhang-ai added 2 commits January 6, 2026 15:07

github-actions bot added the deepseek label Jan 6, 2026

michaelzhang-ai and others added 5 commits January 6, 2026 17:42

Merge branch 'main' into add-mi35x-nightly-tests

d33af4c

bingxche reviewed Jan 7, 2026

.github/workflows/nightly-test-amd.yml (resolved)

michaelzhang-ai marked this pull request as ready for review January 7, 2026 06:26

michaelzhang-ai requested review from Fridge003, Kangyan-Zhou, ispobock and merrymercy as code owners January 7, 2026 06:26

Remove pull_request trigger before merge

284def5

michaelzhang-ai requested a review from bingxche January 7, 2026 06:32

Merge branch 'main' into add-mi35x-nightly-tests

9a675e8

michaelzhang-ai changed the title ~~[AMD] Add MI35x nightly tests~~ [AMD] Add MI35x nightly CI tests Jan 7, 2026

Remove duplicate AMD nightly suites from test/srt/run_suite.py

132979d

These suites were migrated to test/registered/amd/nightly/ and are now managed by test/run_suite.py using the registry system.
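
As a rough illustration of what a registry-driven suite can look like, the sketch below has each test file register itself under a suite name, and a runner then selects files by suite. Every name here (`register_ci`, `files_for_suite`, the suite and file names) is hypothetical; this is not the actual API of test/run_suite.py.

```python
from collections import defaultdict

# Maps a suite name to the (test file, estimated seconds) pairs registered under it.
_REGISTRY = defaultdict(list)


def register_ci(suite, test_file, est_time_s=0):
    """Record that test_file belongs to suite (hypothetical helper)."""
    _REGISTRY[suite].append((test_file, est_time_s))


def files_for_suite(suite):
    """Return the files registered for a suite, longest-running first."""
    entries = sorted(_REGISTRY.get(suite, []), key=lambda e: e[1], reverse=True)
    return [name for name, _ in entries]


if __name__ == "__main__":
    # Placeholder suite and file names, for illustration only.
    register_ci("example-nightly-suite", "test_example_eval.py", est_time_s=600)
    register_ci("example-nightly-suite", "test_example_perf.py", est_time_s=1800)
    print(files_for_suite("example-nightly-suite"))
```

With this kind of design the suite membership lives next to each test, so a central runner no longer needs a hand-maintained list per suite.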

bingxche added the run-ci label Jan 7, 2026

yctseng0211 reviewed Jan 7, 2026

michaelzhang-ai force-pushed the add-mi35x-nightly-tests branch from eb98159 to e8e0071 Compare January 7, 2026 17:12

michaelzhang-ai requested review from CatherineSue and key4ng as code owners January 7, 2026 17:12

michaelzhang-ai requested review from HaiShaw, Qiaolin-Yu, ShangmingCai, Ying1123, hebiao064, hnyls2002, ishandhanani and xiezhq-hermann as code owners January 7, 2026 17:12

github-actions bot added the documentation, quant, npu, diffusion and model-gateway labels Jan 7, 2026

Merge upstream/main - resolve run_suite.py conflict

e62e96b

michaelzhang-ai force-pushed the add-mi35x-nightly-tests branch from 003e220 to e62e96b Compare January 7, 2026 17:22

michaelzhang-ai and others added 3 commits January 7, 2026 11:25

Merge branch 'main' into add-mi35x-nightly-tests

30f127f

Merge branch 'main' into add-mi35x-nightly-tests

6ba4cba

Merge branch 'main' into add-mi35x-nightly-tests

21e61fb

michaelzhang-ai and others added 4 commits January 8, 2026 10:31

Merge branch 'main' into add-mi35x-nightly-tests

3af0b76

Merge branch 'main' into add-mi35x-nightly-tests

5ce5747

Merge branch 'main' into add-mi35x-nightly-tests

3006d68

Merge branch 'main' into add-mi35x-nightly-tests

6786ba7

bingxche approved these changes Jan 9, 2026

michaelzhang-ai requested a review from yctseng0211 January 9, 2026 03:08

yctseng0211 approved these changes Jan 9, 2026

HaiShaw approved these changes Jan 9, 2026

HaiShaw merged commit fcec35d into sgl-project:main Jan 9, 2026
76 of 101 checks passed

HaiShaw merged 20 commits into sgl-project:main from michaelzhang-ai:add-mi35x-nightly-tests

4 participants
