[AMD] Add Kimi-K2, DeepSeek-V3.2 tests to nightly CI#17523

Merged
HaiShaw merged 24 commits into sgl-project:main from michaelzhang-ai:add_mtp_accuracy_test
Jan 28, 2026

@michaelzhang-ai
Collaborator

@michaelzhang-ai michaelzhang-ai commented Jan 21, 2026

Motivation

  1. Add Kimi-K2 and DeepSeek-V3.2 accuracy and performance tests for the MI325 (MI30x) platform, update the MI35x tests, consolidate test jobs, and fix various CI failures.
  2. In total, this adds 9 unique tests to AMD CI (https://github.com/sgl-project/sglang/actions/runs/21423272318?pr=17523).

Nightly CI passes: https://github.com/sgl-project/sglang/actions/runs/21422385034
Please help review. @yctseng0211 @bingxche

Modifications

New MI325 and MI355 Tests:

  • nightly-8-gpu-deepseek-v32: Basic accuracy + perf
  • nightly-8-gpu-deepseek-v32-mtp: MTP (EAGLE speculative) accuracy + perf
  • nightly-8-gpu-kimi-k2: Kimi-K2-Instruct-0905 accuracy

CI Fixes:

  • Increase the MI35x MTP perf test's server launch timeout to 5400s
  • Add accuracy logging (accuracy={acc:.3f} threshold={threshold} {status}) to all eval tests
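The logging format in the last bullet can be illustrated with a minimal sketch. The helper name `format_accuracy_log`, the `PASS`/`FAIL` status strings, and the `>=` comparison are assumptions made for illustration; only the format string itself comes from this PR description.

```python
def format_accuracy_log(acc: float, threshold: float) -> str:
    """Build one accuracy log line (hypothetical helper, not the PR's code)."""
    # Assumption: a run passes when accuracy meets or exceeds the threshold.
    status = "PASS" if acc >= threshold else "FAIL"
    # Format string taken from the PR description.
    return f"accuracy={acc:.3f} threshold={threshold} {status}"


# Values from the Kimi-K2 row reported in this PR.
line = format_accuracy_log(0.953, 0.94)
print(line)
```

Emitting the threshold next to the measured value lets a reader of the nightly log see how much margin a passing run had without opening the test file.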

Removed:

  • nightly-8-gpu-deepseek-r1 job (redundant with MI35x tests)

Accuracy Tests

Kimi-K2 Model (MI325)

| Model | TP | Accuracy | Threshold | Status |
|---|---|---|---|---|
| moonshotai/Kimi-K2-Instruct-0905 | 8 | 0.953 | 0.94 | ✅ PASS |

Benchmarking and Profiling

DeepSeek-V3.2 Models (MI325)

| Model | Variant | TP | Accuracy | Threshold | Status |
|---|---|---|---|---|---|
| deepseek-ai/DeepSeek-V3.2 | basic | 8 | 0.950 | 0.93 | ✅ PASS |

TestNightlyDeepseekV32BasicPerformance

deepseek-ai/DeepSeek-V3.2 (basic) [MI325]

| batch size | input len | latency (s) | input throughput (tok/s) | output throughput (tok/s) | ITL (ms) |
|---|---|---|---|---|---|
| 1 | 4096 | 12.30 | 2818.94 | 47.22 | 21.18 |
| 8 | 4096 | 17.35 | 8301.86 | 305.59 | 26.18 |
| 16 | 4096 | 21.19 | 11183.46 | 534.36 | 29.94 |
| 64 | 4096 | 33.67 | 20786.93 | 1556.37 | 41.12 |
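
As a sanity check on the numbers above, the reported output throughput appears to agree with batch size divided by inter-token latency (ITL). The sketch below recomputes that estimate for each row. The relationship is an observation drawn from these four rows, not a statement about how the benchmark derives its columns, and the function and variable names are illustrative.

```python
# Rows copied from the MI325 DeepSeek-V3.2 (basic) results:
# (batch size, reported output throughput in tok/s, ITL in ms)
rows = [
    (1, 47.22, 21.18),
    (8, 305.59, 26.18),
    (16, 534.36, 29.94),
    (64, 1556.37, 41.12),
]


def estimated_output_throughput(batch_size: int, itl_ms: float) -> float:
    """Estimate decode throughput as one token per sequence per ITL interval."""
    return batch_size * 1000.0 / itl_ms


relative_errors = []
for batch_size, reported, itl_ms in rows:
    estimate = estimated_output_throughput(batch_size, itl_ms)
    rel_err = abs(estimate - reported) / reported
    relative_errors.append(rel_err)
    print(f"bs={batch_size:<3} reported={reported:>8.2f} "
          f"estimate={estimate:>8.2f} rel_err={rel_err:.5f}")
```

A check like this is a cheap way to catch a mislabeled column or a unit mix-up (ms versus s) when performance tables are pasted into a PR.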

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@michaelzhang-ai changed the title from "[CI] Add tests for MI35x DeepSeek-V3.2 DP and TP+MTP" to "[CI] Add tests for MI35x DeepSeek-V3.2 DP and MTP" on Jan 22, 2026
…test

This update introduces a new function, _run_benchmark_with_timeout, to manage server launch timeouts during benchmark execution. The test_bench_one_batch method now uses this function, which makes the performance tests more robust.
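
The idea behind a timeout-guarded benchmark run can be sketched as follows. This is not the PR's `_run_benchmark_with_timeout`, whose body is not shown in this conversation. The helper `run_with_timeout`, its return shape, and the placeholder commands are all assumptions, and only the standard library's `subprocess.run` timeout handling is used.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (ok, output); hypothetical stand-in for a benchmark launcher.

    ok is False when the command times out or exits with a non-zero code.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, proc.stdout


# Placeholder commands standing in for a real benchmark invocation.
ok, out = run_with_timeout(
    [sys.executable, "-c", "print('bench done')"], timeout_s=30
)
slow_ok, slow_out = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
print(ok, out.strip())
print(slow_ok, slow_out)
```

Returning a status instead of letting the exception propagate lets the calling test decide whether a timeout should fail the job or only be reported.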
…model evaluation results across multiple test files.
…ble condition in the workflow configuration.
@michaelzhang-ai changed the title from "[CI] Add tests for MI35x DeepSeek-V3.2 DP and MTP" to "[CI] Add DeepSeek-V3.2 to MI325 and MI355 nightly test" on Jan 24, 2026
@michaelzhang-ai changed the title from "[CI] Add DeepSeek-V3.2 to MI325 and MI355 nightly test" to "[AMD] Add DeepSeek-V3.2 to MI325 and MI355 nightly test" on Jan 24, 2026
@michaelzhang-ai changed the title from "[AMD] Add DeepSeek-V3.2 to MI325 and MI355 nightly test" to "[AMD] Add DeepSeek-V3.2 accuracy and performance tests to MI325 nightly CI" on Jan 27, 2026
@yctseng0211
Collaborator

[image]

will be fixed by #17633

- Introduced a new job in the nightly workflow for Kimi-K2 accuracy testing.
- Added a new test script for evaluating Kimi-K2 with the GSM8K benchmark.
- Updated workflow triggers to include pull requests affecting the nightly test configuration.
…sue. Update AMD failing models list to include GLM-4.1V.
@github-actions bot added the "Multi-modal" (multi-modal language model) label on Jan 27, 2026
@michaelzhang-ai changed the title from "[AMD] Add DeepSeek-V3.2 accuracy and performance tests to MI325 nightly CI" to "[AMD] Add Kimi-K2, DeepSeek-V3.2 tests to nightly CI" on Jan 27, 2026
… DeepSeek

- Excluded MI35x performance jobs from CI checks to prevent blocking on non-critical failures.
- Adjusted model score threshold for Mixtral-8x7B to 0.57.
- Added a watchdog timeout of 1200 seconds to performance test scripts for DeepSeek V32 Basic and MTP on both AMD and MI35x platforms.
@michaelzhang-ai
Collaborator Author

michaelzhang-ai commented Jan 28, 2026

Collaborator

@HaiShaw HaiShaw left a comment

Consider moving the non-mi35x TCs to an mi30x subdir later.

@HaiShaw HaiShaw merged commit f8636fb into sgl-project:main Jan 28, 2026
139 of 157 checks passed
@michaelzhang-ai
Collaborator Author

consider to move non-mi35x TCs to mi30x subdir, later.

reorg folder: #17895

charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Jan 30, 2026
Chen-0210 pushed a commit to Chen-0210/sglang that referenced this pull request Jan 30, 2026
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
Labels

amd, deepseek, Multi-modal (multi-modal language model), run-ci

4 participants