Merged
24 commits
c0a57c8
[CI] Add new accuracy tests for MI35x DeepSeek-V3.2 DP and TP+MTP con…
michaelzhang-ai Jan 21, 2026
0f9be7e
Merge branch 'main' into add_mtp_accuracy_test
michaelzhang-ai Jan 22, 2026
7892300
Enhance nightly test workflow to trigger on pull requests to the main…
michaelzhang-ai Jan 22, 2026
4b88787
Disable nightly accuracy test for MI35x and adjust speed threshold in…
michaelzhang-ai Jan 23, 2026
6b09b24
Implement model configuration prefetching in DeepSeek evaluation test…
michaelzhang-ai Jan 23, 2026
ab92a48
Increase timeout for performance test MI35x in nightly workflow to ac…
michaelzhang-ai Jan 23, 2026
10c6619
Remove model configuration prefetching from DeepSeek evaluation test …
michaelzhang-ai Jan 23, 2026
00e18d3
Add timeout handling for benchmark execution in DeepSeek performance …
michaelzhang-ai Jan 23, 2026
39a8c28
Enhance accuracy test output by adding detailed print statements for …
michaelzhang-ai Jan 23, 2026
d856ce5
Enable nightly accuracy test for MI35x by removing the temporary disa…
michaelzhang-ai Jan 23, 2026
9066463
[CI] Rename V3.2 DP job to DP+TC and add TC test
michaelzhang-ai Jan 24, 2026
5fb6d75
[CI] Fix env inheritance in V3.2 tests (use os.environ.copy)
michaelzhang-ai Jan 24, 2026
be7d4cc
[CI] Remove DeepSeek-R1 job from nightly AMD workflow
michaelzhang-ai Jan 24, 2026
a006982
[CI] Remove DeepSeek-V3.2 DP+TC job (OOM on MI325)
michaelzhang-ai Jan 25, 2026
bcca382
[CI] Lower accuracy thresholds for Qwen2 FP8 models
michaelzhang-ai Jan 25, 2026
3c2fc49
Merge upstream/main - keep DeepSeek-R1 removed
michaelzhang-ai Jan 25, 2026
e2d39b7
[CI] Fix script paths: scripts/ci/amd_ci_* -> scripts/ci/amd/amd_ci_*
michaelzhang-ai Jan 25, 2026
c564fea
[CI] Update nightly test workflow to remove pull request trigger
michaelzhang-ai Jan 26, 2026
8c8ff29
Merge remote-tracking branch 'upstream/main' into add_mtp_accuracy_test
michaelzhang-ai Jan 27, 2026
8683311
Add nightly test for Kimi-K2 accuracy on AMD GPUs
michaelzhang-ai Jan 27, 2026
2ea74ae
Disable nightly accuracy test for MI35x due to shared memory limit is…
michaelzhang-ai Jan 27, 2026
eb0abc3
Update nightly test configuration and performance tests for MI35x and…
michaelzhang-ai Jan 28, 2026
5914d57
Remove pull request trigger from nightly test workflow for AMD, maint…
michaelzhang-ai Jan 28, 2026
1e4a287
Merge branch 'main' into add_mtp_accuracy_test
yctseng0211 Jan 28, 2026
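
Commit 5fb6d75 mentions fixing environment inheritance with `os.environ.copy`, and 00e18d3 adds timeout handling for benchmark execution. The test code for those commits is not shown on this page, so the following is only a minimal sketch of the two ideas, not the PR's implementation. The helper name `run_with_env`, the variable `SGLANG_EXAMPLE_FLAG`, and the timeout value are hypothetical.

```python
import os
import subprocess
import sys


def run_with_env(cmd, overrides, timeout_s):
    """Run cmd with the parent environment plus overrides (hypothetical helper).

    Returns (returncode, stdout); returncode is None if the command timed out.
    """
    # Copy the parent environment so PATH, HOME, etc. are inherited.
    # Passing only `overrides` as env would drop everything else.
    env = os.environ.copy()
    env.update(overrides)
    try:
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None, ""
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_with_env(
        [sys.executable, "-c", "import os; print(os.environ['SGLANG_EXAMPLE_FLAG'])"],
        {"SGLANG_EXAMPLE_FLAG": "1"},  # hypothetical variable name
        timeout_s=30,
    )
    print(code, out.strip())
```

Copying first and then updating keeps the overrides local to the child process and leaves the parent's `os.environ` untouched.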
193 changes: 158 additions & 35 deletions .github/workflows/nightly-test-amd.yml
@@ -25,18 +25,21 @@ on:
- 'nightly-perf-2-gpu-text'
- 'nightly-perf-2-gpu-vlm'
- 'nightly-accuracy-8-gpu'
- 'nightly-accuracy-8-gpu-deepseek-r1'
# MI30x Accuracy + Performance Tests (combined)
- 'nightly-8-gpu-grok1-int4'
- 'nightly-8-gpu-grok2'
- 'nightly-8-gpu-deepseek-v31'
- 'nightly-8-gpu-deepseek-v32'
- 'nightly-8-gpu-deepseek-v32-mtp'
- 'nightly-8-gpu-kimi-k2'
# MI35x jobs
- 'nightly-test-1-gpu-mi35x'
- 'nightly-accuracy-8-gpu-mi35x'
- 'nightly-8-gpu-mi35x-grok1-int4'
- 'nightly-8-gpu-mi35x-grok2'
- 'nightly-8-gpu-mi35x-deepseek-r1-mxfp4'
- 'nightly-accuracy-8-gpu-mi35x-deepseek-v32'
- 'nightly-accuracy-8-gpu-mi35x-deepseek-v32-mtp'
- 'nightly-perf-8-gpu-mi35x-deepseek-v32-basic'
- 'nightly-perf-8-gpu-mi35x-deepseek-v32-mtp'
workflow_call:
@@ -248,35 +251,6 @@ jobs:
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# 8-GPU DeepSeek-R1 Accuracy Test (separate job due to long loading time)
nightly-accuracy-8-gpu-deepseek-r1:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-accuracy-8-gpu-deepseek-r1')
runs-on: linux-mi325-gpu-8
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}

- name: Setup docker
run: |
touch github_summary.md
bash scripts/ci/amd/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd/amd_ci_install_dependency.sh

- name: Accuracy Test (8-GPU DeepSeek-R1)
timeout-minutes: 240
run: |
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-deepseek-r1 --nightly --timeout-per-file 7200 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# ============================================== MI30x Combined Accuracy + Performance Tests ==============================================
# 8-GPU Grok1-INT4 (Accuracy + Performance combined)
nightly-8-gpu-grok1-int4:
@@ -407,6 +381,118 @@ jobs:
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# 8-GPU DeepSeek-V3.2 (Basic Accuracy + Perf)
nightly-8-gpu-deepseek-v32:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-8-gpu-deepseek-v32')
runs-on: linux-mi325-gpu-8
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}

- name: Setup docker
run: |
touch github_summary.md
bash scripts/ci/amd/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd/amd_ci_install_dependency.sh

- name: Accuracy Test (8-GPU DeepSeek-V3.2 Basic)
timeout-minutes: 120
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-deepseek-v32 --nightly --timeout-per-file 3600 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

- name: Performance Test (8-GPU DeepSeek-V3.2 Basic)
timeout-minutes: 150
continue-on-error: true # Perf test failure doesn't fail the job if accuracy passed
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-perf-8-gpu-deepseek-v32-basic --nightly --timeout-per-file 5400 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}
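
Every job in this workflow is gated by the same `if:` expression: the run must come from the `sgl-project/sglang` repository or be a pull-request event, and `job_filter` must be empty, `all`, or the job's own name. As a rough illustration, that logic can be written out in Python. The function name `should_run` is hypothetical, and the sketch uses exact string comparison, whereas the `==` operator in GitHub Actions expressions ignores case for strings.

```python
def should_run(repository, event_name, job_filter, job_name):
    """Approximate the per-job `if:` expression from the workflow (illustrative only)."""
    # Either the canonical repository or a pull-request event is accepted.
    allowed_source = repository == "sgl-project/sglang" or event_name == "pull_request"
    # An empty filter or "all" selects every job; otherwise the name must match.
    selected = job_filter in ("", "all", job_name)
    return allowed_source and selected


if __name__ == "__main__":
    job = "nightly-8-gpu-deepseek-v32"
    print(should_run("sgl-project/sglang", "schedule", "", job))
    print(should_run("sgl-project/sglang", "workflow_dispatch", "nightly-8-gpu-kimi-k2", job))
```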

# 8-GPU DeepSeek-V3.2 MTP (MTP Accuracy + Perf)
nightly-8-gpu-deepseek-v32-mtp:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-8-gpu-deepseek-v32-mtp')
runs-on: linux-mi325-gpu-8
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}

- name: Setup docker
run: |
touch github_summary.md
bash scripts/ci/amd/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd/amd_ci_install_dependency.sh

- name: Accuracy Test (8-GPU DeepSeek-V3.2 MTP)
timeout-minutes: 120
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-deepseek-v32-mtp --nightly --timeout-per-file 3600 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

- name: Performance Test (8-GPU DeepSeek-V3.2 MTP)
timeout-minutes: 180
continue-on-error: true # Perf test failure doesn't fail the job if accuracy passed
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-perf-8-gpu-deepseek-v32-mtp --nightly --timeout-per-file 7200 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# 8-GPU Kimi-K2 (Accuracy + Speed)
nightly-8-gpu-kimi-k2:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-8-gpu-kimi-k2')
runs-on: linux-mi325-gpu-8
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}

- name: Setup docker
run: |
touch github_summary.md
bash scripts/ci/amd/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd/amd_ci_install_dependency.sh

- name: Accuracy Test (8-GPU Kimi-K2)
timeout-minutes: 120
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-kimi-k2 --nightly --timeout-per-file 3600 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}
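
The test steps above share one shell idiom: capture the suite's exit code with `|| TEST_EXIT_CODE=$?`, append the summary file to `$GITHUB_STEP_SUMMARY` whether or not the suite passed, and only then exit with the saved code. A minimal Python sketch of the same control flow follows. The helper `run_and_publish` and its parameters are hypothetical, and the sketch copies the summary verbatim rather than reproducing the exact newline handling of `echo "$(<file)"`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_publish(cmd, summary_src, summary_dst):
    """Run cmd, always append summary_src to summary_dst, return cmd's exit code."""
    # Like `cmd || TEST_EXIT_CODE=$?`: remember the code instead of stopping.
    exit_code = subprocess.run(cmd).returncode
    try:
        with open(summary_dst, "a", encoding="utf-8") as dst:
            dst.write(Path(summary_src).read_text(encoding="utf-8"))
    except OSError:
        pass  # like `|| true`: a summary failure must not mask the test result
    return exit_code


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "github_summary.md"
        dst = Path(tmp) / "step_summary.md"
        src.write_text("## results\n", encoding="utf-8")
        # Stand-in for the test suite: a child process that fails with code 3.
        code = run_and_publish(
            [sys.executable, "-c", "import sys; sys.exit(3)"], src, dst
        )
        print(code, dst.read_text(encoding="utf-8"))
```

Publishing before exiting is what lets a failed nightly run still show its partial results in the job summary.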

# ============================================== MI35x Tests ==============================================
# MI35x 1-GPU tests - platform-agnostic tests that may work on CDNA4 (gfx950)
nightly-test-1-gpu-mi35x:
@@ -641,6 +727,39 @@ jobs:
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# MI35x 8-GPU DeepSeek-V3.2 TP+MTP Accuracy Test
nightly-accuracy-8-gpu-mi35x-deepseek-v32-mtp:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-accuracy-8-gpu-mi35x-deepseek-v32-mtp')
runs-on: linux-mi35x-gpu-8
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}

- name: Setup docker
run: |
touch github_summary.md
bash scripts/ci/amd/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: |
bash scripts/ci/amd/amd_ci_install_dependency.sh
# Install tabulate for run_suite.py (missing in MI35x container)
bash scripts/ci/amd/amd_ci_exec.sh pip install tabulate

- name: Accuracy Test MI35x (8-GPU DeepSeek-V3.2 TP+MTP)
timeout-minutes: 120
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
python3 run_suite.py --hw amd --suite nightly-amd-accuracy-8-gpu-mi35x-deepseek-v32-mtp --nightly --timeout-per-file 3600 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

# MI35x 8-GPU DeepSeek-V3.2 Performance Test (Basic)
nightly-perf-8-gpu-mi35x-deepseek-v32-basic:
if: (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-perf-8-gpu-mi35x-deepseek-v32-basic')
@@ -698,12 +817,12 @@ jobs:
bash scripts/ci/amd/amd_ci_exec.sh pip install tabulate

- name: Performance Test MI35x (8-GPU DeepSeek-V3.2 MTP)
-timeout-minutes: 150
+timeout-minutes: 180
run: |
> github_summary.md # Clear summary file
bash scripts/ci/amd/amd_ci_exec.sh -w /sglang-checkout/test \
-e GITHUB_STEP_SUMMARY="/sglang-checkout/github_summary.md" \
-python3 run_suite.py --hw amd --suite nightly-perf-8-gpu-mi35x-deepseek-v32-mtp --nightly --timeout-per-file 5400 || TEST_EXIT_CODE=$?
+python3 run_suite.py --hw amd --suite nightly-perf-8-gpu-mi35x-deepseek-v32-mtp --nightly --timeout-per-file 7200 || TEST_EXIT_CODE=$?
echo "$(<github_summary.md )" >> $GITHUB_STEP_SUMMARY || true
exit ${TEST_EXIT_CODE:-0}

@@ -719,20 +838,24 @@ jobs:
- nightly-perf-2-gpu-text
- nightly-perf-2-gpu-vlm
- nightly-accuracy-8-gpu
- nightly-accuracy-8-gpu-deepseek-r1
# MI30x Combined Accuracy + Performance Tests
- nightly-8-gpu-grok1-int4
- nightly-8-gpu-grok2
- nightly-8-gpu-deepseek-v31
- nightly-8-gpu-deepseek-v32
- nightly-8-gpu-deepseek-v32-mtp
- nightly-8-gpu-kimi-k2
# MI35x jobs
- nightly-test-1-gpu-mi35x
- nightly-accuracy-8-gpu-mi35x
- nightly-8-gpu-mi35x-grok1-int4
- nightly-8-gpu-mi35x-grok2
- nightly-8-gpu-mi35x-deepseek-r1-mxfp4
- nightly-accuracy-8-gpu-mi35x-deepseek-v32
- nightly-perf-8-gpu-mi35x-deepseek-v32-basic
- nightly-perf-8-gpu-mi35x-deepseek-v32-mtp
- nightly-accuracy-8-gpu-mi35x-deepseek-v32-mtp
# MI35x perf jobs excluded from check - perf failures don't block CI
# - nightly-perf-8-gpu-mi35x-deepseek-v32-basic
# - nightly-perf-8-gpu-mi35x-deepseek-v32-mtp
runs-on: ubuntu-latest
steps:
- name: Check if any job failed
@@ -214,6 +214,9 @@ def test_deepseek_r1_accuracy(self):
)
passed = acc >= config.accuracy_threshold
status = "✅ PASS" if passed else "❌ FAIL"
print(
f" accuracy={acc:.3f} threshold={config.accuracy_threshold} {status}"
)

all_results.append(
{
@@ -239,6 +239,9 @@ def test_deepseek_r1_mxfp4_accuracy(self):
)
passed = acc >= config.accuracy_threshold
status = "✅ PASS" if passed else "❌ FAIL"
print(
f" accuracy={acc:.3f} threshold={config.accuracy_threshold} {status}"
)

all_results.append(
{
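
The two Python hunks add the same print statement, which reports the measured accuracy, the configured threshold, and a pass/fail marker for each configuration. A standalone sketch of that formatting is shown here for reference. The function name `format_accuracy_line` and the sample values are hypothetical; in the tests the values come from `acc` and `config.accuracy_threshold`.

```python
def format_accuracy_line(acc, threshold):
    """Return (passed, line) using the same comparison and format as the added print."""
    passed = acc >= threshold
    status = "✅ PASS" if passed else "❌ FAIL"
    # Accuracy is shown to three decimals; the threshold is printed as configured.
    return passed, f"  accuracy={acc:.3f} threshold={threshold} {status}"


if __name__ == "__main__":
    # Hypothetical numbers, not results from this PR.
    for acc in (0.95, 0.90):
        print(format_accuracy_line(acc, 0.93)[1])
```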