Merged
Commits
43 commits
5313205
[NPU] Support GLM-4.7-Flash on NPU (#153)
Estrella-xx Mar 27, 2026
70301c5
[NPU] recover accuracy for gemma3-4b-it from 54% to 72% (reduced by t…
McZyWu Mar 28, 2026
822138b
Br fix qwen2 5 ascend (#159)
amote-i Mar 28, 2026
2720785
add documentation for GLM-4.7-Flash on Ascend (#162)
Estrella-xx Mar 28, 2026
6c54ed0
Revert "Use LazyValue for routed_experts_weights_of_layer initializat…
longxin9715 Mar 28, 2026
2b186af
fix(grok): fallback to standard weight loading when no presharded fil…
Hide-on-bushsh Mar 30, 2026
34f3a2c
revert: revert qwen3_5.py, use separate layers (#172)
iridiumine Mar 30, 2026
8df71a8
[bugfix]GLM-4V model (#176)
Hide-on-bushsh Mar 30, 2026
6d57e7c
NPU can use piece cuda graph when the piece cuda graph is explicitly …
chx96642264 Mar 30, 2026
dbcfc31
Bug fix for llama eagle3 (#177)
khalil2ji3mp6 Mar 30, 2026
30aa792
fix eagle3 accept rate (#179)
heziiop Mar 30, 2026
841f4cd
Support MTP for Qwen3.5 (#154)
iridiumine Mar 30, 2026
230a528
fix bug (#181)
longxin9715 Mar 30, 2026
8de1b25
revert pr 19321 for accuracy temporarily (#178)
McZyWu Mar 31, 2026
e36779a
Bug fix for not import is npu (#182)
McZyWu Mar 31, 2026
cf463fb
Revert "Revert "Use LazyValue for routed_experts_weights_of_layer ini…
iridiumine Mar 31, 2026
b28eeff
fix: qwen3.5 precision & quant model load error (#191)
iridiumine Apr 1, 2026
2e7fbff
[NPU] change fused_qkvzba_split_reshape_cat_npu to fused_qkvzba_split…
iridiumine Apr 1, 2026
61c384a
[NPU] Use causal_conv1d and fix qwen-next modelslim (#207)
iridiumine Apr 2, 2026
8ec5072
merge sgl-project main (#221)
cen121212 Apr 3, 2026
5bf5c4a
BugFix for MLAPO for Deepseek eagle3 on Ascend (#222)
khalil2ji3mp6 Apr 3, 2026
bb5c386
adapt mtp + prefix for ascend gdn backend (#202)
silencejade Apr 6, 2026
c4366c1
Minimax 2.5 optimization (#237)
shadowxz109 Apr 7, 2026
fae90ab
Move ring test to nightly (#22267)
ispobock Apr 7, 2026
e7bc23c
[diffusion] CI: fix consistency check (#22251)
mickqian Apr 7, 2026
5ae00ec
[Disagg][NIXL] Support Mamba state slice transfer for heterogeneous T…
YAMY1234 Apr 7, 2026
727a182
[Mamba] eliminate D2H if tracking mamba states (#20522)
Henson-Zh-Ali Apr 7, 2026
ec5742f
fix: Auto-correct page_size for Mamba no_buffer radix cache mode (#20…
alphabetc1 Apr 7, 2026
be42fbb
Support HTTP2 server (#21700)
ispobock Apr 7, 2026
6131fb5
[NPU] enable mla prepare fused kernel only when being mla attn (#22024)
khalil2ji3mp6 Apr 7, 2026
0c204fb
[HiSparse] Optimize the scheduling of decode backup. (#21932)
huangtingwei9988 Apr 7, 2026
1a8eb89
Kernels community fa3 (#20796)
rainj-me Apr 7, 2026
cc35714
[tiny] migrate /get_server_info; print accept length in accuracy test…
hnyls2002 Apr 7, 2026
e148767
[AMD] Fix test_kimi_k25_mxfp4.py : stage-c-test-large-8-gpu-amd-mi35x…
yctseng0211 Apr 7, 2026
f08726f
[Feature] Add DFLASH speculative decoding support (#22077)
dcw02 Apr 7, 2026
671fe73
Reduce unnecessary kernels and copies in the NSA indexer (#22232)
1am9trash Apr 7, 2026
e665230
[CI] Update nightly test models for H200/B200 (#22288)
Kangyan-Zhou Apr 7, 2026
0e2a026
Add fast-fail to multimodal-gen CI (#22284)
hnyls2002 Apr 7, 2026
7546d04
[NVIDIA] Enable FP4 flashinfer trtllm routed moe (#21240)
trevor-m Apr 7, 2026
f6fc395
[CI] Migrate mgsm_en eval to gsm8k to remove openaipublic dependency …
dougyster Apr 7, 2026
dd73e9a
Revert "[CI] Update nightly test models for H200/B200 (#22288)" (#22297)
Kangyan-Zhou Apr 8, 2026
8c3d80e
Only upload CUDA coredumps on test failure (#22301)
hnyls2002 Apr 8, 2026
919df92
Merge branch 'release/PoC_20260331' of https://github.com/Ascend/sgla…
khalil2ji3mp6 Apr 8, 2026
54 changes: 41 additions & 13 deletions .github/workflows/diffusion-ci-gt-gen.yml
@@ -22,6 +22,10 @@ permissions:
contents: write
actions: read

env:
SGLANG_IS_IN_CI: true
SGLANG_CUDA_COREDUMP: "1"

jobs:
multimodal-diffusion-gen-1gpu:
if: github.repository == 'sgl-project/sglang'
@@ -40,6 +44,8 @@ jobs:
run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion

- name: Generate outputs
env:
RUNAI_STREAMER_MEMORY_LIMIT: 0
run: |
cd python
python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
@@ -56,6 +62,11 @@ jobs:
path: python/diffusion-ci-outputs
retention-days: 7

- name: Publish GT images to sglang-bot/sglang-ci-data
env:
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir python/diffusion-ci-outputs

multimodal-diffusion-gen-2gpu:
if: github.repository == 'sgl-project/sglang'
runs-on: 2-gpu-h100
@@ -73,6 +84,8 @@ jobs:
run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion

- name: Generate outputs
env:
RUNAI_STREAMER_MEMORY_LIMIT: 0
run: |
cd python
python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
@@ -89,27 +102,42 @@ jobs:
path: python/diffusion-ci-outputs
retention-days: 7

diffusion-ci-push:
needs: [multimodal-diffusion-gen-1gpu, multimodal-diffusion-gen-2gpu]
- name: Publish GT images to sglang-bot/sglang-ci-data
env:
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir python/diffusion-ci-outputs

multimodal-diffusion-gen-b200:
if: github.repository == 'sgl-project/sglang'
runs-on: ubuntu-latest
runs-on: 4-gpu-b200
timeout-minutes: 240
steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Download artifacts
uses: actions/download-artifact@v4
with:
pattern: diffusion-gen-*
path: combined
merge-multiple: true
ref: ${{ inputs.ref || github.ref }}

- name: Install dependencies
run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion

- name: Collect image files
- name: Generate outputs
env:
RUNAI_STREAMER_MEMORY_LIMIT: 0
run: |
mkdir -p gt_images
find combined \( -name "*.png" -o -name "*.jpg" -o -name "*.jpeg" -o -name "*.webp" \) -type f -exec cp -f {} gt_images/ \;
cd python
python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
--suite 1-gpu-b200 \
--out-dir ./diffusion-ci-outputs \
${{ inputs.case_ids != '' && format('--case-ids {0}', inputs.case_ids) || '' }}

- name: Upload artifact
uses: actions/upload-artifact@v4
with:
name: diffusion-gen-b200
path: python/diffusion-ci-outputs
retention-days: 7

- name: Publish GT images to sglang-bot/sglang-ci-data
env:
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir gt_images
run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir python/diffusion-ci-outputs
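For readability, the new b200 generation job can be assembled from the hunks in this file's diff as follows (a sketch: every value comes from the changed lines shown here, but indentation is normalized and unchanged context lines are filled in by analogy with the 1-gpu and 2-gpu jobs):

```yaml
# Assembled sketch of the new multimodal-diffusion-gen-b200 job.
# All names, paths, and secrets appear in the diff; layout is normalized.
multimodal-diffusion-gen-b200:
  if: github.repository == 'sgl-project/sglang'
  runs-on: 4-gpu-b200
  timeout-minutes: 240
  steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        ref: ${{ inputs.ref || github.ref }}

    - name: Install dependencies
      run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion

    - name: Generate outputs
      env:
        RUNAI_STREAMER_MEMORY_LIMIT: 0
      run: |
        cd python
        python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
          --suite 1-gpu-b200 \
          --out-dir ./diffusion-ci-outputs \
          ${{ inputs.case_ids != '' && format('--case-ids {0}', inputs.case_ids) || '' }}

    - name: Upload artifact
      uses: actions/upload-artifact@v4
      with:
        name: diffusion-gen-b200
        path: python/diffusion-ci-outputs
        retention-days: 7

    - name: Publish GT images to sglang-bot/sglang-ci-data
      env:
        GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
      run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir python/diffusion-ci-outputs
```

Note that this replaces the old `diffusion-ci-push` aggregation job: each generation job now publishes its own outputs directly instead of collecting artifacts on `ubuntu-latest`.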
32 changes: 16 additions & 16 deletions .github/workflows/nightly-test-nvidia.yml
@@ -76,7 +76,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-1-gpu --nightly --continue-on-error

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# JIT kernel full unit tests (expanded parameter ranges via SGLANG_JIT_KERNEL_RUN_FULL_TESTS)
nightly-test-kernel-1-gpu-h100:
@@ -110,7 +110,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-kernel-1-gpu --nightly --continue-on-error

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

nightly-test-kernel-8-gpu-h200:
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-kernel-8-gpu-h200')
@@ -140,7 +140,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-kernel-8-gpu-h200 --nightly --continue-on-error

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# General tests - 4 GPU H100
nightly-test-general-4-gpu-h100:
@@ -165,7 +165,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-4-gpu --nightly --continue-on-error

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# General tests - 8 GPU H200
nightly-test-general-8-gpu-h200:
@@ -249,7 +249,7 @@ jobs:
if-no-files-found: ignore

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.partition }}

@@ -280,7 +280,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-8-gpu-h20 --nightly --continue-on-error

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# General tests - 8 GPU B200
nightly-test-general-8-gpu-b200:
@@ -353,7 +353,7 @@ jobs:
if-no-files-found: ignore

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.partition }}

@@ -380,7 +380,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-eval-text-2-gpu --nightly --continue-on-error --timeout-per-file 4500

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# Text model performance tests
nightly-test-text-perf-2-gpu-h100:
@@ -418,7 +418,7 @@ jobs:
python3 scripts/ci/utils/publish_traces.py --traces-dir test/performance_profiles_text_models

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# VLM accuracy tests
nightly-test-vlm-accuracy-2-gpu-h100:
@@ -443,7 +443,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-eval-vlm-2-gpu --nightly --continue-on-error --timeout-per-file 9000

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# VLM performance tests
nightly-test-vlm-perf-2-gpu-h100:
@@ -481,7 +481,7 @@ jobs:
python3 scripts/ci/utils/publish_traces.py --traces-dir test/performance_profiles_vlms

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# diffusion performance tests
nightly-test-multimodal-server-1-gpu:
@@ -538,7 +538,7 @@ jobs:
if-no-files-found: ignore

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -596,7 +596,7 @@ jobs:
if-no-files-found: ignore

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -623,7 +623,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-4-gpu-b200 --nightly --continue-on-error --timeout-per-file 12000

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# Specialized B200 tests - 8 GPU, for specific backends and configs
nightly-test-specialized-8-gpu-b200:
Expand Down Expand Up @@ -652,7 +652,7 @@ jobs:
python3 run_suite.py --hw cuda --suite nightly-8-gpu-b200 --nightly --continue-on-error --timeout-per-file 2400

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# Diffusion cross-framework comparison
nightly-test-diffusion-comparison:
Expand Down Expand Up @@ -716,7 +716,7 @@ jobs:
if-no-files-found: ignore

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

# Consolidate performance metrics from all jobs
consolidate-metrics:
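The recurring change across these nightly jobs (and the PR-test jobs below) swaps the coredump-upload condition from `always()` to `failure()`, so the composite action runs only when a previous step in the job failed. A minimal sketch of the pattern, with an illustrative job name and runner label:

```yaml
# Sketch: upload CUDA coredumps only when a prior step in the job failed.
# The composite-action path matches the diff; the job name and runner
# label here are illustrative, not taken from any real workflow.
example-nightly-job:
  runs-on: 1-gpu-runner  # illustrative
  steps:
    - uses: actions/checkout@v4
    - name: Run suite
      run: python3 run_suite.py --hw cuda --suite nightly-1-gpu --nightly --continue-on-error
    - uses: ./.github/actions/upload-cuda-coredumps
      if: failure()  # was always(); now skipped when all prior steps succeed
```

Since `failure()` is true only when a previous step failed, the upload is also skipped on cancelled runs, which `always()` would have executed.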
7 changes: 4 additions & 3 deletions .github/workflows/pr-test-multimodal-gen.yml
@@ -100,7 +100,7 @@ jobs:
$CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -155,7 +155,7 @@ jobs:
$CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -175,6 +175,7 @@ jobs:
with:
ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

- uses: ./.github/actions/check-stage-health

- uses: ./.github/actions/check-maintenance

@@ -203,7 +204,7 @@ jobs:
$CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

multimodal-gen-unit-test:
if: |
24 changes: 12 additions & 12 deletions .github/workflows/pr-test.yml
@@ -602,7 +602,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-a-test-1-gpu-small $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

stage-a-test-cpu:
needs: [check-changes, call-gate]
@@ -711,7 +711,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-b-test-1-gpu-small --auto-partition-id ${{ matrix.partition }} --auto-partition-size 8 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.partition }}

@@ -767,7 +767,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-b-test-1-gpu-large --auto-partition-id ${{ matrix.partition }} --auto-partition-size 14 --timeout-per-file 1800 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.partition }}

@@ -822,7 +822,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-b-test-2-gpu-large --auto-partition-id ${{ matrix.partition }} --auto-partition-size 4 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.partition }}

@@ -880,7 +880,7 @@ jobs:
python3 -m pytest -q python/sglang/jit_kernel/tests/test_flash_attention_4.py

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

call-multimodal-gen-tests:
needs: [check-changes, call-gate, sgl-kernel-build-wheels]
@@ -962,7 +962,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-4-gpu-h100 --auto-partition-id ${{ matrix.part }} --auto-partition-size 3 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -1030,7 +1030,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-8-gpu-h200 --auto-partition-id ${{ matrix.part }} --auto-partition-size 4 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -1086,7 +1086,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-8-gpu-h20 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -1148,7 +1148,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-deepep-4-gpu-h100 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

stage-c-test-deepep-8-gpu-h200:
needs: [check-changes, call-gate, wait-for-stage-b]
@@ -1209,7 +1209,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-deepep-8-gpu-h200 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

stage-c-test-4-gpu-b200:
needs: [check-changes, call-gate, wait-for-stage-b]
@@ -1262,7 +1262,7 @@ jobs:
python3 run_suite.py --hw cuda --suite stage-c-test-4-gpu-b200 --auto-partition-id ${{ matrix.part }} --auto-partition-size 4 --timeout-per-file 1800 $CONTINUE_ON_ERROR_FLAG

- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()
with:
artifact-suffix: ${{ matrix.part }}

@@ -1316,7 +1316,7 @@ jobs:
# python3 run_suite.py --hw cuda --suite stage-c-test-4-gpu-gb200 --timeout-per-file 3600 $CONTINUE_ON_ERROR_FLAG
#
# - uses: ./.github/actions/upload-cuda-coredumps
# if: always()
# if: failure()

pr-test-finish:
needs:
4 changes: 2 additions & 2 deletions .github/workflows/rerun-test.yml
@@ -111,7 +111,7 @@ jobs:
echo "All $total test(s) passed in ${total_elapsed}s"
- uses: ./.github/actions/upload-cuda-coredumps
if: always()
if: failure()

rerun-test-cpu:
if: inputs.is_cpu == 'true'
@@ -173,4 +173,4 @@ jobs:
echo ""
done
total_elapsed=$(( SECONDS - suite_start ))
echo "All $total test(s) passed in ${total_elapsed}s"
echo "All $total test(s) passed in ${total_elapsed}s"