Sandbox: verify full main CI is green on latest main (do not merge) #25647
fzyzcjy wants to merge 1 commit
|
/tag-and-rerun-ci |
CI failure:
|
|
/rerun-test test/registered/spec/eagle/test_eagle_infer_b.py |
|
/rerun-test test/registered/lora/test_lora_qwen3_8b_logprob_diff.py |
|
/rerun-test test/registered/core/test_srt_endpoint.py |
|
| File | Original lane | Rerun verdict |
|---|---|---|
| test/registered/spec/eagle/test_eagle_infer_b.py (test_radix_attention) | base-b-test-1-gpu-large (1) | ✅ PASS (flake) |
| test/registered/core/test_srt_endpoint.py (test_get_server_info_concurrent) | base-b-test-1-gpu-small (5) | ✅ PASS (flake) |
| test/registered/lora/test_lora_qwen3_8b_logprob_diff.py (test_lora_qwen3_8b_logprob_accuracy) | extra-a-test-1-gpu-large (0) | ❌ FAIL: same CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS during CUDA graph capture, real bug (bisecting next) |
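The triage rule behind the verdict column can be sketched as follows. This is a minimal illustration, not code from the repository: the `Rerun` type, the `classify` function, and the "inconclusive" bucket are hypothetical names chosen here. A rerun that passes is treated as a flake, and a rerun that fails with the same fingerprint is treated as a real bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rerun:
    """Outcome of one /rerun-test on a single file (hypothetical type)."""
    test_file: str
    passed: bool
    same_fingerprint: bool = False  # only meaningful when passed is False


def classify(rerun: Rerun) -> str:
    """Map a rerun outcome to the verdict used in the table above."""
    if rerun.passed:
        return "flake"          # original failure did not reproduce
    if rerun.same_fingerprint:
        return "real bug"       # reproduced with the same error signature
    return "inconclusive"       # failed again, but for a different reason


verdicts = {
    r.test_file: classify(r)
    for r in (
        Rerun("test_eagle_infer_b.py", passed=True),
        Rerun("test_srt_endpoint.py", passed=True),
        Rerun("test_lora_qwen3_8b_logprob_diff.py", passed=False, same_fingerprint=True),
    )
}
```

Only the "real bug" bucket is worth bisecting; the two flakes need no further action beyond the rerun.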
Bisect probe:
|
Bisect probes:
|
| SHA | Date | Subject | rerun-test verdict |
|---|---|---|---|
| ba214ef3d3 | 2026-05-14 | tag-gated nightly migration (40 whole-file moves) | PASS |
| 229cadec04 | 2026-05-16 | logging update for inplace setting in MoE layer | PASS |
| c58b47bc86 | 2026-05-18 | PoolStats dataclass move | (in flight) |
| d90bc65e30 | 2026-05-19 | [NPU] Fix TypeError in MLA index_head_dim | FAIL |
| current HEAD | 2026-05-19 | (Tom's chain + 5 unrelated) | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| ba214ef3d3 | 2026-05-14 | PASS |
| 229cadec04 | 2026-05-16 | PASS |
| c58b47bc86 | 2026-05-18 | PASS ✅ |
| f04c522534 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| ba214ef3d3 | 2026-05-14 | PASS |
| 229cadec04 | 2026-05-16 | PASS |
| c58b47bc86 | 2026-05-18 | PASS |
| f04c522534 | 2026-05-18 | PASS ✅ |
| f5049709b3 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| f5049709b3 | 2026-05-18 | PASS ✅ (last good lower bound) |
| 878e6b8886 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL (first bad upper bound) |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| 878e6b8886 | 2026-05-18 | PASS ✅ (last good) |
| b79e4b1e68 | 2026-05-18 | (in flight, prime suspect) |
| d90bc65e30 | 2026-05-19 | FAIL (first bad) |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| 878e6b8886 | 2026-05-18 | PASS ✅ (last good) |
| 745abd6cc0 | 2026-05-18 | (untested) |
| 314dedf7c6 | 2026-05-18 | (in flight) |
| b79e4b1e68 | 2026-05-18 | FAIL ❌ (first bad upper bound) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect result:
|
| SHA | Date | Subject | Verdict |
|---|---|---|---|
| ba214ef3d3 | 2026-05-14 | tag-gated nightly migration (40 whole-file moves) | PASS |
| 229cadec04 | 2026-05-16 | logging update for inplace setting in MoE layer | PASS |
| c58b47bc86 | 2026-05-18 | PoolStats dataclass move | PASS |
| f04c522534 | 2026-05-18 | [PD] Add conclude_state to fake KV backend | PASS |
| f5049709b3 | 2026-05-18 | eagle3 aux-layer-ids +1 offset fix | PASS |
| 878e6b8886 | 2026-05-18 | [SP] Fix runtime_max_tokens_per_rank | PASS |
| 314dedf7c6 | 2026-05-18 | Use SGLANG_CACHE_DIR env for gpu_p2p_access_cache path | PASS ✅ (last good) |
| b79e4b1e68 | 2026-05-18 | [Fix] Try to fix error caused by latest cutedsl packages (#25690) | FAIL ❌ (first bad) |
| d90bc65e30 | 2026-05-19 | [NPU] Fix TypeError in MLA index_head_dim | FAIL |
| current HEAD | 2026-05-19 | (Tom's chain + a handful of unrelated) | FAIL |
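For reference, the narrowing above is a binary search for the first bad commit. The sketch below replays the verdicts recorded in the result table instead of dispatching CI. The names `first_bad`, `probe`, and `KNOWN_BAD` are illustrative, and the search assumes the SHAs are listed oldest to newest and that every commit after the first bad one is also bad. The probes in this thread were picked by hand, so the order visited here differs from the actual one.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# SHAs from the result table, oldest first.
COMMITS = [
    "ba214ef3d3", "229cadec04", "c58b47bc86", "f04c522534", "f5049709b3",
    "878e6b8886", "314dedf7c6", "b79e4b1e68", "d90bc65e30",
]
KNOWN_BAD = {"b79e4b1e68", "d90bc65e30"}  # verdicts copied from the table

probed: List[str] = []


def probe(sha: str) -> bool:
    """Stand-in for a /rerun-test dispatch: looks up the recorded verdict."""
    probed.append(sha)
    return sha in KNOWN_BAD


culprit = first_bad(COMMITS, probe)
```

With nine candidates and both endpoints already known, three probes are enough to isolate the culprit under these assumptions.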
Offending change
- PR: #25690, [Fix] Try to fix error caused by latest cutedsl packages
- Author: @Fridge003 (Co-authored-by @hnyls2002)
- Merged: 2026-05-18 23:51 UTC
- Diff: 21 +, 4 -. Touches `python/pyproject.toml` (switches `flashinfer_python` and `nvidia-cutlass-dsl` to the `[cu13]` extras variant) and `scripts/ci/cuda/ci_install_dependency.sh` (regex update for `[extras]` notation, plus a new `purge_cutlass_libs_base()` step that uninstalls `nvidia-cutlass-dsl-libs-base` and then force-reinstalls `nvidia-cutlass-dsl-libs-cu13`).
The PR's own commit message explains the original bug it was fixing:
nvidia-cutlass-dsl[cu13] extras are additive on PyPI: requires_dist always pulls -libs-base AND -libs-cu13 when [cu13] is requested. Both wheels write to the same site-packages paths with different content, leaving the wrapper (cutlass.py, cu13 style) mismatched with the binding (_gpu_ops_gen.py, base style) -> GPUModuleOp signature TypeError.
The fix correctly purges -libs-base in the install script, but the LoRA Qwen3-8B forward path with CUDA graph capture now hits a kernel-side illegal address — so either the cu13 wheel's compiled kernel is broken for this path, or the purge_cutlass_libs_base step doesn't actually win in all install orderings.
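One way to test the second hypothesis on a runner is to check whether both wheels are still installed after the install script finishes. The sketch below is a hypothetical diagnostic, not part of `ci_install_dependency.sh`: the function names are made up here, and only the two distribution names come from the PR description. It uses the standard-library `importlib.metadata` to list installed distributions.

```python
import re
from importlib import metadata
from typing import Iterable, List

BASE = "nvidia-cutlass-dsl-libs-base"
CU13 = "nvidia-cutlass-dsl-libs-cu13"


def normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_cutlass_libs(installed: Iterable[str]) -> List[str]:
    """Return both package names if -libs-base and -libs-cu13 coexist, else []."""
    names = {normalize(n) for n in installed}
    present = [pkg for pkg in (BASE, CU13) if pkg in names]
    return present if len(present) == 2 else []


def installed_distribution_names() -> List[str]:
    """Names of all distributions visible to the current interpreter."""
    return [
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    ]


if __name__ == "__main__":
    clash = conflicting_cutlass_libs(installed_distribution_names())
    print("conflict:" if clash else "no conflict", ", ".join(clash))
```

An empty result only shows that the base distribution is no longer registered. It does not prove that the files on disk came from the cu13 wheel, so a failing run with no conflict would point back at the first hypothesis.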
Failure fingerprint (every FAIL probe + current HEAD)
coredump: Detected an exception of type CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14)
Fatal Python error: Aborted
RuntimeError: Rank 0 scheduler died during initialization (exit code: -6).
Python call stack at the abort thread:
File ".../python/sglang/srt/layers/quantization/unquant.py", line 161 in apply
File ".../python/sglang/srt/lora/layers.py", line 724 in forward
...
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 1112 in run_once
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 1134 in capture_one_batch_size
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 707 in __init__
File ".../python/sglang/srt/model_executor/model_runner.py", line 2776 in init_device_graphs
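To compare a new CI log against this fingerprint without reading it by eye, a substring check along these lines may be enough. It is a rough sketch: the choice of three markers and the `matches_fingerprint` helper are assumptions made here, and the sample text is abbreviated from the excerpt above.

```python
from typing import Sequence

# Markers taken from the failure excerpt above.
FINGERPRINT = (
    "CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS",
    "scheduler died during initialization",
    "capture_one_batch_size",
)


def matches_fingerprint(log_text: str, markers: Sequence[str] = FINGERPRINT) -> bool:
    """True only if every marker appears somewhere in the log."""
    return all(marker in log_text for marker in markers)


SAMPLE_LOG = """\
coredump: Detected an exception of type CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14)
Fatal Python error: Aborted
RuntimeError: Rank 0 scheduler died during initialization (exit code: -6).
  File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 1134 in capture_one_batch_size
"""

same_failure = matches_fingerprint(SAMPLE_LOG)
unrelated = matches_fingerprint("AssertionError: accuracy below threshold")
```

Requiring all markers keeps an illegal-address crash outside CUDA graph capture from being counted as the same failure.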
Reproduce
# Probe latest good (PASS):
git push upstream 314dedf7c6:refs/heads/tmp-good
gh workflow run rerun-test.yml --repo sgl-project/sglang --ref tmp-good \
-f mode=cuda -f test_command="registered/lora/test_lora_qwen3_8b_logprob_diff.py" \
-f runs_on="1-gpu-h100" -f install_script="scripts/ci/cuda/ci_install_dependency.sh"
# Probe first bad (FAIL):
git push upstream b79e4b1e68:refs/heads/tmp-bad
gh workflow run rerun-test.yml --repo sgl-project/sglang --ref tmp-bad \
-f mode=cuda -f test_command="registered/lora/test_lora_qwen3_8b_logprob_diff.py" \
-f runs_on="1-gpu-h100" -f install_script="scripts/ci/cuda/ci_install_dependency.sh"

cc @Fridge003 @hnyls2002, could you take a look? This regression has been on main since 2026-05-18 and is currently surfacing as extra-a-test-1-gpu-large (0) on the main-CI sandbox.
Diagnostic revert PR opened for verification: #25743 — /rerun-test of the failing LoRA file is pending there.
Bisect confirmed via paired diagnostic PRs
Two sibling PRs were opened to nail down
Together with the per-commit bisect probes above, that's three independent lines of evidence:
The regression is unambiguously
cc @Fridge003 @hnyls2002, could you take a look? Closing the two diagnostic PRs now. |
2×2 paired probe: both runs match expectation
A second
4-of-4 consistent with the bisect conclusion. |
4a45112 to
c6e27e0
CUDA-lane failure: borderline GSM8K accuracy (likely flake)
Other failing lanes are non-CUDA and not chased per lane policy (main-sandbox, no diff):
Next: rerunning this single file to confirm the flake. |
|
/rerun-test test/registered/spec/test_gemma4_mtp_31b_extra.py |
|
Results for 🚀 |
Flake confirmed:
|
Non-CUDA lane:
|
✅ CUDA gate GREEN: main verification complete
Head SHA
The only CUDA red was a confirmed flake:
Remaining red lanes are non-gating (non-CUDA / chronic / cascade), none related to the landed chain:
Conclusion: the KV-canary feature, landed on |
c6e27e0 to
96c5c6e
Round status (head
Remaining ~95 jobs still running; will batch any reruns after the round lands. |
Other reds this round: AMD lane (27 jobs — ongoing repo-wide AMD outage), NPU a2 (recurring perf flake), XPU (chronic runner infra). None CUDA, none code-related. Plan: wait for the ~13 still-running jobs to land, then |
Round summary (running=0): CUDA reds = this + Next: one batched |
|
/rerun-failed-ci |
96c5c6e to
ffbe2e8
stage-a-test-1-gpu-xpu / finish (job): runner-level infra failure during workspace cleanup, before any test ran:
Classification: infra (self-hosted XPU runner permission residue), non-CUDA lane, unrelated to main's code. Not chasing per babysit policy; CUDA lanes remain the hard gate. |
Non-CUDA failures (not chasing per babysit policy — none CUDA, none related to the merged code):
The merged PRs touch no XPU/NPU/AMD/Xeon code, no sampling backends, no quantization or fused-residual kernels. Continuing to watch CUDA lanes (the hard gate) to completion. |
CUDA failure:
|
| Branch | Run | test_mimo_v2 | Fingerprint |
|---|---|---|---|
| sandbox (main 0a190d1c9 + sentinel) | 27088945685 | ✗ FAIL | VocabParallelEmbedding input id out of range |
| main scheduled (a07d813ec, independent runner) | 27091400009 | ✗ FAIL | byte-identical |
| main pre-#27445/#27446 (a39c428d3) | 27093698014 (probe dispatched) | pending | - |
Classification
Pre-existing main regression, deterministic (2/2 independent runs), unrelated to #27445/#27446: the merged PRs touch only scripted-runtime test harness files and PP idle-gating in is_fully_idle (short-circuited at pp_size==1; this server is pp=1). The failing path is the model-side out-of-range-token-id async assert (same family as the tp=1 fix in #27482) on MiMo-V2.5's first warmup forward. Will report the pre-merge probe result when it completes.
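For readers unfamiliar with this failure family, the condition being asserted is simply that every token id fed to the embedding lies in `[0, vocab_size)`. The sketch below shows that check in plain Python. It is not the sglang assert itself: the function name and the vocabulary size of 32000 are placeholders chosen for illustration.

```python
from typing import List, Sequence, Tuple


def out_of_range_ids(input_ids: Sequence[int], vocab_size: int) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs whose id falls outside [0, vocab_size)."""
    return [
        (pos, tok)
        for pos, tok in enumerate(input_ids)
        if not 0 <= tok < vocab_size
    ]


# Placeholder values: 32000 is an assumed vocabulary size, not MiMo-V2.5's.
offenders = out_of_range_ids([5, 31999, 32000, -1], vocab_size=32000)
```

Running a host-side check like this on the warmup batch would show which positions carry the bad ids before the device-side assert fires asynchronously.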
Summary
Sandbox PR, do not merge. Touches `python/sglang/version.py` with a no-op comment so paths-filter flips `main_package=true` and the full PR Test Base + PR Test Extra matrix dispatches.

Carries three labels so the workflow gates all pass:
- `run-ci`: satisfies `pr-gate.yml`'s `require-run-ci` gate
- `run-ci-extra`: allows `pr-test-extra.yml` to run on this `pull_request` event
- `bypass-fastfail`: makes the `check-pr-test-health` action a no-op (no cascade fast-fail when a single sibling fails on infra flake)

Purpose: verify upstream/main (`f04c522534`) is green end-to-end with the full CI surface (base stages + extra stages, no fast-fail cascade). This is the PR-side equivalent of the dispatched main CI; it is cleaner than `gh workflow run` because the dispatch interface cannot pass `skip_pr_test_health_check`.

Close this PR after the run completes. No source change is intended to land.
Test plan
- `pre-commit run --files python/sglang/version.py`
- `check-pr-test-health` cascade failures

CI States
Latest PR Test (Base): ❌ Run #27088945685
Latest PR Test (Extra): ✅ Run #27088945624