
[SPEC V2] fix: skip stale state updates in spec-v2 overlap #23456

Merged
Qiaolin-Yu merged 5 commits into sgl-project:main from alphabetc1:fix/spec_v2_fix
May 10, 2026

Conversation

@alphabetc1 Collaborator

Motivation

In spec-v2 overlap scheduling, decode results can arrive after a request has already finished or been retracted. The old post-processing still applied accept_lens and speculative acceptance accounting (spec_verify_ct, accepted draft tokens, spec_accepted_tokens) to those stale requests, corrupting KV bookkeeping and per-request speculative metrics. This change skips stale state updates and only rolls back the pre-claimed bonus slot for finished requests.
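A minimal sketch of the guard described above, assuming simplified stand-ins for the scheduler types (the `Req` class, the loop shape, and the accounting fields here are illustrative; the real logic lives in `scheduler_output_processor_mixin.py` and differs in detail):

```python
# Hypothetical sketch: skip speculative accounting for stale requests.
# Only the finished/retracted branch structure mirrors the PR; the rest
# is a simplified stand-in for the scheduler's request bookkeeping.

class Req:
    def __init__(self):
        self.is_retracted = False
        self._finished = False
        self.kv_committed_len = 0
        self.spec_verify_ct = 0
        self.spec_accepted_tokens = 0

    def finished(self):
        return self._finished


def resolve_spec_tokens(reqs, accept_lens):
    for req, accept_len in zip(reqs, accept_lens):
        if req.is_retracted:
            # reset_for_retract() already zeroed committed/allocated KV;
            # applying accept_len here would corrupt KV bookkeeping.
            continue
        if req.finished():
            # Only roll back the bonus slot pre-claimed by
            # prepare_for_decode; skip all other accounting.
            req.kv_committed_len -= 1
            continue
        # Active request: apply speculative acceptance accounting.
        req.kv_committed_len += accept_len
        req.spec_verify_ct += 1
        req.spec_accepted_tokens += accept_len - 1
```

The key property is that stale requests contribute nothing to per-request speculative metrics, and finished requests only release the pre-claimed slot.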

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request modifies the speculative token resolution logic in scheduler_output_processor_mixin.py to properly handle retracted and finished requests, ensuring kv_committed_len is correctly managed. The review feedback suggests updating the global num_accepted_tokens metric when requests are retracted or finished to maintain accurate batch-level speculative metrics and prevent inflated efficiency statistics.

Comment on lines +369 to +371:

```python
if req.is_retracted:
    # reset_for_retract() already zeroes committed/allocated KV.
    continue
```

Severity: medium

When skipping a retracted request, its contribution to the global result.num_accepted_tokens (calculated at line 358) should also be removed. This ensures that the batch-level speculative metrics (used in update_spec_metrics and report_decode_stats) accurately reflect only the tokens that were actually committed to active requests, maintaining consistency between global and per-request statistics.

Suggested change:

```diff
 if req.is_retracted:
     # reset_for_retract() already zeroes committed/allocated KV.
+    result.num_accepted_tokens -= result.accept_length_per_req_cpu[i]
     continue
```

Comment on lines +373 to +376:

```python
if req.finished():
    # -1 because prepare_for_decode pre-claimed the bonus slot.
    req.kv_committed_len -= 1
    continue
```

Severity: medium

Similarly to retracted requests, when a request is already finished, its accepted tokens should be excluded from the global result.num_accepted_tokens to ensure that speculative decoding efficiency metrics are not inflated by stale results.

Suggested change:

```diff
 if req.finished():
     # -1 because prepare_for_decode pre-claimed the bonus slot.
     req.kv_committed_len -= 1
+    result.num_accepted_tokens -= result.accept_length_per_req_cpu[i]
     continue
```
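Both review suggestions amount to keeping the batch-level counter in sync with the per-request skips. A toy illustration of that bookkeeping (the `Result` class and `exclude_stale` helper are hypothetical; only the field names `num_accepted_tokens` and `accept_length_per_req_cpu` come from the diff):

```python
# Toy illustration of the reviewer's point: when a request is skipped as
# stale, subtract its per-request acceptance from the batch-level total
# so aggregate speculative metrics are not inflated.

class Result:
    def __init__(self, accept_length_per_req_cpu):
        self.accept_length_per_req_cpu = accept_length_per_req_cpu
        # Batch-level total, computed up front over all requests.
        self.num_accepted_tokens = sum(accept_length_per_req_cpu)


def exclude_stale(result, stale_indices):
    # Remove stale requests' contributions from the aggregate metric.
    for i in stale_indices:
        result.num_accepted_tokens -= result.accept_length_per_req_cpu[i]
    return result.num_accepted_tokens
```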

@Qiaolin-Yu Qiaolin-Yu self-assigned this Apr 22, 2026
@Qiaolin-Yu Collaborator

/tag-and-rerun-ci

@alphabetc1 Collaborator Author

/rerun-group spec

github-actions Bot commented May 6, 2026

1-gpu-5090 (4 tests): View workflow run

```shell
cd test/ && python3 registered/spec/dflash/test_dflash.py
cd test/ && python3 registered/spec/eagle/test_eagle3_basic.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py
cd test/ && python3 registered/spec/utils/test_build_eagle_tree.py
```

Dispatch failed (422) for: registered/spec/eagle/test_adaptive_speculative.py, registered/spec/eagle/test_eagle_constrained_decoding.py, registered/spec/eagle/test_eagle_infer_a.py, registered/spec/eagle/test_eagle_infer_b.py, registered/spec/test_ngram_speculative_decoding.py, registered/spec/test_standalone_speculative_decoding.py

4-gpu-b200 (2 tests): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_deepseek_v3_fp4_mtp_small.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta_dp_attention.py
```

4-gpu-h100 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_eagle_dp_attention.py
```

8-gpu-b200 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta_dp_attention_large.py
```

2-gpu-h100 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/test_constrained_decoding_spec_reasoning.py
```

@alphabetc1 Collaborator Author

/rerun-group spec

github-actions Bot commented May 6, 2026

1-gpu-5090 (4 tests): View workflow run

```shell
cd test/ && python3 registered/spec/dflash/test_dflash.py
cd test/ && python3 registered/spec/eagle/test_eagle3_basic.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py
cd test/ && python3 registered/spec/utils/test_build_eagle_tree.py
```

1-gpu-h100 (6 tests): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_adaptive_speculative.py
cd test/ && python3 registered/spec/eagle/test_eagle_constrained_decoding.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_a.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_b.py
cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py
cd test/ && python3 registered/spec/test_standalone_speculative_decoding.py
```

4-gpu-b200 (2 tests): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_deepseek_v3_fp4_mtp_small.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta_dp_attention.py
```

4-gpu-h100 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_eagle_dp_attention.py
```

8-gpu-b200 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta_dp_attention_large.py
```

2-gpu-h100 (1 test): View workflow run

```shell
cd test/ && python3 registered/spec/test_constrained_decoding_spec_reasoning.py
```

@alphabetc1 Collaborator Author

/rerun-test test_mimo_models.py test_step3p5_flash_chain_mtp.py

github-actions Bot commented May 7, 2026

8-gpu-h200 (2 tests): View workflow run

```shell
cd test/ && python3 registered/8-gpu-models/test_mimo_models.py
cd test/ && python3 registered/8-gpu-models/test_step3p5_flash_chain_mtp.py
```

@Qiaolin-Yu merged commit b4d347e into sgl-project:main on May 10, 2026
209 of 224 checks passed
@alphabetc1 alphabetc1 deleted the fix/spec_v2_fix branch May 10, 2026 13:34
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py