[Spec] Rename `accepted_drafts` -> `correct_drafts` for unambiguous naming by hnyls2002 · Pull Request #24081 · sgl-project/sglang

hnyls2002 · 2026-04-29T18:50:24Z

Summary

Rename external-facing meta_info JSON keys and trace_slice fields from accepted_* to correct_drafts / num_correct_drafts, matching the post-rename internal convention.
Keep the old names as backward-compat aliases tagged # FIXME: backward-compat alias, remove in next release.

Background

Per the speculative-naming rule (.claude/rules/speculative-naming.md), accept_* means "with bonus" and correct_* means "drafts only" (no bonus).
The old external keys / params (spec_accepted_drafts, spec_proposed_drafts, spec_accept_histogram, accepted_tokens=) all carried drafts-only counts but used the accept_* root — semantically misleading.
The internal rename already landed in #25014 ([Spec] Internal rename per N2 v2 naming rule), #25029 ([Spec] Mamba scatter cleanup; fix multi-layer positional bug; dflash naming), and #25030 ([Spec] Multi-layer mamba scatter cleanup; fix positional call bug). This PR finishes the external API surface.
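To make the "with bonus" versus "drafts only" distinction concrete, here is a minimal sketch. It assumes each verify step emits exactly one bonus token on top of the correct drafts, which is the usual speculative-decoding setup but is not stated in this PR. The helper name `accept_count_from_correct` is hypothetical and does not exist in sglang.

```python
def accept_count_from_correct(num_correct_drafts: int, bonus_tokens: int = 1) -> int:
    """Convert a drafts-only count (correct_*) into a with-bonus count (accept_*).

    Assumption: one bonus token per verify step; this helper is illustrative only.
    """
    if num_correct_drafts < 0:
        raise ValueError("num_correct_drafts must be non-negative")
    return num_correct_drafts + bonus_tokens


# A step where 3 draft tokens were verified as correct emits 4 tokens in total.
step_accept = accept_count_from_correct(3)
# A step where no draft survived still emits the bonus token.
empty_step_accept = accept_count_from_correct(0)
```

Under this reading, a field that stores 3 in the first case is a correct_* quantity, so naming it accepted_* invites an off-by-one interpretation.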

Changes

meta_info JSON keys (tokenizer_manager.py)

Primary: spec_num_correct_drafts, spec_num_proposed_drafts, spec_correct_drafts_histogram
Aliases (FIXME): spec_accepted_drafts, spec_proposed_drafts, spec_accept_histogram
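
Clients that must work across releases could read the new keys first and fall back to the aliases. The sketch below assumes `meta_info` is a plain dict as returned in the response JSON. The helper `read_spec_field` and the mapping `_SPEC_KEY_ALIASES` are hypothetical names, not part of sglang; only the key strings come from this PR.

```python
from typing import Any, Mapping, Optional

# New primary key -> old alias key, as listed in this PR description.
_SPEC_KEY_ALIASES = {
    "spec_num_correct_drafts": "spec_accepted_drafts",
    "spec_num_proposed_drafts": "spec_proposed_drafts",
    "spec_correct_drafts_histogram": "spec_accept_histogram",
}


def read_spec_field(meta_info: Mapping[str, Any], new_key: str) -> Optional[Any]:
    """Return the value under new_key, falling back to its old alias if absent."""
    if new_key in meta_info:
        return meta_info[new_key]
    old_key = _SPEC_KEY_ALIASES.get(new_key)
    if old_key is not None:
        return meta_info.get(old_key)
    return None


# An older server that only emits the alias key.
old_style = {"spec_accepted_drafts": 5, "spec_proposed_drafts": 8}
# A newer server that emits both the primary key and the alias.
new_style = {"spec_num_correct_drafts": 7, "spec_accepted_drafts": 7}

value_from_old = read_spec_field(old_style, "spec_num_correct_drafts")
value_from_new = read_spec_field(new_style, "spec_num_correct_drafts")
```

Preferring the new key keeps the client working once the aliases are dropped.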

set_spec_verify_end_time(...) (req_time_stats.py)

Primary kwarg: num_correct_drafts=
Alias kwarg (FIXME): accepted_tokens= — copied into num_correct_drafts when provided
trace_slice writes both "num_correct_drafts" (new) and "accepted_tokens" (alias, FIXME) keys
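
The kwarg aliasing described above can be sketched as a standalone function. This is not the real `set_spec_verify_end_time` signature, which is not shown in this PR. The function name `build_verify_trace_fields` is hypothetical, and the default of 0 when neither kwarg is given is an assumption.

```python
from typing import Dict, Optional


def build_verify_trace_fields(
    num_correct_drafts: Optional[int] = None,
    accepted_tokens: Optional[int] = None,  # FIXME: backward-compat alias
) -> Dict[str, int]:
    """Illustrative stand-in for the alias handling described in this PR."""
    # The alias is copied into the primary name when provided.
    if accepted_tokens is not None:
        num_correct_drafts = accepted_tokens
    # Assumption: treat a missing count as 0 rather than raising.
    if num_correct_drafts is None:
        num_correct_drafts = 0
    # Both the new key and the alias key are written, with the same value.
    return {
        "num_correct_drafts": num_correct_drafts,
        "accepted_tokens": num_correct_drafts,
    }


new_caller = build_verify_trace_fields(num_correct_drafts=3)
old_caller = build_verify_trace_fields(accepted_tokens=2)
```

Writing both keys lets existing trace consumers keep working until the alias is removed.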

Callers updated to new kwarg

eagle_worker.py, ngram_worker.py, frozen_kv_mtp_worker.py

Out of scope

meta_info["spec_accept_rate"] / meta_info["spec_accept_length"] and Prometheus sglang:spec_accept_{rate,length} — paper-aligned (Leviathan α / EAGLE τ), unchanged per Rule 3 exception.

TODO (follow-up)

Drop the four backward-compat aliases in the next release:
- meta_info["spec_accepted_drafts"]
- meta_info["spec_proposed_drafts"]
- meta_info["spec_accept_histogram"]
- accepted_tokens= kwarg + "accepted_tokens" trace_slice key
Rename accept_token_num kwarg in sgl-kernel C++ op schema (tree_speculative_sampling_target_only, verify_tree_greedy_func) — currently misleading per Rule 3 (kernel writes drafts-only). Requires a new sgl-kernel wheel.

Follows up on #25014, #25029, #25030.

hnyls2002 · 2026-05-12T04:43:29Z

/rerun-test test_eagle_infer_a.py test_eagle_infer_b.py test_eagle_infer_beta.py test_eagle_constrained_decoding.py test_dflash.py test_ngram_speculative_decoding.py

github-actions · 2026-05-12T04:43:55Z

🚀 1-gpu-h100 (4 tests): ✅ View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_a.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_b.py
cd test/ && python3 registered/spec/eagle/test_eagle_constrained_decoding.py
cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

🚀 1-gpu-5090 (2 tests): ✅ View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py
cd test/ && python3 registered/spec/dflash/test_dflash.py

…aming (sgl-project#24081)

…ack) Brings in upstream sgl-project/sglang main commits since 096ad02 (merge base, Laguna-XS.2 model support). Total: 28 upstream commits composed.

Custom-stack files preserved intact (entirely-ours, byte-identical to origin/main):
- Blackwell CuTe kernel suite (warp_decode_cute, g1_attention_cute, gated_norm_cute, layersplit_cute, fused_store_index_cache)
- TurboQuant 2.5-bit dense KV cache path
- HIGGS 2-bit dense KV cache path (with split-K decode)
- NVFP4 IndexCache dispatcher (active gate)
- quantization_config_dispatch (HF-config-driven runtime routing)
- All custom server-args flags and runtime methods preserved

Verification:
- 200+ merged Python files compile cleanly
- Dispatcher symbol presence verified
- HIGGS pool / TurboQuant pool classes present at expected lines
- compressed_tensors_w4a4_nvfp4_moe imports clean
- All custom server-args flags present (enable_higgs_dense_2bit_kv_cache, enable_turboquant_dense_kv_cache, turboquant_dense_kv_preset, indexer_quantization_declared, higgs_mla_decode_num_splits, etc.)

Manual-merged shared files (auto-merge gave broken/mixed output; cleaned up post-merge):
- python/sglang/srt/disaggregation/mooncake/conn.py: upstream's PR#24932 refactored maybe_send_extra into a state-types-loop. Replayed our LayerSplit NSA state-index-length-mismatch check inside the SWA/NSA branch of the new loop body.
- sgl-kernel/python/sgl_kernel/__init__.py: upstream's PR#23449 (Apple Silicon Metal kernel) wrapped the entire module body in `if darwin/arm64: from sgl_kernel.metal import * else: ...`. The auto-merge duplicated the file body; rewrote cleanly with upstream's structure and re-injected our `g1_gate_forward`, `warp_decode_cute_moe_forward`, and `warp_decode_cute_moe_packed_forward` imports plus `g1_gate_forward` in _DEBUG_EXPORT_NAMES.
- python/sglang/srt/managers/scheduler_output_processor_mixin.py: line 628 still referenced `result.num_accepted_drafts` (renamed by PR sgl-project#25038 to `num_correct_drafts`). Renamed in place.
- python/sglang/srt/observability/scheduler_metrics_mixin.py: a block around the spec-decode logging path had mixed old/new names from auto-merge (lines 553/557/560). Renamed `spec_num_accepted_tokens` -> `spec_num_accept_tokens` and local `num_accepted_drafts` -> `num_correct_drafts` to match the rest of the file.
- test/test_smc_info.py: stub Req mock used the old field names `spec_accepted_drafts` and `update_spec_acceptance_histogram`. Renamed to `spec_num_correct_drafts` and `update_spec_correct_drafts_histogram` per PR sgl-project#24081.

Auto-merge cleanly integrated upstream changes to:
- server_args.py (new fields: prefill_only_disable_kv_cache, weight_loader_drop_cache_after_load, prefill_delayer_queue_min_ratio, prefill_delayer_max_delay_ms, speculative_draft_window_size, etc.)
- mem_cache/memory_pool.py (new NoOpMHATokenToKVPool)
- model_executor/model_runner_kv_cache_mixin.py (NoOpMHATokenToKVPool pool factory + _validate_prefill_only_disable_kv_cache_pool_family)
- layers/attention/nsa_backend.py (spec rename num_accepted_drafts -> num_correct_drafts; num_accepted_tokens -> num_accept_tokens)
- layers/attention/nsa/nsa_indexer.py (new _apply_q_scale_and_softmax_scale compile method; torch.mm replaces deep_gemm wrapper)
- 28+ disaggregation/spec/runner files with mostly clean upstream-side-only integration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

----- upstream commit subjects (28) -----
fd3eb77 [Cookbook]: add Laguna-XS.2 (Poolside) (sgl-project#24730)
6be1a45 Fix swa component host hit (sgl-project#25085)
693f497 [NPU] use causal_conv1d_update_v2 for performance (sgl-project#24595)
1efe9e2 [Bug Fix] Reject incompatible combination of --disable-cuda-graph-padding and --enable-torch-compile (sgl-project#23903)
8d27ce7 Optimize uvicorn startup command (sgl-project#25041)
b35fd5f [fix] skip legacy minicpmv conv template for MiniCPM-V 4.6 (sgl-project#24998)
7582237 [Tiny Fix] Disable BCG when inner layer_model unresolved (sgl-project#25021)
ca3bc05 Deepseek-v4-Pro share expert tp1 (sgl-project#24949)
a72d3ae [Spec] Multi-layer mamba scatter cleanup; fix positional call bug (sgl-project#25030)
7128533 Revert "Migrate Intel CPU cases to the test/registered." (sgl-project#25044)
1f985c5 [Spec] Rename `accepted_indices` -> `accept_indices`; drop `_token_id` suffix per Rule 5 (sgl-project#25038)
ecf5d84 Migrate Intel CPU cases to the test/registered. (sgl-project#22670)
d7f4761 [PD] Refactor hybrid state transfer (sgl-project#24932)
91907b7 [UnifiedTree]: Fix Unified HiCache tombstone lock release replay (sgl-project#24972)
4ad63ad [Spec] Rename `accepted_drafts` -> `correct_drafts` for unambiguous naming (sgl-project#24081)
6bfb365 [PD] Rate limit prefill inflight polling warnings (sgl-project#24967)
6bb79c1 [Linear Attn] Add CUSTOM enum and plugin extensibility for kernel backends (sgl-project#24937)
cfc41d5 Fix kimi k2.5 mla eagle + dp attention (sgl-project#25033)
0f3932c [Fix] Qwen3-ASR config: set thinker_config before super().__init__ (sgl-project#24187)
f526e3f [Spec] Mamba scatter cleanup; fix multi-layer positional bug; dflash naming (sgl-project#25029)
10375a1 [NIXL][XPU] Fix uint64 overflow for mismatched P/D TP sizes (e.g. prefill_tp=1, decode_tp=2) (sgl-project#24648)
0a37d24 [diffusion] hardware: support sage attention backend on MUSA (attn backend, 21/N) (sgl-project#24752)
5495026 [HiCache] feat: default storage prefetch timeout (sgl-project#23309)
186eb42 Feat: Support SWA (Sliding Window Attention) for EAGLE-3 drafter (sgl-project#24664)
a75b79e Feat: Support newer EAGLE-3 drafters (sgl-project#24663)
f3a8189 [Spec] Internal rename per N2 v2 naming rule (sgl-project#25014)
bfc2eda [MUSA] Use MUSA-optimized operators in piecewise CUDA graph (sgl-project#23633)
74d70af [Apple Silicon] Add Metal kernel support in sgl-kernel (sgl-project#23449)


hnyls2002 requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Qiaolin-Yu, Ying1123, ch-wan, fzyzcjy, hebiao064, ispobock, merrymercy, sufeng-buaa and xiezhq-hermann as code owners April 29, 2026 18:50

github-actions Bot added the blackwell SM100/SM120 label Apr 29, 2026

hnyls2002 requested review from hanming-lu, iforgetmyname, ping1jing2, yizhang2077 and yuan-luo as code owners April 29, 2026 19:12

github-actions Bot added the npu label Apr 29, 2026

hnyls2002 requested a review from kpham-sgl as a code owner May 8, 2026 22:04

hnyls2002 requested review from 1am9trash, hlu1, hubertlu-tw and kkHuang-amd as code owners May 9, 2026 02:12

hnyls2002 added the run-ci, high priority, and bypass-fastfail labels May 9, 2026

github-actions Bot added the documentation Improvements or additions to documentation label May 11, 2026

github-actions Bot added the speculative-decoding label May 11, 2026

This was referenced May 11, 2026

[Spec] Remove dead kernel params; fix stale comment in trtllm_mla #25010

Merged

[Spec] Internal rename per N2 v2 naming rule #25014

Merged

external API rename to correct_drafts; keep backward-compat aliases

dad5dff

hnyls2002 force-pushed the lsyin/correct-vs-accept-rename branch from 73436be to dad5dff Compare May 12, 2026 03:49

hnyls2002 added 2 commits May 11, 2026 20:59

rename local accepted/num_accepted to is_accepted/num_accept_tokens

9cac2ec

extract num_correct_drafts to local var in trace callers

36b09c5

sgl-project deleted a comment from github-actions Bot May 12, 2026

hnyls2002 merged commit 4ad63ad into main May 12, 2026
65 of 133 checks passed

hnyls2002 deleted the lsyin/correct-vs-accept-rename branch May 12, 2026 05:12

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

[Spec] Rename accepted_drafts -> correct_drafts for unambiguous n…

03fafd3

…aming (sgl-project#24081)

xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026

[Spec] Rename accepted_drafts -> correct_drafts for unambiguous n…

71fb1ff

…aming (sgl-project#24081)

hnyls2002 merged 3 commits into main from lsyin/correct-vs-accept-rename
