
[Spec] Rename accepted_drafts -> correct_drafts for unambiguous naming#24081

Merged
hnyls2002 merged 3 commits into main from lsyin/correct-vs-accept-rename on May 12, 2026

@hnyls2002 (Collaborator) commented Apr 29, 2026

Summary

  • Rename external-facing meta_info JSON keys and trace_slice fields from accepted_* to correct_drafts / num_correct_drafts, matching the post-rename internal convention.
  • Keep the old names as backward-compat aliases tagged # FIXME: backward-compat alias, remove in next release.

Background

Changes

meta_info JSON keys (tokenizer_manager.py)

  • Primary: spec_num_correct_drafts, spec_num_proposed_drafts, spec_correct_drafts_histogram
  • Aliases (FIXME): spec_accepted_drafts, spec_proposed_drafts, spec_accept_histogram
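
The primary/alias pairing above can be sketched as follows. This is a minimal illustration, not the code in tokenizer_manager.py: the helper name `build_spec_meta_info` and its arguments are hypothetical, and only the six key names are taken from this PR.

```python
from typing import Any, Dict, List


def build_spec_meta_info(
    num_correct_drafts: int,
    num_proposed_drafts: int,
    correct_drafts_histogram: List[int],
) -> Dict[str, Any]:
    """Hypothetical helper: emit the new keys plus the backward-compat aliases."""
    meta_info: Dict[str, Any] = {
        "spec_num_correct_drafts": num_correct_drafts,
        "spec_num_proposed_drafts": num_proposed_drafts,
        "spec_correct_drafts_histogram": correct_drafts_histogram,
    }
    # FIXME: backward-compat alias, remove in next release
    meta_info["spec_accepted_drafts"] = meta_info["spec_num_correct_drafts"]
    meta_info["spec_proposed_drafts"] = meta_info["spec_num_proposed_drafts"]
    meta_info["spec_accept_histogram"] = meta_info["spec_correct_drafts_histogram"]
    return meta_info


# Example values are made up for illustration.
meta = build_spec_meta_info(7, 12, [1, 2, 4])
```

Deriving each alias from its primary key keeps the two values from drifting apart while both are emitted.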

set_spec_verify_end_time(...) (req_time_stats.py)

  • Primary kwarg: num_correct_drafts=
  • Alias kwarg (FIXME): accepted_tokens= — copied into num_correct_drafts when provided
  • trace_slice writes both "num_correct_drafts" (new) and "accepted_tokens" (alias, FIXME) keys
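
A rough sketch of the kwarg alias behavior described above. The class `SpecVerifyStats` is a hypothetical stand-in for the object in req_time_stats.py, the real method takes other arguments that are omitted here, and raising `TypeError` when neither kwarg is given is an assumption of this sketch rather than confirmed behavior.

```python
from typing import Any, Dict, Optional


class SpecVerifyStats:
    """Hypothetical stand-in for the request time-stats object."""

    def __init__(self) -> None:
        self.trace_slice: Dict[str, Any] = {}

    def set_spec_verify_end_time(
        self,
        num_correct_drafts: Optional[int] = None,
        accepted_tokens: Optional[int] = None,  # FIXME: backward-compat alias
    ) -> None:
        # Old callers still pass accepted_tokens=; copy it into the new name.
        if accepted_tokens is not None:
            num_correct_drafts = accepted_tokens
        if num_correct_drafts is None:
            # Assumption of this sketch: one of the two kwargs must be given.
            raise TypeError("num_correct_drafts is required")
        self.trace_slice["num_correct_drafts"] = num_correct_drafts
        # FIXME: backward-compat alias, remove in next release
        self.trace_slice["accepted_tokens"] = num_correct_drafts


new_style = SpecVerifyStats()
new_style.set_spec_verify_end_time(num_correct_drafts=3)

old_style = SpecVerifyStats()
old_style.set_spec_verify_end_time(accepted_tokens=5)
```

Either call style leaves both keys in `trace_slice` with the same value, so existing trace consumers keep working until the alias is dropped.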

Callers updated to new kwarg

  • eagle_worker.py, ngram_worker.py, frozen_kv_mtp_worker.py

Out of scope

  • meta_info["spec_accept_rate"] / meta_info["spec_accept_length"] and Prometheus sglang:spec_accept_{rate,length} — paper-aligned (Leviathan α / EAGLE τ), unchanged per Rule 3 exception.

TODO (follow-up)

  • Drop the four backward-compat aliases in the next release:
    • meta_info["spec_accepted_drafts"]
    • meta_info["spec_proposed_drafts"]
    • meta_info["spec_accept_histogram"]
    • accepted_tokens= kwarg + "accepted_tokens" trace_slice key
  • Rename accept_token_num kwarg in sgl-kernel C++ op schema (tree_speculative_sampling_target_only, verify_tree_greedy_func) — currently misleading per Rule 3 (kernel writes drafts-only). Requires a new sgl-kernel wheel.
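
For consumers of meta_info, one way to prepare for the alias removal listed above is to read the new key first and fall back to the old one. The helper `read_num_correct_drafts` is hypothetical and shown only as a sketch; the two key names come from this PR.

```python
from typing import Any, Mapping, Optional


def read_num_correct_drafts(meta_info: Mapping[str, Any]) -> Optional[int]:
    """Hypothetical client helper: prefer the new key, fall back to the alias."""
    if "spec_num_correct_drafts" in meta_info:
        return meta_info["spec_num_correct_drafts"]
    # Responses produced before the rename only carry the old key.
    return meta_info.get("spec_accepted_drafts")
```

Once the aliases are gone, the fallback branch simply stops matching and can be deleted without changing behavior.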

Follows up on #25014, #25029, #25030.

@hnyls2002 hnyls2002 force-pushed the lsyin/correct-vs-accept-rename branch from 73436be to dad5dff Compare May 12, 2026 03:49
@hnyls2002 (Collaborator, Author)

/rerun-test test_eagle_infer_a.py test_eagle_infer_b.py test_eagle_infer_beta.py test_eagle_constrained_decoding.py test_dflash.py test_ngram_speculative_decoding.py

github-actions Bot (Contributor) commented May 12, 2026

🚀 1-gpu-h100 (4 tests): ✅ View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_a.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_b.py
cd test/ && python3 registered/spec/eagle/test_eagle_constrained_decoding.py
cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

🚀 1-gpu-5090 (2 tests): ✅ View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py
cd test/ && python3 registered/spec/dflash/test_dflash.py

@sgl-project sgl-project deleted a comment from github-actions Bot May 12, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot May 12, 2026
@hnyls2002 hnyls2002 merged commit 4ad63ad into main May 12, 2026
65 of 133 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/correct-vs-accept-rename branch May 12, 2026 05:12
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
SpencerGarnets added a commit to ai-blaise/optimization-playground that referenced this pull request May 12, 2026
…ack)

Brings in upstream sgl-project/sglang main commits since
096ad02 (merge base, Laguna-XS.2 model support).
Total: 28 upstream commits composed.

Custom-stack files preserved intact (entirely-ours, byte-identical to
origin/main):
  - Blackwell CuTe kernel suite (warp_decode_cute, g1_attention_cute,
    gated_norm_cute, layersplit_cute, fused_store_index_cache)
  - TurboQuant 2.5-bit dense KV cache path
  - HIGGS 2-bit dense KV cache path (with split-K decode)
  - NVFP4 IndexCache dispatcher (active gate)
  - quantization_config_dispatch (HF-config-driven runtime routing)
  - All custom server-args flags and runtime methods preserved

Verification:
  - 200+ merged Python files compile cleanly
  - Dispatcher symbol presence verified
  - HIGGS pool / TurboQuant pool classes present at expected lines
  - compressed_tensors_w4a4_nvfp4_moe imports clean
  - All custom server-args flags present (enable_higgs_dense_2bit_kv_cache,
    enable_turboquant_dense_kv_cache, turboquant_dense_kv_preset,
    indexer_quantization_declared, higgs_mla_decode_num_splits, etc.)

Manual-merged shared files (auto-merge gave broken/mixed output; cleaned
up post-merge):
  - python/sglang/srt/disaggregation/mooncake/conn.py: upstream's PR#24932
    refactored maybe_send_extra into a state-types-loop. Replayed our
    LayerSplit NSA state-index-length-mismatch check inside the SWA/NSA
    branch of the new loop body.
  - sgl-kernel/python/sgl_kernel/__init__.py: upstream's PR#23449 (Apple
    Silicon Metal kernel) wrapped the entire module body in
    `if darwin/arm64: from sgl_kernel.metal import * else: ...`. The
    auto-merge duplicated the file body; rewrote cleanly with upstream's
    structure and re-injected our `g1_gate_forward`,
    `warp_decode_cute_moe_forward`, and
    `warp_decode_cute_moe_packed_forward` imports plus `g1_gate_forward`
    in _DEBUG_EXPORT_NAMES.
  - python/sglang/srt/managers/scheduler_output_processor_mixin.py: line
    628 still referenced `result.num_accepted_drafts` (renamed by PR
    sgl-project#25038 to `num_correct_drafts`). Renamed in place.
  - python/sglang/srt/observability/scheduler_metrics_mixin.py: a block
    around the spec-decode logging path had mixed old/new names from
    auto-merge (lines 553/557/560). Renamed `spec_num_accepted_tokens`
    -> `spec_num_accept_tokens` and local `num_accepted_drafts` ->
    `num_correct_drafts` to match the rest of the file.
  - test/test_smc_info.py: stub Req mock used the old field names
    `spec_accepted_drafts` and `update_spec_acceptance_histogram`.
    Renamed to `spec_num_correct_drafts` and
    `update_spec_correct_drafts_histogram` per PR sgl-project#24081.

Auto-merge cleanly integrated upstream changes to:
  - server_args.py (new fields: prefill_only_disable_kv_cache,
    weight_loader_drop_cache_after_load, prefill_delayer_queue_min_ratio,
    prefill_delayer_max_delay_ms, speculative_draft_window_size, etc.)
  - mem_cache/memory_pool.py (new NoOpMHATokenToKVPool)
  - model_executor/model_runner_kv_cache_mixin.py (NoOpMHATokenToKVPool
    pool factory + _validate_prefill_only_disable_kv_cache_pool_family)
  - layers/attention/nsa_backend.py (spec rename
    num_accepted_drafts -> num_correct_drafts;
    num_accepted_tokens -> num_accept_tokens)
  - layers/attention/nsa/nsa_indexer.py (new _apply_q_scale_and_softmax_scale
    compile method; torch.mm replaces deep_gemm wrapper)
  - 28+ disaggregation/spec/runner files with mostly clean
    upstream-side-only integration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

----- upstream commit subjects (28) -----
fd3eb77 [Cookbook]: add Laguna-XS.2 (Poolside) (sgl-project#24730)
6be1a45 Fix swa component host hit (sgl-project#25085)
693f497 [NPU] use causal_conv1d_update_v2 for performance (sgl-project#24595)
1efe9e2 [Bug Fix] Reject incompatible combination of --disable-cuda-graph-padding and --enable-torch-compile (sgl-project#23903)
8d27ce7 Optimize uvicorn startup command (sgl-project#25041)
b35fd5f [fix] skip legacy minicpmv conv template for MiniCPM-V 4.6 (sgl-project#24998)
7582237 [Tiny Fix] Disable BCG when inner layer_model unresolved (sgl-project#25021)
ca3bc05 Deepseek-v4-Pro share expert tp1 (sgl-project#24949)
a72d3ae [Spec] Multi-layer mamba scatter cleanup; fix positional call bug (sgl-project#25030)
7128533 Revert "Migrate Intel CPU cases to the test/registered." (sgl-project#25044)
1f985c5 [Spec] Rename `accepted_indices` -> `accept_indices`; drop `_token_id` suffix per Rule 5 (sgl-project#25038)
ecf5d84 Migrate Intel CPU cases to the test/registered. (sgl-project#22670)
d7f4761 [PD] Refactor hybrid state transfer (sgl-project#24932)
91907b7 [UnifiedTree]: Fix Unified HiCache tombstone lock release replay (sgl-project#24972)
4ad63ad [Spec] Rename `accepted_drafts` -> `correct_drafts` for unambiguous naming (sgl-project#24081)
6bfb365 [PD] Rate limit prefill inflight polling warnings (sgl-project#24967)
6bb79c1 [Linear Attn] Add CUSTOM enum and plugin extensibility for kernel backends (sgl-project#24937)
cfc41d5 Fix kimi k2.5 mla eagle + dp attention (sgl-project#25033)
0f3932c [Fix] Qwen3-ASR config: set thinker_config before super().__init__ (sgl-project#24187)
f526e3f [Spec] Mamba scatter cleanup; fix multi-layer positional bug; dflash naming (sgl-project#25029)
10375a1 [NIXL][XPU] Fix uint64 overflow for mismatched P/D TP sizes (e.g. prefill_tp=1, decode_tp=2) (sgl-project#24648)
0a37d24 [diffusion] hardware: support sage attention backend on MUSA (attn backend, 21/N) (sgl-project#24752)
5495026 [HiCache] feat: default storage prefetch timeout (sgl-project#23309)
186eb42 Feat: Support SWA (Sliding Window Attention) for EAGLE-3 drafter (sgl-project#24664)
a75b79e Feat: Support newer EAGLE-3 drafters (sgl-project#24663)
f3a8189 [Spec] Internal rename per N2 v2 naming rule (sgl-project#25014)
bfc2eda [MUSA] Use MUSA-optimized operators in piecewise CUDA graph (sgl-project#23633)
74d70af [Apple Silicon] Add Metal kernel support in sgl-kernel (sgl-project#23449)
xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026