[Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` by hnyls2002 · Pull Request #24859 · sgl-project/sglang

hnyls2002 · 2026-05-09T21:58:18Z

Summary

Split the draft-extend phase out of EagleDraftInput into a new EagleDraftExtendInput dataclass, eliminating the phase-shifting overload where one instance was mutated across draft / draft-extend phases.
V1 path only (eagle_worker.py, multi_layer_eagle_worker.py, frozen_kv_mtp_worker.py). V2 overlap worker still reuses one instance across phases — alignment is a follow-up.

Background

Pre-PR EagleDraftInput.hidden_states switched between [bs, hidden] (draft) and [total_accepted, hidden] (draft-extend) on the same instance. Workers maintained the invariant locally; attention backends had to special-case by phase.
The next_draft_input returned by EagleVerifyInput.verify was misleadingly named — at construction time all 4 fields it carried (hidden_states[accept_index], num_accepted_drafts, num_accepted_tokens, num_accepted_tokens_cpu) were draft-extend data. It only "became" a draft input after forward_draft_extend_after_decode mutated topk_p / topk_index / hidden_states in place.
Four transient verify->extend handoff fields (unfinished_accept_tokens, seq_lens_for_draft_extend, seq_lens_for_draft_extend_cpu, req_pool_indices_for_draft_extend) lived on EagleVerifyOutput purely to thread state from verify to prepare_extend_after_decode.

Schema changes (`eagle_info.py`)

New EagleDraftExtendInput

Owns full extend-phase state: per-accept-token hidden_states, per-req accept counts, the 4 ex-handoff fields (input_ids, seq_lens, seq_lens_cpu, req_pool_indices), and kernel outputs (positions, bonus_tokens).
prepare_extend_after_decode drops its verify_output arg and reads everything from self; it adds an `assert batch.spec_info is self` invariant.
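A minimal sketch of the phase-boundary invariant described above. The class names `BatchSketch` and `ExtendInputSketch` are hypothetical stand-ins, and the body of the method is an assumption made for illustration only; the real `prepare_extend_after_decode` operates on tensors and does more work.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class BatchSketch:
    """Hypothetical stand-in for the schedule batch."""
    spec_info: Any = None
    input_ids: Any = None
    seq_lens: Any = None


@dataclass
class ExtendInputSketch:
    """Hypothetical stand-in for EagleDraftExtendInput (plain lists, no tensors)."""
    input_ids: List[int]
    seq_lens: List[int]

    def prepare_extend_after_decode(self, batch: BatchSketch) -> None:
        # Invariant: the caller must already have installed this object
        # as batch.spec_info, because everything is read from self.
        assert batch.spec_info is self
        batch.input_ids = self.input_ids
        batch.seq_lens = self.seq_lens


extend_input = ExtendInputSketch(input_ids=[7, 8], seq_lens=[2])
batch = BatchSketch(spec_info=extend_input)  # caller installs first
extend_input.prepare_extend_after_decode(batch)
```

Calling the method on a batch whose `spec_info` is some other object fails the assertion, which is the point of the check.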

Trimmed EagleDraftInput

Keeps only true draft-phase fields (topk_p, topk_index, hidden_states[bs, h], bonus_tokens, kv_indptr, kv_indices).
Five V2-only fields (future_indices, new_seq_lens, verify_done, num_accepted_drafts, num_accepted_tokens) kept as Optional carve-outs and commented as "V2 overlap worker only" — to be cleaned up after V2 alignment.

EagleVerifyOutput

draft_extend_input: EagleDraftExtendInput replaces next_draft_input and absorbs the 4 transient handoff fields.
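The resulting schema can be pictured roughly as follows. This is a sketch only: the `...Sketch` class names are hypothetical, every field is typed `Any` instead of a tensor, and the exact grouping of the accept-count fields on the extend input is an assumption based on the lists in this description.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class EagleDraftExtendInputSketch:
    # Per-accept-token hidden states and per-request accept counts.
    hidden_states: Any = None
    num_accepted_drafts: Any = None
    num_accepted_tokens: Any = None
    num_accepted_tokens_cpu: Any = None
    # The four former verify->extend handoff fields.
    input_ids: Any = None
    seq_lens: Any = None
    seq_lens_cpu: Any = None
    req_pool_indices: Any = None
    # Kernel outputs.
    positions: Any = None
    bonus_tokens: Any = None


@dataclass
class EagleDraftInputSketch:
    # True draft-phase fields.
    topk_p: Any = None
    topk_index: Any = None
    hidden_states: Any = None  # shape [bs, hidden] in the draft phase
    bonus_tokens: Any = None
    kv_indptr: Any = None
    kv_indices: Any = None
    # V2 overlap worker only (kept as Optional carve-outs).
    future_indices: Optional[Any] = None
    new_seq_lens: Optional[Any] = None
    verify_done: Optional[Any] = None
    num_accepted_drafts: Optional[Any] = None
    num_accepted_tokens: Optional[Any] = None


@dataclass
class EagleVerifyOutputSketch:
    # Replaces next_draft_input and absorbs the handoff fields.
    draft_extend_input: Optional[EagleDraftExtendInputSketch] = None


extend_field_names = [f.name for f in fields(EagleDraftExtendInputSketch)]
```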

Worker control flow

verify(self, batch) no longer takes spec_info — reads from batch.spec_info after caller installs (mirrors V1 / multi-layer / Frozen).
forward_draft_extend_after_decode is now a pure transform: caller installs EagleDraftExtendInput as batch.spec_info, method returns a freshly-built EagleDraftInput for next iter, caller installs that.
All-reqs-finished branch installs an empty EagleDraftInput(capture_hidden_mode=LAST) so next iter's merge_batch short-circuits on `hidden_states is None` (EagleVerifyInput has no merge_batch).
Non-cuda-graph extend path replaces self.capture_for_decode(logits_output, forward_batch.spec_info) with inline softmax + fast_topk — equivalent semantics, no longer mutates spec_info.
Backup/restore in forward_draft_extend_after_decode no longer touches num_accepted_drafts / num_accepted_tokens (now on the soon-to-be-discarded EagleDraftExtendInput).
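The "pure transform" shape of the new control flow, together with the inlined softmax + top-k, can be sketched in plain Python. Everything here is illustrative: the class and function names are hypothetical, lists stand in for tensors, and the hand-written `top_k` is a stand-in for the `fast_topk` helper named above, not its implementation.

```python
import math
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ExtendInputSketch:
    hidden_states: List[List[float]]


@dataclass
class DraftInputSketch:
    topk_p: Optional[List[List[float]]] = None
    topk_index: Optional[List[List[int]]] = None


@dataclass
class BatchSketch:
    spec_info: Any = None


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], k: int) -> Tuple[List[float], List[int]]:
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in order], order


def draft_extend_sketch(batch: BatchSketch, logits: List[List[float]], k: int) -> DraftInputSketch:
    # The caller has already installed the extend input on the batch.
    assert isinstance(batch.spec_info, ExtendInputSketch)
    picked = [top_k(softmax(row), k) for row in logits]
    # Build a fresh draft input; the extend input is left untouched.
    return DraftInputSketch(
        topk_p=[p for p, _ in picked],
        topk_index=[i for _, i in picked],
    )


batch = BatchSketch(spec_info=ExtendInputSketch(hidden_states=[[0.1, 0.2]]))
next_input = draft_extend_sketch(batch, logits=[[0.0, 2.0, 1.0]], k=2)
batch.spec_info = next_input  # the caller installs the next-iteration input
```

Because the function returns a new object and never writes to the extend input, the caller is the only place where `batch.spec_info` changes phase.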

Type registration & padding

Add SpecInputType.EAGLE_DRAFT_EXTEND and SpecInputType.FROZEN_KV_MTP_DRAFT_EXTEND; both in is_draft_input() so _pad_inputs_to_size covers the new phase.
forward_batch_info._pad_inputs_to_size switches to getattr(spec_info, ..., None) since the two draft-input types now carry disjoint subsets of topk_p / topk_index / num_accepted_drafts.
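A sketch of the `getattr(..., None)` guard, assuming a simplified padding routine over Python lists. The function name `pad_spec_fields_sketch` and the use of `SimpleNamespace` objects are hypothetical; the real `_pad_inputs_to_size` pads tensors and handles more fields.

```python
from types import SimpleNamespace

# Fields that only some spec_info types carry.
PADDED_FIELDS = ("topk_p", "topk_index", "num_accepted_drafts")


def pad_spec_fields_sketch(spec_info, size, fill=0):
    """Pad whichever of PADDED_FIELDS exist on spec_info up to `size` entries."""
    for name in PADDED_FIELDS:
        value = getattr(spec_info, name, None)  # absent field -> None, no AttributeError
        if value is None:
            continue
        setattr(spec_info, name, value + [fill] * (size - len(value)))


# A draft-phase object and a draft-extend-phase object with disjoint fields.
draft_like = SimpleNamespace(topk_p=[0.5], topk_index=[3])
extend_like = SimpleNamespace(num_accepted_drafts=[2, 1])

pad_spec_fields_sketch(draft_like, size=4)
pad_spec_fields_sketch(extend_like, size=4)
```

The same call works for both objects even though neither has all three attributes.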

Frozen-KV MTP mirror

New FrozenKVMTPDraftExtendInput(EagleDraftExtendInput) tag-only subclass; _to_frozen_kv_mtp_draft_input renamed to _to_frozen_kv_mtp_draft_extend_input and reflects over EagleDraftExtendInput.fields.
select_last_verified_seed drops the `num_accepted_tokens is None` early-return (always present on the new dataclass).
frozen_kv_mtp_worker.forward_draft_extend_after_decode adds idle early-return after stashing an idle FrozenKVMTPDraftInput.
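The tag-only subclass and field-reflecting conversion might look like the sketch below. It assumes the reflection is done with the standard `dataclasses.fields` helper, which this description does not confirm; the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ExtendInputSketch:
    hidden_states: Any = None
    seq_lens: Any = None


@dataclass
class FrozenExtendInputSketch(ExtendInputSketch):
    """Tag-only subclass: adds no fields, only a distinct type."""


def to_frozen_extend_input_sketch(src: ExtendInputSketch) -> FrozenExtendInputSketch:
    # Copy every field declared on the base dataclass onto the subclass.
    kwargs = {f.name: getattr(src, f.name) for f in fields(ExtendInputSketch)}
    return FrozenExtendInputSketch(**kwargs)


base = ExtendInputSketch(hidden_states=[[1.0]], seq_lens=[3])
frozen = to_frozen_extend_input_sketch(base)
```

Reflecting over the base class's fields means a field added to the base later is carried over without touching the conversion helper.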

Looks confusing but is correct

filter_batch / merge_batch appear rewritten in the diff but are byte-identical to pre-PR. They moved up inside EagleDraftInput only because prepare_extend_after_decode / generate_attn_arg_prefill got extracted to EagleDraftExtendInput — verified via sha1 on the function range.
EagleDraftInput still has num_accepted_drafts / num_accepted_tokens (and 3 other V2 fields) after a "schema split" PR. Looks like a leftover, but it is intentional: the V2 overlap worker still reuses one instance across phases. A comment explicitly tags them "V2 overlap worker only"; they will be cleaned up after V2 alignment.
bonus_tokens exists on both EagleDraftInput and EagleDraftExtendInput. Not a duplicate. The kernel writes it on the extend-input; the worker copies it onto the next-iter draft-input where the next draft forward consumes it. Two roles, two homes.
All-reqs-finished branch installs an empty EagleDraftInput(capture_hidden_mode=LAST) instead of leaving batch.spec_info as the now-stale EagleVerifyInput. Looks like a no-op assignment, but it is required so the scheduler's next-iter merge_batch finds an EagleDraftInput (which has merge_batch) instead of an EagleVerifyInput (which doesn't), and short-circuits on `hidden_states is None`.
Non-cuda-graph extend path drops self.capture_for_decode(...) and inlines softmax + fast_topk. Looks like a behavior change — it's not. capture_for_decode body is exactly those two lines plus an in-place assignment to spec_info; inlining is equivalent and avoids mutating the soon-to-be-discarded EagleDraftExtendInput.
forward_draft_extend_after_decode returns EagleDraftInput in EAGLE / multi-layer-EAGLE workers but returns None in frozen_kv_mtp_worker. Asymmetric on purpose: Frozen's _run_assistant_seed_step already installs a fresh FrozenKVMTPDraftInput onto batch.spec_info internally, so there is nothing for the caller to reinstall.
EagleDraftExtendInput.is_draft_input() returns True despite the name "draft-extend". Reused on purpose — _pad_inputs_to_size keys off is_draft_input() to decide whether to pad spec-info tensors, and the new extend phase needs the same padding treatment.
prepare_extend_after_decode adds assert batch.spec_info is self. Looks like a defensive paranoia check; it's actually a phase-boundary invariant — the method now reads input_ids / seq_lens / req_pool_indices off self rather than via a verify_output arg, so the caller must have installed self as batch.spec_info first.
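For the first item in the list above, one way to check that a moved function is byte-identical is to hash the same line range in the old and new file contents. This is a sketch of that idea with the standard `hashlib` module; the helper name, the toy file contents, and the line numbers are made up, and the actual command used for this PR is not stated.

```python
import hashlib


def sha1_of_line_range(text: str, start: int, end: int) -> str:
    """SHA-1 of lines start..end (1-indexed, inclusive) of `text`."""
    lines = text.splitlines()[start - 1 : end]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


# Toy "before" and "after" files: the function moved but did not change.
old_file = "def a():\n    pass\n\ndef merge_batch(self, other):\n    return other\n"
new_file = "def merge_batch(self, other):\n    return other\n\ndef a():\n    pass\n"

unchanged = sha1_of_line_range(old_file, 4, 5) == sha1_of_line_range(new_file, 1, 2)
```

Equal digests over the two ranges indicate the function text was relocated without modification, even though a line-based diff shows it as removed and re-added.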

Test plan

CI runs all V1 EAGLE / Multi-layer EAGLE / Frozen KV MTP suites
Retraction tests under EAGLE3 (TestStreamingSessionEagleRetractLargePage) — covered by stage-b
DP-attention forced-extend path (kept under `enable_dp_attention or input_ids.shape[0] > 0`)

gemini-code-assist · 2026-05-09T21:58:22Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

* main: (87 commits)
[Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
[NPU]Documentation update for communications quantization feature (sgl-project#24668)
[Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
[Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
Support Intern-S2-Preview (sgl-project#24875)
[PD] Unify dsv4 dispatch with swa (sgl-project#24888)
Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
Fix PD bootstrap failure handling (sgl-project#24772)
[Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
[Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
[Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
[Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
[spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
[SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
[RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
[diffusion] CI: add cache-dit CI tests (sgl-project#19213)
[Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
...
# Conflicts: # python/sglang/srt/utils/common.py

hnyls2002 added 12 commits May 9, 2026 14:54

introduce EagleDraftExtendInput; split phase from EagleDraftInput

66e8b7a

rename extend_input -> draft_extend_input

057b8f7

add isinstance asserts at draft_extend_input phase boundaries

f5b85fd

move draft_extend_input install out of verify() into forward_draft_extend_after_decode

0634680

move spec_info phase install to executor (forward_batch_generation)

1fc71dd

V1: forward_draft_extend_after_decode returns next_draft_input; executor installs

88fffbe

drop unused model_worker_batch from verify() return

958365e

drop redundant spec_info.positions = None

bfabaa7

drop stale draft_extend shape note on EagleDraftInput.hidden_states

74c9ca9

move generate_attn_arg_prefill to EagleDraftExtendInput; tighten select_last_verified_seed

5f80483

drop redundant spec_info args; read spec_info from batch

6ad0e49

move verify->extend handoff fields onto EagleDraftExtendInput

1ff8f8f

hnyls2002 requested review from Qiaolin-Yu, Ying1123 and merrymercy as code owners May 9, 2026 21:58

hnyls2002 mentioned this pull request May 9, 2026

[Spec V2] Migrate V2 path to EagleDraftExtendInput; verify returns EagleVerifyOutput #24860

Open

2 tasks

hnyls2002 added run-ci high priority bypass-fastfail labels May 9, 2026

hnyls2002 added 3 commits May 9, 2026 15:09

forward_batch_info: getattr-guard num_accepted_drafts on draft-phase spec_info

7bfe3c0

forward_batch_info: getattr-guard topk_p/topk_index on draft-extend spec_info

4c680f1

v1: install empty EagleDraftInput when extend skipped (retract edge case)

490bcc0

hnyls2002 requested review from Fridge003 and ispobock as code owners May 9, 2026 22:11

hnyls2002 mentioned this pull request May 9, 2026

[Spec] Edge case fixes: retract idle ExtendInput; getattr-guard phase-specific spec_info fields #24864

Closed

2 tasks

hnyls2002 added 4 commits May 9, 2026 15:17

restore num_accepted_drafts/tokens on EagleDraftInput for V2

a6c4467

stash spec_info comments; drop unused batch param

3d123cd

cleanup forward_batch_info comments; drop dead server_args param

c26ca56

drop redundant spec_info.positions = None

be72838

hnyls2002 added 4 commits May 9, 2026 15:34

drop unused model_worker_batch from verify() return

dd35129

drop dead server_args param from enable_num_token_non_padded

32b40e2

drop dead batch param from check_forward_draft_extend_after_decode

56d55d9

merge lsyin/spec-drop-dead-code

2caa926

hnyls2002 changed the base branch from main to lsyin/spec-drop-dead-code May 9, 2026 22:43

Base automatically changed from lsyin/spec-drop-dead-code to main May 9, 2026 22:53

merge origin/main

7c8825f

hnyls2002 changed the title ~~[Spec V1] Introduce EagleDraftExtendInput; split draft-extend phase from EagleDraftInput~~ [Spec V1] Split draft-extend phase from EagleDraftInput into new EagleDraftExtendInput May 10, 2026

hnyls2002 merged commit d087442 into main May 10, 2026
124 of 134 checks passed

hnyls2002 deleted the lsyin/spec-pr1 branch May 10, 2026 08:07

hnyls2002 mentioned this pull request May 10, 2026

[Spec] Cleanup idle stub and shape-check patterns #24881

Merged

dssugar mentioned this pull request May 11, 2026

[Bug] TypeError: 'NoneType' object is not subscriptable in frozen_kv_mtp_utils.py when using Gemma 4 Assistant draft model #24912

Open

5 tasks

hnyls2002 merged 24 commits into main from lsyin/spec-pr1
