[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states #27688
benchislett merged 15 commits into vllm-project:main from
Conversation
Code Review
This pull request adds support for EAGLE3 heads that may not use auxiliary hidden states or may have their own lm_head. It also enhances the speculative configuration hash for eagle3 to differentiate between various draft models. The modifications in llama_eagle3.py, eagle.py, and gpu_model_runner.py seem to correctly implement the necessary logic for these new EAGLE3 variations. However, I've identified a critical issue in speculative.py where the hashing logic for non-eagle3 speculative methods has been altered, which could lead to CUDA graph cache collisions and subsequent runtime errors.
Signed-off-by: hjjq <hanjieq@nvidia.com>
benchislett left a comment
I assume that when eagle3_use_aux_hidden_state is False, we never use self.fc. Can you make sure that this weight never gets initialized if the flag is set, and also log a warning and continue gracefully if the "fc" weight is found in the checkpoint?
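A minimal sketch of the behavior requested above: skip creating the fc projection when aux hidden states are not used, and skip (with a warning) any stale "fc" weight found in the checkpoint. The class and method names here are illustrative, not vLLM's actual API.

```python
import logging

logger = logging.getLogger(__name__)


class Eagle3DraftHead:
    """Illustrative sketch only; names do not match vLLM's real classes."""

    def __init__(self, use_aux_hidden_state: bool):
        self.use_aux_hidden_state = use_aux_hidden_state
        # Only allocate the fc projection when aux hidden states are
        # consumed, so the unused weight is never initialized.
        self.fc = object() if use_aux_hidden_state else None

    def load_weights(self, weights: dict) -> list:
        loaded = []
        for name in weights:
            if name.startswith("fc.") and not self.use_aux_hidden_state:
                # Warn and continue gracefully instead of raising on a
                # checkpoint weight that will never be used.
                logger.warning("Ignoring unused checkpoint weight %r", name)
                continue
            loaded.append(name)  # normal weight loading elided
        return loaded
```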
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Head branch was pushed to by a user without write access
I have tested that the compilation hash change is no longer needed on top of tree, potentially due to changes in #26468. As a result, the CI failure should also be gone.
…_states (vllm-project#27688) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
Some EAGLE3 heads (e.g., nvidia/gpt-oss-120b-Eagle3-v2) do not use auxiliary hidden states and instead directly use the last-layer output, just like EAGLE-1.
Currently, vLLM assumes all EAGLE3 heads use aux hidden states. This PR removes that assumption and instead checks the draft model config.
Different draft models may also have different shapes for their params; if they share the same torch.compile cache, there will be errors when running them one after another. So this PR also adds the draft model hash to the SpeculativeConfig hash. (No longer needed; see #27688 (comment).)
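For reference, the originally proposed change amounts to keying the compilation cache on the draft model as well. A hedged sketch of that idea (not vLLM's actual SpeculativeConfig code, and ultimately dropped from this PR):

```python
import hashlib


def speculative_config_hash(method: str, draft_model: str) -> str:
    """Sketch: include the draft model in the speculative config hash.

    Two drafts with different parameter shapes then get distinct hashes,
    so they never collide in a shared torch.compile / CUDA-graph cache.
    """
    h = hashlib.sha256()
    h.update(method.encode("utf-8"))
    h.update(b"\x00")  # separator so field boundaries are unambiguous
    h.update(draft_model.encode("utf-8"))
    return h.hexdigest()
```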