
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states#27688

Merged
benchislett merged 15 commits into vllm-project:main from CentML:eagle3-v2
Nov 25, 2025
Conversation

@hjjq
Contributor

@hjjq hjjq commented Oct 28, 2025

Some EAGLE3 heads (e.g., nvidia/gpt-oss-120b-Eagle3-v2) do not use auxiliary hidden states and directly use the last layer's output, just like EAGLE1.
Currently, vLLM assumes all EAGLE3 heads use aux hidden states. This PR removes that assumption and instead checks the draft model config.
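The config-driven dispatch described above can be sketched as follows. This is an illustrative example, not vLLM's actual code: the class and attribute names (`DraftModelConfig`, `use_aux_hidden_state`, `select_draft_inputs`) are hypothetical stand-ins for the draft-model config check the PR adds.

```python
# Hypothetical sketch of the config check this PR describes; names are
# illustrative, not vLLM's actual API.

class DraftModelConfig:
    def __init__(self, use_aux_hidden_state: bool = True):
        # EAGLE3 heads traditionally consume auxiliary hidden states from
        # several target-model layers; heads like gpt-oss-120b-Eagle3-v2
        # instead take only the last layer's output, as EAGLE1 does.
        self.use_aux_hidden_state = use_aux_hidden_state


def select_draft_inputs(config: DraftModelConfig,
                        last_hidden: list[float],
                        aux_hidden: list[list[float]]) -> list[float]:
    """Pick the hidden states fed to the EAGLE3 draft head."""
    if config.use_aux_hidden_state:
        # Classic EAGLE3: concatenate aux hidden states from multiple layers.
        return [x for layer in aux_hidden for x in layer]
    # EAGLE1-style: use the last layer's output directly.
    return last_hidden
```

The point of reading the flag from the draft model config (rather than hard-coding it per method) is that both head variants can then share one code path.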

Different draft models may also have different parameter shapes; if they share the same torch.compile cache, errors occur when running them one after another. So this PR also adds the draft model hash to the SpeculativeConfig hash. No longer needed. See #27688 (comment)

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for EAGLE3 heads that may not use auxiliary hidden states or may have their own lm_head. It also enhances the speculative configuration hash for eagle3 to differentiate between various draft models. The modifications in llama_eagle3.py, eagle.py, and gpu_model_runner.py seem to correctly implement the necessary logic for these new EAGLE3 variations. However, I've identified a critical issue in speculative.py where the hashing logic for non-eagle3 speculative methods has been altered, which could lead to CUDA graph cache collisions and subsequent runtime errors.
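The collision the review warns about can be shown with a toy cache: if the cache key omits a field that affects compiled-graph shapes, two distinct configurations silently share one entry. Everything below is a hypothetical illustration; a real torch.compile cache stores compiled artifacts, not strings.

```python
# Toy illustration of the cache-collision risk raised in the review.
cache: dict[str, str] = {}


def get_or_compile(key: str, description: str) -> str:
    # A string stands in for a compiled-graph artifact here.
    if key not in cache:
        cache[key] = f"compiled({description})"
    return cache[key]


# Key built only from the method name, ignoring which draft model is loaded:
a = get_or_compile("eagle3", "draft model A, fc shape (4096, 12288)")
b = get_or_compile("eagle3", "draft model B, fc shape (2880, 8640)")
# b wrongly reuses A's artifact, so model B would hit a shape mismatch
# at runtime instead of getting its own compiled graph.
```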

hjjq added 6 commits November 14, 2025 18:19
Signed-off-by: hjjq <hanjieq@nvidia.com>
@hjjq hjjq changed the title from "[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states and/or have their own lm_head" to "[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states" on Nov 16, 2025
Collaborator

@benchislett benchislett left a comment


I assume that when eagle3_use_aux_hidden_state is False, we don't ever use self.fc. Can you make sure that this weight never gets initialized if the flag is set, and also log a warning and continue gracefully if the "fc" weight is found in the checkpoint?
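The requested behavior can be sketched like this. Class and attribute names are hypothetical (the real head lives in llama_eagle3.py, and `self.fc` would be a projection layer, not a placeholder object): skip creating `fc` when aux hidden states are disabled, and if the checkpoint still ships an "fc" weight, warn and skip it rather than fail.

```python
# Hedged sketch of the reviewer's request; names are illustrative.
import logging

logger = logging.getLogger(__name__)


class Eagle3Head:
    def __init__(self, use_aux_hidden_state: bool):
        self.use_aux_hidden_state = use_aux_hidden_state
        # Only allocate the aux-hidden-state projection when it will be used.
        self.fc = object() if use_aux_hidden_state else None

    def load_weights(self, weights: dict[str, object]) -> list[str]:
        loaded = []
        for name, _tensor in weights.items():
            if name.startswith("fc") and not self.use_aux_hidden_state:
                # Continue gracefully: warn and skip instead of raising.
                logger.warning(
                    "Checkpoint contains %r but aux hidden states are "
                    "disabled; skipping this weight.", name)
                continue
            loaded.append(name)
        return loaded
```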

Signed-off-by: hjjq <hanjieq@nvidia.com>
Collaborator

@benchislett benchislett left a comment


LGTM

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Nov 20, 2025
@benchislett benchislett enabled auto-merge (squash) November 20, 2025 20:26
benchislett and others added 3 commits November 20, 2025 19:56
Signed-off-by: hjjq <hanjieq@nvidia.com>
auto-merge was automatically disabled November 21, 2025 19:48

Head branch was pushed to by a user without write access

@hjjq
Contributor Author

hjjq commented Nov 21, 2025

I have tested that the compilation hash change is no longer needed at top of tree, potentially due to changes in #26468. As a result, the CI failure should also be gone.

Signed-off-by: hjjq <hanjieq@nvidia.com>
@benchislett benchislett merged commit 5f9679a into vllm-project:main Nov 25, 2025
53 checks passed
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…_states (vllm-project#27688)

Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
…_states (vllm-project#27688)

Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
@hjjq hjjq deleted the eagle3-v2 branch March 3, 2026 14:45

Labels

llama (Related to Llama models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1
