[Refactor] Consolidate SupportsEagle #36063
Conversation
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Code Review
This pull request is a great refactoring that consolidates the logic for Eagle speculative decoding support by introducing an `EagleModelMixin` and updating the `SupportsEagle3` interface. This significantly reduces code duplication across multiple models. However, I've found a critical issue in the implementation of the forward pass for several models. The layer index passed to `_maybe_add_hidden_state` is incorrect when pipeline parallelism is used, as it uses a relative index instead of an absolute one. This will cause speculative decoding to fail in pipeline parallel setups. I've provided suggestions to fix this in the affected files.
Note: Security Review did not run due to the size of the PR.
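The relative-vs-absolute indexing issue the review describes can be illustrated with a toy sketch. All names here (`ToyModel`, the arguments to `_maybe_add_hidden_state`, `start_layer`) are hypothetical stand-ins modeled on this discussion, not the actual vLLM code:

```python
# Hypothetical sketch: with pipeline parallelism, each rank owns an
# absolute slice of layers, so aux-hidden-state capture must use the
# absolute layer index, not the loop-relative one.

class ToyModel:
    def __init__(self, start_layer: int, aux_layers: set[int]):
        # With PP, this rank owns layers
        # [start_layer, start_layer + num_local_layers).
        self.start_layer = start_layer
        self.aux_hidden_state_layers = aux_layers

    def _maybe_add_hidden_state(self, aux, layer_idx, hidden_states, residual):
        # Record the hidden state if this absolute layer index was requested.
        if layer_idx in self.aux_hidden_state_layers:
            aux.append(hidden_states)
        return aux

    def forward(self, hidden_states, num_local_layers: int):
        aux = []
        for i in range(num_local_layers):
            # Buggy variant would pass the relative index `i`;
            # the correct absolute index is `self.start_layer + i`.
            aux = self._maybe_add_hidden_state(
                aux, self.start_layer + i, hidden_states, None
            )
        return aux

# On PP rank 1 owning layers 8..15, requesting the aux output of layer 10:
model = ToyModel(start_layer=8, aux_layers={10})
assert len(model.forward("h", num_local_layers=8)) == 1
```

With the relative index `i` instead, layer 10 would only ever be captured on the rank whose local loop happens to reach index 10, which is why the review flags this as breaking spec decode under PP.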
Hi @benchislett, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
fynnsu
left a comment
This looks good, a few small comments/questions below.
```diff
  residual = intermediate_tensors["residual"]
- aux_hidden_states = []
+ aux_hidden_states = self._maybe_add_hidden_state([], 0, hidden_states, residual)
```
I think this needs to use `self.start_layer` to handle the PP case.
Suggested change:

```diff
- aux_hidden_states = self._maybe_add_hidden_state([], 0, hidden_states, residual)
+ aux_hidden_states = self._maybe_add_hidden_state([], self.start_layer, hidden_states, residual)
```
Addressed in description, see also #36151. Let me know if you think it would be better to apply the fix to all the models in this PR.
Sure, it seems like fixing this likely won't be enough to get PP + spec decode working, since we aren't transferring `aux_hidden_states` across PP ranks in the GPU model runner. So this will probably require a larger fix.
Renamed the bug accordingly.
```python
assert hasattr(self.language_model, "set_aux_hidden_state_layers")
self.language_model.set_aux_hidden_state_layers(layers)
```
Is any of this required for this model?

Won't the `SupportsEagle3.set_aux_hidden_state_layers` implementation already find the `.language_model` and call `_set_aux_hidden_state_layers` on its `.model` attr?

Same for `get_eagle3_default_aux_hidden_state_layers`.
This one is special because it calls `.set_aux_hidden_state_layers` on the `language_model` instead of reaching into `.language_model.model` directly. Because of this, it's technically possible for the `language_model` parent to override `.set_aux_hidden_state_layers` and change the behaviour.
Since a primary goal of this PR is not to change behaviour, I chose to leave this one alone.
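The delegation distinction being discussed can be sketched as follows. The class and method names mirror this thread (`set_aux_hidden_state_layers`, `_set_aux_hidden_state_layers`, `.language_model`, `.model`), but the bodies are illustrative assumptions, not the actual vLLM implementation:

```python
# Hedged sketch of two delegation paths for a multimodal wrapper model.

class InnerModel:
    def _set_aux_hidden_state_layers(self, layers):
        self.aux_hidden_state_layers = layers

class LanguageModel:
    def __init__(self):
        self.model = InnerModel()

    def set_aux_hidden_state_layers(self, layers):
        # A subclass may override this method and change behaviour, which
        # is why the wrapper below keeps calling it explicitly.
        self.model._set_aux_hidden_state_layers(layers)

class MultimodalWrapper:
    def __init__(self):
        self.language_model = LanguageModel()

    # Generic mixin path: reach into .language_model.model directly,
    # bypassing any override on the language_model class.
    def set_aux_hidden_state_layers_generic(self, layers):
        self.language_model.model._set_aux_hidden_state_layers(layers)

    # This model's existing path: go through the language_model method,
    # preserving any override the parent class might provide.
    def set_aux_hidden_state_layers(self, layers):
        self.language_model.set_aux_hidden_state_layers(layers)

m = MultimodalWrapper()
m.set_aux_hidden_state_layers((20, 30))
assert m.language_model.model.aux_hidden_state_layers == (20, 30)
```

Both paths end up setting the same attribute here, but they differ exactly when `LanguageModel` (or a subclass) overrides `set_aux_hidden_state_layers`, which is the behaviour the PR author chose to preserve.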
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
### What this PR does / why we need it?

1. Fix "TypeError: get_attn_backend() remove variable": [Refactor `check_and_update_config`](vllm-project/vllm#35122)
2. Fix [Rename `compile_ranges_split_points` to `compile_ranges_endpoints`](vllm-project/vllm#36027)
3. Fix "RuntimeError: device_allocator not a DeviceAllocator": [Replace memory related torch.cuda APIs](vllm-project/vllm#37031)
4. Fix [Support multiple KV groups in OffloadingSpec](vllm-project/vllm#36610): removed `self.offloaded_block_size` and changed `self.gpu_block_size` from a scalar to a tuple of per-group block sizes, adding `block_size_factor`.
5. Fix [Consolidate SupportsEagle](vllm-project/vllm#36063): renamed `get_eagle3_aux_hidden_state_layers()` to `get_eagle3_default_aux_hidden_state_layers()` and added a `supports_eagle3()` guard before calling it.

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

E2E

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8a68046

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Purpose

Consolidates `set_aux_hidden_state_layers` and `get_eagle3_aux_hidden_state_layers`, which are the same pretty much everywhere. Also records `aux_hidden_states` for the output of the last layer, for completeness. Adds `EagleModelMixin` to facilitate some of the logic.

The goal of this PR is to keep existing behaviours unchanged as much as possible. As such, some implementations' semantics are unchanged even if they might be considered buggy.
Specifically, #36151 outlines an issue with PP support due to incorrect iteration order over the layers. I consider this to be out-of-scope for this PR, as I do not want to change semantics as part of this refactor if I can help it. In the event that the PP fix has consequences, it would be easier to rollback as a standalone follow-up than as part of a broader refactor. I am open to discussion on this matter if we feel it would be easier to lump it all together.
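The kind of consolidation described above can be sketched as a small mixin. The method names follow the PR discussion, but the bodies and the default layer choice are assumptions for illustration, not the actual vLLM implementation:

```python
# Illustrative sketch of an EagleModelMixin providing the shared helpers once.

class EagleModelMixin:
    aux_hidden_state_layers: tuple[int, ...] = ()

    def _set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        self.aux_hidden_state_layers = layers

    def _get_eagle3_default_aux_hidden_state_layers(self, num_layers: int):
        # Assumed default: capture an early, a middle, and a late layer.
        return (2, num_layers // 2, num_layers - 3)

    def _maybe_add_hidden_state(self, aux, layer_idx, hidden_states, residual):
        # Append the hidden state (folding in the residual when present)
        # if this layer index was requested for EAGLE3 aux outputs.
        if layer_idx in self.aux_hidden_state_layers:
            if residual is not None:
                hidden_states = hidden_states + residual
            aux.append(hidden_states)
        return aux

m = EagleModelMixin()
m._set_aux_hidden_state_layers(m._get_eagle3_default_aux_hidden_state_layers(32))
assert m.aux_hidden_state_layers == (2, 16, 29)
```

Each model's forward loop then only needs a one-line call per layer instead of duplicating the capture logic.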
This means that some implementations will use `aux_hidden_states = self._maybe_add_hidden_state([], 0, hidden_states, residual)` and others will have `self.start_layer` for the first layer. This can be very easily changed in a follow-up PR.

Testing
Spec Decoding E2E tests all passing locally. Also manually checked that the (marked as skipped) Qwen3-VL EAGLE3 test works properly.