
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE#26485

Merged
DarkLight1337 merged 4 commits into vllm-project:main from neuralmagic:feature/support-qwen3-moe-eagle3
Oct 11, 2025

Conversation

@rahul-tuli
Contributor

@rahul-tuli rahul-tuli commented Oct 9, 2025

This PR adds support for EAGLE-3 speculative decoding to the Qwen3MoeForCausalLM model, enabling faster inference with draft models like nm-testing/Mockup-qwen235-eagle3-fp16.

Changes

Modified Files

  • vllm/model_executor/models/qwen3_moe.py

Implementation Details

  1. Added SupportsEagle3 Interface

    • Imported and added SupportsEagle3 to Qwen3MoeForCausalLM class inheritance
    • Implements required methods: set_aux_hidden_state_layers() and get_eagle3_aux_hidden_state_layers()
  2. Updated Qwen3MoeModel

    • Added aux_hidden_state_layers attribute to track layers that output auxiliary hidden states
    • Modified forward() method to collect auxiliary hidden states at specified layers
    • Returns tuple of (hidden_states, aux_hidden_states) when auxiliary states are collected
  3. Updated Qwen3MoeForCausalLM

    • Implements get_eagle3_aux_hidden_state_layers() to return the auxiliary layer indices (layer 2, the middle layer, and the third-from-last layer)
    • Implements set_aux_hidden_state_layers() to configure which layers output auxiliary states
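The wiring described above can be sketched as a simplified, runnable stand-in. This is not the actual vLLM code: the class names carry a `Sketch` suffix, the decoder layers are replaced by a trivial list operation, and only the layer-index bookkeeping follows the PR description (collect the hidden state at each configured layer, return `(hidden_states, aux_hidden_states)` when any were collected, and pick layers 2, middle, and n−3 as the defaults).

```python
# Hedged sketch of the SupportsEagle3 wiring described above. Not the real
# vLLM classes: tensors are plain lists and each "decoder layer" just adds 1,
# so the example is self-contained and runnable.

class Qwen3MoeModelSketch:
    """Toy decoder stack that can emit auxiliary hidden states."""

    def __init__(self, num_layers: int):
        self.num_layers = num_layers
        # Which layers should emit auxiliary hidden states (empty = none).
        self.aux_hidden_state_layers: tuple[int, ...] = ()

    def forward(self, hidden_states):
        aux_hidden_states = []
        for layer_idx in range(self.num_layers):
            if layer_idx in self.aux_hidden_state_layers:
                # Collect the input to this layer as an auxiliary hidden state.
                aux_hidden_states.append(hidden_states)
            # Stand-in for a real decoder layer.
            hidden_states = [h + 1 for h in hidden_states]
        if aux_hidden_states:
            # Matches the PR: return a tuple when auxiliary states were collected.
            return hidden_states, aux_hidden_states
        return hidden_states


class Qwen3MoeForCausalLMSketch:
    def __init__(self, num_layers: int):
        self.model = Qwen3MoeModelSketch(num_layers)

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        self.model.aux_hidden_state_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        # Per the PR description: layer 2, the middle layer, and layer n - 3.
        n = self.model.num_layers
        return (2, n // 2, n - 3)


if __name__ == "__main__":
    llm = Qwen3MoeForCausalLMSketch(num_layers=12)
    print(llm.get_eagle3_aux_hidden_state_layers())  # (2, 6, 9)
    llm.set_aux_hidden_state_layers(llm.get_eagle3_aux_hidden_state_layers())
    final, aux = llm.model.forward([0.0])
    print(len(aux))  # 3
```

The EAGLE-3 drafter then consumes the three auxiliary hidden states alongside the final one; the specific choice of layers (early, middle, late) mirrors the existing Llama and Qwen3 dense implementations referenced below.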

Testing

Tested with Qwen3-235B-A22B MoE model and EAGLE-3 drafter:

from vllm import LLM, SamplingParams

# Initialize with EAGLE-3 speculative decoding
llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    tensor_parallel_size=4,
    speculative_config={
        "model": "nm-testing/Mockup-qwen235-eagle3-fp16",
        "method": "eagle3",
        "num_speculative_tokens": 3,
    },
    max_model_len=16384,
)

# Generate with speculative decoding
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Generated: {output.outputs[0].text}")

Related

This implementation follows the same pattern as existing EAGLE-3 support in:

  • Qwen2ForCausalLM
  • Qwen3ForCausalLM
  • LlamaForCausalLM

@mergify mergify bot added the qwen Related to Qwen models label Oct 9, 2025
@rahul-tuli rahul-tuli marked this pull request as ready for review October 9, 2025 13:30
@rahul-tuli rahul-tuli requested a review from sighingnow as a code owner October 9, 2025 13:30

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@DarkLight1337 DarkLight1337 requested a review from 22quinn October 9, 2025 13:36
@mgoin mgoin added speculative-decoding ready ONLY add when PR is ready to merge/full CI is needed labels Oct 9, 2025
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
@rahul-tuli rahul-tuli force-pushed the feature/support-qwen3-moe-eagle3 branch from 0d94c76 to 05a6bb2 Compare October 10, 2025 10:50
Collaborator

@benchislett benchislett left a comment


LGTM

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 11, 2025 09:31
@DarkLight1337 DarkLight1337 merged commit d2a7153 into vllm-project:main Oct 11, 2025
54 checks passed
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
@zeroorhero

@rahul-tuli Hi, did speculative decoding actually make Qwen3 MoE faster? What acceptance rate did you see on average?

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
@gyou2021

Is nm-testing/Mockup-qwen235-eagle3-fp16 the same as nm-testing/Mockup-qwen235-eagle3-fp16-speculators-converted? There is currently no nm-testing/Mockup-qwen235-eagle3-fp16 at https://huggingface.co/nm-testing.

@eldarkurtic
Contributor

As the name suggests, nm-testing models are temporary models for testing purposes. We will soon release an official qwen235 speculator under RedHatAI, which the community should use.

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

  • qwen — Related to Qwen models
  • ready — ONLY add when PR is ready to merge/full CI is needed
  • speculative-decoding


8 participants