Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE #26485
DarkLight1337 merged 4 commits into vllm-project:main
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Branch updated from 0d94c76 to 05a6bb2.
…26485) Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…26485) Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…26485) Signed-off-by: Rahul Tuli <rtuli@redhat.com>
…26485) Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
@rahul-tuli Hi, did speculative decoding actually make Qwen3 MoE faster? What acceptance rate did you see on average?
Is nm-testing/Mockup-qwen235-eagle3-fp16 the same as nm-testing/Mockup-qwen235-eagle3-fp16-speculators-converted? There is currently no nm-testing/Mockup-qwen235-eagle3-fp16 under https://huggingface.co/nm-testing.
As the name suggests,
This PR adds support for EAGLE-3 speculative decoding to the `Qwen3MoeForCausalLM` model, enabling faster inference with draft models like `nm-testing/Mockup-qwen235-eagle3-fp16`.

Changes
Modified Files
- `vllm/model_executor/models/qwen3_moe.py`

Implementation Details
Added `SupportsEagle3` interface
- Added `SupportsEagle3` to the `Qwen3MoeForCausalLM` class inheritance
- Implemented `set_aux_hidden_state_layers()` and `get_eagle3_aux_hidden_state_layers()`

Updated `Qwen3MoeModel`
- Added an `aux_hidden_state_layers` attribute to track the layers that output auxiliary hidden states
- Modified the `forward()` method to collect auxiliary hidden states at the specified layers
- `forward()` returns `(hidden_states, aux_hidden_states)` when auxiliary states are collected

Updated `Qwen3MoeForCausalLM`
- `get_eagle3_aux_hidden_state_layers()` returns the auxiliary layer indices (2, the middle layer, and n - 3, where n is the number of decoder layers)
- `set_aux_hidden_state_layers()` configures which layers output auxiliary states

Testing
Tested with the Qwen3-235B-A22B MoE model and an EAGLE-3 drafter:
Related
This implementation follows the same pattern as existing EAGLE-3 support in:
- `Qwen2ForCausalLM`
- `Qwen3ForCausalLM`
- `LlamaForCausalLM`
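The auxiliary-hidden-state mechanism summarized under Implementation Details can be illustrated with a small, framework-free Python sketch. Everything in it is hypothetical: `ToyMoeModel`, `ToyMoeForCausalLM`, `default_eagle3_aux_layers`, and the list-of-floats "hidden state" are stand-ins, not vLLM code. It only mirrors the behavior the PR describes: the default layer indices are 2, the middle layer, and n - 3; `forward()` collects states at those indices and returns `(hidden_states, aux_hidden_states)` when any were collected. Whether a layer's input or its output is captured is an assumption here; this sketch records the state entering each selected layer.

```python
from typing import Callable, List, Sequence, Tuple, Union

# Hypothetical stand-ins: a "hidden state" is a list of floats and a
# decoder layer is any function mapping one hidden state to the next.
Hidden = List[float]
Layer = Callable[[Hidden], Hidden]


def default_eagle3_aux_layers(num_layers: int) -> Tuple[int, int, int]:
    """Layer 2, the middle layer, and n - 3, as described in the PR."""
    return (2, num_layers // 2, num_layers - 3)


class ToyMoeModel:
    """Hypothetical stand-in for the decoder stack (not vLLM's Qwen3MoeModel)."""

    def __init__(self, layers: Sequence[Layer]) -> None:
        self.layers = list(layers)
        # Empty by default: no auxiliary states are collected.
        self.aux_hidden_state_layers: Tuple[int, ...] = ()

    def forward(self, hidden: Hidden) -> Union[Hidden, Tuple[Hidden, List[Hidden]]]:
        aux_hidden_states: List[Hidden] = []
        for idx, layer in enumerate(self.layers):
            # Assumption: record the state entering the selected layer.
            if idx in self.aux_hidden_state_layers:
                aux_hidden_states.append(list(hidden))
            hidden = layer(hidden)
        if aux_hidden_states:
            return hidden, aux_hidden_states
        return hidden


class ToyMoeForCausalLM:
    """Hypothetical wrapper exposing the two EAGLE-3 hooks named in the PR."""

    def __init__(self, model: ToyMoeModel) -> None:
        self.model = model

    def set_aux_hidden_state_layers(self, layers: Tuple[int, ...]) -> None:
        self.model.aux_hidden_state_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> Tuple[int, ...]:
        return default_eagle3_aux_layers(len(self.model.layers))


if __name__ == "__main__":
    def add_one(h: Hidden) -> Hidden:
        # Each toy layer adds 1.0, so a captured value equals its layer index.
        return [x + 1.0 for x in h]

    lm = ToyMoeForCausalLM(ToyMoeModel([add_one] * 8))
    print(lm.model.forward([0.0]))  # [8.0]: no aux layers set, plain output
    layers = lm.get_eagle3_aux_hidden_state_layers()
    print(layers)  # (2, 4, 5)
    lm.set_aux_hidden_state_layers(layers)
    print(lm.model.forward([0.0]))  # ([8.0], [[2.0], [4.0], [5.0]])
```

In this sketch, leaving `aux_hidden_state_layers` empty by default keeps the ordinary (non-speculative) path returning a plain hidden state; only after `set_aux_hidden_state_layers()` is called does `forward()` switch to the tuple return that an EAGLE-3 drafter would consume.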