Added qwen3 vision language moe support for speculative decoding #32048
Merged: benchislett merged 35 commits into vllm-project:main from neuralmagic:qwen3-vl-moe-spec-update on Jan 21, 2026
Commits
44f7715 Added qwen3 vision language moe support for speculative decoding (shanjiaz)
9612a1a Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
bbef7e7 min diff (shanjiaz)
86e804f min diff (shanjiaz)
35a1024 white space (shanjiaz)
4ab9986 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
4f8160c Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
5ee93e0 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
0ba1e92 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
a65da8e Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
4bef2f9 Added test and refined conditions. (shanjiaz)
3b035ba Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
3a71574 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
de8b289 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
5256ed9 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
75bd33c Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
b63cadc Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
3fb773f Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
b27e6c4 move logic to set_positions (shanjiaz)
95b3617 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
f8cbcaf format (shanjiaz)
23798e3 min diff (shanjiaz)
edbaec8 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
654ddb7 remove test for now (shanjiaz)
d559e41 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
1666aa0 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
e121abe Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
ea62713 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
ab352e4 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
8f2b1e1 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
0a59c88 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
46521b5 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
aacde22 Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
42db6eb Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
44e3fbe Merge branch 'main' into qwen3-vl-moe-spec-update (shanjiaz)
@@ -112,7 +112,9 @@ def __init__(
         self.input_ids = torch.zeros(
             self.max_num_tokens, dtype=torch.int32, device=device
         )
-        self.uses_mrope = self.vllm_config.model_config.uses_mrope
+        # Use draft model's M-RoPE setting, not target model's
+        # Draft models may be text-only even if target is multimodal
+        self.uses_mrope = self.draft_model_config.uses_mrope

Review comment (Contributor), on the added `self.uses_mrope` line: This should be fine to use, since it should support both multimodal and text-only draft models, correct?
Reply (Contributor, Author): Yes!

         if self.uses_mrope:
             # NOTE: `mrope_positions` is implemented with one additional dummy
             # position on purpose to make it non-contiguous so that it can work
@@ -221,6 +223,11 @@ def _set_positions(self, num_tokens: int, positions: torch.Tensor):
         if self.uses_mrope:
             self.mrope_positions[:, :num_tokens] = positions
         else:
+            # Convert M-RoPE positions if target model uses M-RoPE
+            # but draft doesn't, For text inputs, all M-RoPE
+            # dimensions are identical
+            if self.vllm_config.model_config.uses_mrope:
+                positions = positions[0]
             self.positions[:num_tokens] = positions
 
     def initialize_cudagraph_keys(self, cudagraph_mode: CUDAGraphMode) -> None:
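The two hunks above work together: the proposer takes its M-RoPE flag from the draft model, and when only the target uses M-RoPE it keeps the first position row. The following is a minimal sketch of that behaviour, not vLLM's code. `ToyProposer` and its constructor arguments are hypothetical, and plain Python lists stand in for the `torch` tensors so the example runs without dependencies. It assumes three M-RoPE rows and, as the diff comment states, that all rows are identical for text inputs.

```python
class ToyProposer:
    """Hypothetical stand-in for the draft proposer (not vLLM's class)."""

    def __init__(self, target_uses_mrope, draft_uses_mrope, max_num_tokens=8):
        self.target_uses_mrope = target_uses_mrope
        # Follow the draft model's setting, as in the first hunk.
        self.uses_mrope = draft_uses_mrope
        self.positions = [0] * max_num_tokens
        # Assumption: three M-RoPE rows, one per position dimension.
        self.mrope_positions = [[0] * max_num_tokens for _ in range(3)]

    def set_positions(self, num_tokens, positions):
        if self.uses_mrope:
            # Draft uses M-RoPE: copy every row unchanged.
            for dim in range(3):
                self.mrope_positions[dim][:num_tokens] = positions[dim][:num_tokens]
        else:
            # Target uses M-RoPE but the draft does not: keep the first row,
            # which equals the others for text-only inputs.
            if self.target_uses_mrope:
                positions = positions[0]
            self.positions[:num_tokens] = positions[:num_tokens]


# Text-only input: all three rows carry the same positions.
mrope = [[0, 1, 2, 3] for _ in range(3)]
proposer = ToyProposer(target_uses_mrope=True, draft_uses_mrope=False)
proposer.set_positions(4, mrope)
print(proposer.positions[:4])
```

With a multimodal target and a text-only draft, the draft's flat position buffer receives the first row of the target's positions.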
@@ -1080,6 +1087,7 @@ def load_model(self, target_model: nn.Module) -> None:
         if self.get_model_name(target_model) in [
             "Qwen2_5_VLForConditionalGeneration",
             "Qwen3VLForConditionalGeneration",
+            "Qwen3VLMoeForConditionalGeneration",
         ]:
             self.model.config.image_token_index = target_model.config.image_token_id
         elif self.get_model_name(target_model) == "PixtralForConditionalGeneration":
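The hunk above adds the MoE variant to the list of Qwen VL architectures whose image token id is copied onto the draft model's config. A minimal sketch of that dispatch follows. `QWEN_VL_ARCHS` and `sync_image_token` are hypothetical names, and `SimpleNamespace` objects stand in for the real config classes. The Pixtral branch is left out because its body is not shown in the diff.

```python
from types import SimpleNamespace

# Architecture names taken from the diff; the set itself is illustrative.
QWEN_VL_ARCHS = {
    "Qwen2_5_VLForConditionalGeneration",
    "Qwen3VLForConditionalGeneration",
    "Qwen3VLMoeForConditionalGeneration",
}


def sync_image_token(draft_config, target_config, target_arch):
    """Copy the target's image token id onto the draft config if applicable.

    Returns True when the copy happened. Hypothetical helper, not vLLM's API.
    """
    if target_arch in QWEN_VL_ARCHS:
        draft_config.image_token_index = target_config.image_token_id
        return True
    return False


# Stand-in configs; the token id value is made up for the example.
target_cfg = SimpleNamespace(image_token_id=1234)
draft_cfg = SimpleNamespace()
copied = sync_image_token(draft_cfg, target_cfg, "Qwen3VLMoeForConditionalGeneration")
print(copied, draft_cfg.image_token_index)
```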