[Model][QwenVL] Simplify cos/sin rotary embedding indexing #28962
Merged
Isotr0py merged 3 commits into vllm-project:main on Nov 19, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Code Review
This pull request simplifies the rotary embedding indexing logic across several models, which improves code readability and maintainability. It also addresses a performance issue in qwen3_vl.py by moving pos_ids to the GPU asynchronously, preventing synchronous CPU-to-GPU copies.
However, the same performance optimization is missing in other models modified in this PR (glm4_1v.py, qwen2_5_vl.py, qwen2_vl.py, and qwen3_omni_moe_thinker.py), where pos_ids is still created on the CPU, leading to synchronous copies during indexing. I've added specific comments to apply the same fix to these models for consistency and performance improvement.
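The pattern the review asks for can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual code: `gather_rotary_cos_sin`, `cos_table` and `sin_table` are hypothetical names, and it assumes `pos_ids` is a CPU `(num_patches, 2)` tensor of (h, w) indices into a precomputed cos/sin table that already lives on the GPU. Indexing a CUDA tensor with a CPU index tensor makes PyTorch copy the indices to the device synchronously. Moving them first with `.to(device, non_blocking=True)` avoids this, and the copy can only truly overlap with GPU work when the source is in pinned host memory.

```python
import torch


def gather_rotary_cos_sin(cos_table, sin_table, pos_ids):
    """Hypothetical helper: gather per-patch cos/sin for 2D (h, w) positions.

    cos_table, sin_table: (max_grid_size, rot_dim) tensors, possibly on GPU.
    pos_ids: (num_patches, 2) integer tensor of (h, w) indices, built on CPU.
    """
    device = cos_table.device
    if device.type == "cuda" and not pos_ids.is_pinned():
        # Pinned host memory lets the copy below run asynchronously.
        pos_ids = pos_ids.pin_memory()
    # Move the indices to the table's device without blocking the host.
    # Indexing a GPU tensor with a CPU index tensor would otherwise force
    # a synchronous host-to-device copy.
    pos_ids = pos_ids.to(device, non_blocking=True)
    # (num_patches, 2, rot_dim) -> (num_patches, 2 * rot_dim)
    cos = cos_table[pos_ids].flatten(1)
    sin = sin_table[pos_ids].flatten(1)
    return cos, sin


# Usage on a hypothetical 2x3 patch grid; falls back to CPU without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
max_grid, rot_dim = 16, 8
cos_table = torch.randn(max_grid, rot_dim, device=device)
sin_table = torch.randn(max_grid, rot_dim, device=device)
hpos = torch.arange(2).unsqueeze(1).expand(-1, 3).flatten()  # [0,0,0,1,1,1]
wpos = torch.arange(3).unsqueeze(0).expand(2, -1).flatten()  # [0,1,2,0,1,2]
pos_ids = torch.stack([hpos, wpos], dim=-1)  # (6, 2), on CPU
cos, sin = gather_rotary_cos_sin(cos_table, sin_table, pos_ids)
print(cos.shape)  # torch.Size([6, 16])
```

Each patch's cos/sin row is the concatenation of the table rows for its h and w indices, so a single gather followed by `flatten(1)` covers both axes.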
Isotr0py approved these changes on Nov 19, 2025
gcanlin approved these changes on Nov 19, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025:
…ect#28962) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request on Dec 1, 2025:
…ect#28962) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Purpose
This is a small follow-up to #28798 that simplifies the indexing logic. /cc @gcanlin @Isotr0py
For Qwen3VL, #28798 slightly changed the behaviour so that get_cos_sin() now returns a GPU tensor. Indexing it with pos_ids, which is still built on the CPU, therefore introduces synchronous CPU-to-GPU copies of pos_ids. cc1f0c5 fixes this by moving the indices onto the GPU with a non-blocking copy.
Before:


After:
Test Plan
Test Result
Before:
After: