fix: Fix decode worker in vllm for qwen_vl models #5281

Merged
indrajit96 merged 7 commits into main from kprashanth/qwen-epd-vllm on Jan 9, 2026

Conversation

indrajit96 (Contributor) commented on Jan 8, 2026

Overview:

Fix disaggregated multimodal decode for Qwen2.5-VL models by passing the original unexpanded prompt to the decode worker, so that vLLM expands it identically to prefill; this fixes the crash described below.

Details:

Problem: Qwen2.5-VL disaggregated decode was failing with
IndexError: list index out of range whenever no multimodal data was passed to the decode worker (mRoPE position calculation requires image_grid_thw).
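
The failure follows from how mRoPE works here: the position calculation consumes one grid entry per image placeholder in the prompt, so a decode request that still contains placeholder tokens but carries no multimodal data runs out of entries. A hypothetical illustration of the shape of the bug (not vLLM's actual code path):

    # Hypothetical sketch of the failure mode, not vLLM's implementation.
    # mRoPE pops one (t, h, w) grid entry per image placeholder in the prompt.
    image_grid_thw = []          # decode request arrived with no multimodal data
    num_image_placeholders = 1   # but the prompt still contains image tokens

    for i in range(num_image_placeholders):
        t, h, w = image_grid_thw[i]  # IndexError: list index out of range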

Solution:

  1. Decode worker: Pass multi_modal_data with zero embeddings and image_grid_thw for mRoPE position calculation
  2. PD worker: For Qwen models, keep the original unexpanded prompt (with placeholders) instead of using the expanded prompt from prefill

Where should the reviewer start?

  • components/src/dynamo/vllm/multimodal_handlers/worker_handler.py - Main logic changes in MultimodalDecodeWorkerHandler and MultimodalPDWorkerHandler
  • components/src/dynamo/vllm/multimodal_utils/model.py - New construct_qwen_decode_mm_data() function
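
For orientation, a minimal sketch of what a helper like construct_qwen_decode_mm_data() plausibly looks like, based on the description above and the review diff below: zero-filled image embeddings (the values are irrelevant on the decode side, since the KV cache comes from prefill) plus the real image_grid_thw so vLLM can expand the prompt and compute mRoPE positions. The exact signature, dict layout, and defaults are assumptions, not the merged code:

    from typing import Any, Dict, Optional, Sequence

    import torch


    def construct_qwen_decode_mm_data(
        image_grid_thw: Sequence[Sequence[int]],
        embeddings_shape: Optional[Sequence[int]],
        dtype: torch.dtype = torch.float16,
    ) -> Dict[str, Any]:
        """Build decode-only multimodal data: zero embeddings plus the real grid for mRoPE."""
        if image_grid_thw is None or len(image_grid_thw) == 0:
            raise ValueError("No image grid provided for Qwen model.")
        if embeddings_shape is None:
            raise ValueError("embeddings_shape is required for Qwen decode mm data.")

        # Zero values: only the shape matters, so vLLM expands the prompt the same
        # way prefill did; the transferred KV cache supplies the real context.
        image_embeds = torch.zeros(tuple(embeddings_shape), dtype=dtype, device="cpu")
        if image_embeds.ndim == 3:
            image_embeds = image_embeds.squeeze(0)

        return {
            "image": {
                "image_embeds": image_embeds,
                "image_grid_thw": torch.tensor(list(image_grid_thw)),
            }
        }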

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for Qwen Vision-Language models with enhanced multimodal embeddings processing.
    • Improved handling of multimodal data for vision-language model inference.


KrishnanPrash and others added 4 commits January 3, 2026 22:48
indrajit96 changed the title from "Fix decode worker in vllm for qwen_vl models" to "fix: Fix decode worker in vllm for qwen_vl models" on Jan 8, 2026
github-actions bot added the fix label on Jan 8, 2026
coderabbitai bot (Contributor) commented on Jan 8, 2026

Walkthrough

The changes implement Qwen VL multimodal decoding support by introducing a new utility function to construct decode-time multimodal data structures and integrating it into the worker handler with model-specific prompt token handling logic.

Changes

Cohort / File(s) | Summary

Qwen VL Multimodal Decode Support
components/src/dynamo/vllm/multimodal_handlers/worker_handler.py, components/src/dynamo/vllm/multimodal_utils/model.py
Adds a construct_qwen_decode_mm_data() function to build zero-initialized multimodal data tensors and integrates it into the worker handler, with conditional prompt-token handling for mRoPE-based Qwen models versus non-Qwen models.

Deployment Configuration
examples/backends/vllm/launch/disagg_multimodal_epd.sh
Enables multimodal embeddings in the decode worker by passing the --enable-mm-embeds flag.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Qwen VL hops into decode,
Zero tensors light the load,
mRoPE prompts dance untamed and free,
Embeddings flow, let visions be,
Multimodal magic—enabled to go!

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'fix: Fix decode worker in vllm for qwen_vl models' directly and specifically describes the main change, fixing the decode worker for Qwen VL models in vLLM, which aligns with the changeset's core objective.
  • Description check ✅ Passed: The PR description provides a clear overview, a detailed explanation of the problem and solution, and identifies specific files for review. All key sections from the template are present and substantially filled out.


indrajit96 requested a review from furionw on January 8, 2026 18:02
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In components/src/dynamo/vllm/multimodal_utils/model.py:
- Around lines 182-187: The signature for construct_qwen_decode_mm_data uses
embeddings_shape: Optional[Any], which is too loose; tighten it to a more
specific sequence type such as Optional[Tuple[int, ...]] or
Optional[Sequence[int]] (and import Tuple/Sequence from typing) so callers must
pass a valid shape (e.g., (N, C, H, W)). Update the function annotation to
embeddings_shape: Optional[Tuple[int, ...]] (or Optional[Sequence[int]]) and
adjust any internal code or tests that relied on Any, ensuring any runtime
checks that treat embeddings_shape as an indexable shape remain valid.
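
Concretely, the suggested tightening would look roughly like this (a sketch only; parameters other than embeddings_shape are placeholders, not the actual signature):

    from typing import Optional, Tuple

    import torch


    def construct_qwen_decode_mm_data(
        image_grid_thw,                                        # placeholder parameter
        embeddings_shape: Optional[Tuple[int, ...]] = None,    # was: Optional[Any]
        dtype: torch.dtype = torch.float16,
    ):
        # Callers must now pass an indexable shape such as (N, C, H, W),
        # so runtime checks that index into embeddings_shape stay valid.
        ...
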
🧹 Nitpick comments (1)
components/src/dynamo/vllm/multimodal_utils/model.py (1)

199-206: Add validation for embeddings_shape and ndim before squeezing.

The function doesn't validate that embeddings_shape is a valid shape (e.g., a sequence of integers), which could lead to cryptic PyTorch errors. Additionally, the squeeze operation assumes a 3D tensor but only checks ndim == 3 after creating the tensor—consider validating the shape length beforehand.

♻️ Proposed validation improvements
     if image_grid_thw is None or len(image_grid_thw) == 0:
         raise ValueError("No image grid provided for Qwen model.")
     if embeddings_shape is None:
         raise ValueError("embeddings_shape is required for Qwen decode mm data.")
+    if not isinstance(embeddings_shape, (tuple, list)) or not all(isinstance(x, int) and x > 0 for x in embeddings_shape):
+        raise ValueError(f"embeddings_shape must be a tuple or list of positive integers, got {embeddings_shape}")
 
     image_embeds = torch.zeros(embeddings_shape, dtype=dtype, device="cpu")
-    if image_embeds.ndim == 3:
+    if len(embeddings_shape) == 3:
         image_embeds = image_embeds.squeeze(0)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfb95df and a45a8ab.

📒 Files selected for processing (3)
  • components/src/dynamo/vllm/multimodal_handlers/worker_handler.py
  • components/src/dynamo/vllm/multimodal_utils/model.py
  • examples/backends/vllm/launch/disagg_multimodal_epd.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-04T06:45:28.414Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 5153
File: examples/backends/vllm/launch/lora/setup_minio.sh:99-109
Timestamp: 2026-01-04T06:45:28.414Z
Learning: For HuggingFace CLI version reporting (hf version and huggingface-cli version) in v0.34.6 and later, use direct argument syntax instead of the --version flag. Review shell-script changes and any scripts invoking the HuggingFace CLI to ensure they call the version output with a direct argument (e.g., 'hf version' or equivalent) rather than using '--version'. Apply to shell scripts and any related CLI invocations in the repository.

Applied to files:

  • examples/backends/vllm/launch/disagg_multimodal_epd.sh
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • components/src/dynamo/vllm/multimodal_handlers/worker_handler.py
🪛 Ruff (0.14.10)
components/src/dynamo/vllm/multimodal_utils/model.py

200-200: Avoid specifying long messages outside the exception class

(TRY003)


202-202: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

96-96: LGTM! Decode worker now consistent with prefill configuration.

Adding --enable-mm-embeds to the decode worker aligns with the prefill worker configuration (line 91) and supports the multimodal decode functionality introduced in this PR.

components/src/dynamo/vllm/multimodal_handlers/worker_handler.py (2)

67-85: Well-documented Qwen-specific multimodal handling.

The approach of passing zero embeddings with image_grid_thw for mRoPE position calculation is clearly explained. The conditional logic ensures Qwen VL models receive the necessary multimodal data while maintaining backward compatibility. The multi_modal_data parameter is correctly typed as NotRequired[Optional[Any]] (per the type override in multimodal_utils/protocol.py), so passing None for non-Qwen models is safe.
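
A rough sketch of how the decode worker plausibly hands this to vLLM; TokensPrompt and its multi_modal_data field are part of vLLM's public input types as I understand them, but the function and variable names here are assumptions, not the merged code:

    from vllm.inputs import TokensPrompt

    def build_qwen_decode_prompt(prompt_token_ids, decode_mm_data):
        # Zero-filled embeddings plus the real image_grid_thw are enough for vLLM
        # to expand the image placeholders and compute mRoPE positions without
        # re-running the vision encoder on the decode worker.
        return TokensPrompt(
            prompt_token_ids=prompt_token_ids,
            multi_modal_data=decode_mm_data,
        )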


273-283: Conditional prompt handling logic for disaggregated multimodal decode is correctly implemented.

The code properly distinguishes between Qwen VL models and others:

  • Qwen VL: Preserves original prompt for decode worker to expand with multi_modal_data, matching prefill expansion and ensuring block count alignment
  • Non-Qwen: Uses expanded prompt from prefill response where vLLM won't expand further, matching the KV cache layout

The kv_transfer_params are correctly propagated to maintain block synchronization between prefill and decode workers.
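
A sketch of the branching described above, with hypothetical field names standing in for the actual request/response objects:

    def select_decode_prompt(request, prefill_response, is_qwen_vl: bool):
        if is_qwen_vl:
            # Qwen VL: forward the original, unexpanded prompt so the decode worker
            # re-expands it (with zero embeddings + image_grid_thw) exactly as prefill did.
            token_ids = request.original_prompt_token_ids
        else:
            # Others: forward the expanded prompt from prefill; decode will not expand
            # it again, so it already matches the transferred KV-cache layout.
            token_ids = prefill_response.prompt_token_ids

        # Carry the kv_transfer_params either way to keep prefill/decode block
        # bookkeeping in sync.
        return token_ids, prefill_response.kv_transfer_params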

rmccorm4 added the backend::vllm (Relates to the vllm backend) and multimodal labels on Jan 8, 2026
indrajit96 merged commit 5cd8005 into main on Jan 9, 2026
27 of 28 checks passed
indrajit96 deleted the kprashanth/qwen-epd-vllm branch on January 9, 2026 at 20:51
indrajit96 added a commit that referenced this pull request Jan 9, 2026
nv-anants added a commit that referenced this pull request Jan 10, 2026
dagil-nvidia (Collaborator) commented:

Auto-linked to DIS-1220
