fix: Fix decode worker in vllm for qwen_vl models by indrajit96 · Pull Request #5281 · ai-dynamo/dynamo

indrajit96 · 2026-01-08T17:59:04Z

Overview:

Fix disaggregated multimodal decode for Qwen2.5-VL models by passing the original, unexpanded prompt to the decode worker, so that vLLM expands it exactly as it did during prefill, and fix the resulting crash.

Details:

Problem: Qwen2.5-VL disaggregated decode failed with `IndexError: list index out of range` when no multimodal data was passed to the decode worker, because mRoPE position calculation needs `image_grid_thw`.

Solution:

- Decode worker: pass `multi_modal_data` with zero-filled embeddings and `image_grid_thw` so that mRoPE positions can be calculated.
- PD worker: for Qwen models, keep the original unexpanded prompt (with placeholders) instead of the expanded prompt returned by prefill.
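For context on why the decode worker needs `image_grid_thw` at all, here is a simplified sketch of how Qwen2-VL-style mRoPE assigns 3-D (temporal, height, width) position ids. It is an illustration under stated assumptions, not vLLM's implementation: `mrope_positions` is a hypothetical helper, a spatial merge size of 2 is assumed, and Qwen2.5-VL's temporal scaling for video is ignored. What it shows is that the position of the next decoded token depends on the image grid, not only on the sequence length.

```python
from typing import List, Sequence, Tuple, Union

Position = Tuple[int, int, int]  # (temporal, height, width) rope indices
Segment = Tuple[str, Union[int, Tuple[int, int, int]]]


def mrope_positions(
    segments: Sequence[Segment], spatial_merge_size: int = 2
) -> Tuple[List[Position], int]:
    """Simplified Qwen2-VL-style mRoPE position ids (hypothetical helper).

    segments: ("text", n_tokens) or ("image", (t, h, w)), where (t, h, w) is
    that image's image_grid_thw entry. Returns the per-token positions and the
    position the next decoded token would receive.
    """
    positions: List[Position] = []
    next_pos = 0
    for kind, value in segments:
        if kind == "text":
            # Text tokens use the same index on all three rope axes.
            for i in range(value):
                p = next_pos + i
                positions.append((p, p, p))
            next_pos += value
        elif kind == "image":
            t, h, w = value
            gh, gw = h // spatial_merge_size, w // spatial_merge_size
            # Image tokens are indexed over the merged (t, gh, gw) grid,
            # offset by the position where the image starts.
            for ti in range(t):
                for hi in range(gh):
                    for wi in range(gw):
                        positions.append((next_pos + ti, next_pos + hi, next_pos + wi))
            # The next segment starts one past the largest index the image used.
            next_pos += max(t, gh, gw)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return positions, next_pos


positions, next_pos = mrope_positions([("text", 2), ("image", (1, 4, 4)), ("text", 1)])
print(len(positions), next_pos)  # 7 5
```

Seven prompt tokens yield a next position of 5, an offset of -2 from the sequence length; a decode worker can only reproduce that offset if it knows the image grid.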

Where should the reviewer start?

- `components/src/dynamo/vllm/multimodal_handlers/worker_handler.py`: main logic changes in `MultimodalDecodeWorkerHandler` and `MultimodalPDWorkerHandler`
- `components/src/dynamo/vllm/multimodal_utils/model.py`: new `construct_qwen_decode_mm_data()` function

Summary by CodeRabbit

Release Notes

New Features
- Added support for Qwen vision-language models in disaggregated multimodal decode.
- Improved handling of multimodal data for vision-language model inference.


Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

coderabbitai · 2026-01-08T18:01:12Z

Walkthrough

The changes implement Qwen VL multimodal decoding support by introducing a new utility function to construct decode-time multimodal data structures and integrating it into the worker handler with model-specific prompt token handling logic.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Qwen VL Multimodal Decode Support: `components/src/dynamo/vllm/multimodal_handlers/worker_handler.py`, `components/src/dynamo/vllm/multimodal_utils/model.py` | Adds `construct_qwen_decode_mm_data()` to build zero-initialized multimodal data tensors, and integrates it into the worker handler with conditional prompt-token handling for mRoPE-based Qwen models versus non-Qwen models. |
| Deployment Configuration: `examples/backends/vllm/launch/disagg_multimodal_epd.sh` | Enables multimodal embeddings in the decode worker by passing the `--enable-mm-embeds` flag. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Qwen VL hops into decode,
Zero tensors light the load,
mRoPE prompts dance untamed and free,
Embeddings flow, let visions be,
Multimodal magic—enabled to go! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to meet the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix: Fix decode worker in vllm for qwen_vl models' directly and specifically describes the main change: fixing the decode worker for Qwen VL models in vLLM, which aligns with the changeset's core objective. |
| Description check | ✅ Passed | The PR description provides a clear overview, a detailed explanation of the problem and solution, and identifies specific files for review. All key sections from the template are present and substantially filled out. |


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @components/src/dynamo/vllm/multimodal_utils/model.py:
- Around lines 182-187: The signature of `construct_qwen_decode_mm_data` uses `embeddings_shape: Optional[Any]`, which is too loose. Tighten it to a more specific sequence type such as `Optional[Tuple[int, ...]]` or `Optional[Sequence[int]]` (importing `Tuple`/`Sequence` from `typing`) so callers must pass a valid shape (e.g., `(N, C, H, W)`). Update the annotation accordingly, adjust any internal code or tests that relied on `Any`, and make sure any runtime checks that treat `embeddings_shape` as an indexable shape remain valid.
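A minimal sketch of how the tighter annotation could be paired with a runtime check, assuming the shape arrives as any sequence of ints (a `torch.Size` is a `tuple` subclass, so it would pass). `normalize_embeddings_shape` is a hypothetical helper, not part of the PR.

```python
from collections import abc
from typing import Optional, Sequence, Tuple


def normalize_embeddings_shape(
    embeddings_shape: Optional[Sequence[int]],
) -> Tuple[int, ...]:
    """Hypothetical helper: validate a shape and return it as a tuple of ints."""
    if embeddings_shape is None:
        raise ValueError("embeddings_shape is required for Qwen decode mm data.")
    # str/bytes are Sequences too, but never valid shapes; ints are not Sequences.
    if isinstance(embeddings_shape, (str, bytes)) or not isinstance(
        embeddings_shape, abc.Sequence
    ):
        raise ValueError(f"embeddings_shape must be a sequence of ints, got {embeddings_shape!r}")
    shape = tuple(embeddings_shape)
    # bool is a subclass of int, so reject it explicitly.
    valid = len(shape) > 0 and all(
        isinstance(d, int) and not isinstance(d, bool) and d > 0 for d in shape
    )
    if not valid:
        raise ValueError(f"embeddings_shape must be positive integers, got {embeddings_shape!r}")
    return shape


print(normalize_embeddings_shape([1, 64, 8]))  # (1, 64, 8)
```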

🧹 Nitpick comments (1)

components/src/dynamo/vllm/multimodal_utils/model.py (1)
199-206: Add validation for embeddings_shape and ndim before squeezing.

The function doesn't validate that `embeddings_shape` is a valid shape (e.g., a sequence of integers), which could lead to cryptic PyTorch errors. Additionally, the squeeze operation assumes a 3D tensor but only checks `ndim == 3` after creating the tensor; consider validating the shape length beforehand.
♻️ Proposed validation improvements
```diff
     if image_grid_thw is None or len(image_grid_thw) == 0:
         raise ValueError("No image grid provided for Qwen model.")
     if embeddings_shape is None:
         raise ValueError("embeddings_shape is required for Qwen decode mm data.")
+    if not isinstance(embeddings_shape, (tuple, list)) or not all(isinstance(x, int) and x > 0 for x in embeddings_shape):
+        raise ValueError(f"embeddings_shape must be a tuple or list of positive integers, got {embeddings_shape}")
 
     image_embeds = torch.zeros(embeddings_shape, dtype=dtype, device="cpu")
-    if image_embeds.ndim == 3:
+    if len(embeddings_shape) == 3:
         image_embeds = image_embeds.squeeze(0)
```
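On the squeeze itself: for `torch.zeros(embeddings_shape)` the tensor's `ndim` equals `len(embeddings_shape)`, and `squeeze(0)` removes dimension 0 only when its size is 1. The shape-only sketch below mirrors that behavior with plain tuples; `squeezed_decode_shape` is a hypothetical name, and the 2-D `(num_tokens, hidden_size)` target layout is an assumption.

```python
from typing import Tuple


def squeezed_decode_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Hypothetical shape-only mirror of: zeros(shape), then squeeze(0) if 3-D.

    squeeze(0) is a no-op when dim 0 is not 1, so a (B, N, H) shape with
    B > 1 stays 3-D.
    """
    if len(shape) == 3 and shape[0] == 1:
        return shape[1:]
    return shape


print(squeezed_decode_shape((1, 64, 8)))  # (64, 8)
```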

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfb95df and a45a8ab.

📒 Files selected for processing (3)

components/src/dynamo/vllm/multimodal_handlers/worker_handler.py
components/src/dynamo/vllm/multimodal_utils/model.py
examples/backends/vllm/launch/disagg_multimodal_epd.sh

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2026-01-04T06:45:28.414Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 5153
File: examples/backends/vllm/launch/lora/setup_minio.sh:99-109
Timestamp: 2026-01-04T06:45:28.414Z
Learning: For HuggingFace CLI version reporting (hf version and huggingface-cli version) in v0.34.6 and later, use direct argument syntax instead of the --version flag. Review shell-script changes and any scripts invoking the HuggingFace CLI to ensure they call the version output with a direct argument (e.g., 'hf version' or equivalent) rather than using '--version'. Apply to shell scripts and any related CLI invocations in the repository.

Applied to files:

examples/backends/vllm/launch/disagg_multimodal_epd.sh

📚 Learning: 2025-10-28T04:09:48.264Z

Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

components/src/dynamo/vllm/multimodal_handlers/worker_handler.py

🪛 Ruff (0.14.10)

components/src/dynamo/vllm/multimodal_utils/model.py

200-200: Avoid specifying long messages outside the exception class

(TRY003)

202-202: Avoid specifying long messages outside the exception class

(TRY003)
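Ruff's TRY003 is usually addressed by moving the message into a dedicated exception class. A sketch of that pattern for the two checks flagged above; the class and function names are hypothetical, and subclassing `ValueError` keeps any existing `except ValueError` handlers working.

```python
from typing import Any, Optional


class QwenDecodeMMDataError(ValueError):
    """Hypothetical base class for invalid Qwen decode multimodal inputs."""


class MissingImageGridError(QwenDecodeMMDataError):
    def __init__(self) -> None:
        # The message lives in the class, which is what TRY003 asks for.
        super().__init__("No image grid provided for Qwen model.")


class MissingEmbeddingsShapeError(QwenDecodeMMDataError):
    def __init__(self) -> None:
        super().__init__("embeddings_shape is required for Qwen decode mm data.")


def check_qwen_decode_inputs(image_grid_thw: Any, embeddings_shape: Optional[Any]) -> None:
    """Hypothetical stand-in for the two guards flagged at lines 200 and 202."""
    if image_grid_thw is None or len(image_grid_thw) == 0:
        raise MissingImageGridError()
    if embeddings_shape is None:
        raise MissingEmbeddingsShapeError()
```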

🔇 Additional comments (3)

examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

96-96: LGTM! Decode worker now consistent with prefill configuration.

Adding --enable-mm-embeds to the decode worker aligns with the prefill worker configuration (line 91) and supports the multimodal decode functionality introduced in this PR.

components/src/dynamo/vllm/multimodal_handlers/worker_handler.py (2)

67-85: Well-documented Qwen-specific multimodal handling.

The approach of passing zero embeddings with image_grid_thw for mRoPE position calculation is clearly explained. The conditional logic ensures Qwen VL models receive the necessary multimodal data while maintaining backward compatibility. The multi_modal_data parameter is correctly typed as NotRequired[Optional[Any]] (per the type override in multimodal_utils/protocol.py), so passing None for non-Qwen models is safe.

273-283: Conditional prompt handling logic for disaggregated multimodal decode is correctly implemented.

The code properly distinguishes between Qwen VL models and others:

Qwen VL: Preserves original prompt for decode worker to expand with multi_modal_data, matching prefill expansion and ensuring block count alignment

Non-Qwen: Uses expanded prompt from prefill response where vLLM won't expand further, matching the KV cache layout

The kv_transfer_params are correctly propagated to maintain block synchronization between prefill and decode workers.
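The prompt-selection rule above can be sketched with plain token lists. Assumptions, all hypothetical: `IMAGE_PAD` stands in for the model's image placeholder token (Qwen2-VL-style processors use `<|image_pad|>`), each placeholder expands to `t * h * w // merge_size**2` tokens with a spatial merge size of 2, and the KV cache uses 16-token blocks. None of these helpers are Dynamo or vLLM APIs.

```python
import math
from typing import List, Sequence, Tuple

IMAGE_PAD = -1  # hypothetical placeholder id; a real model uses a vocab token id


def expand_image_pads(
    prompt_ids: Sequence[int],
    grid_thws: Sequence[Tuple[int, int, int]],
    pad_id: int = IMAGE_PAD,
    spatial_merge_size: int = 2,
) -> List[int]:
    """Replace each single placeholder with t*h*w // merge**2 placeholder tokens."""
    out: List[int] = []
    grids = iter(grid_thws)
    for tok in prompt_ids:
        if tok == pad_id:
            t, h, w = next(grids)
            out.extend([pad_id] * (t * h * w // spatial_merge_size**2))
        else:
            out.append(tok)
    return out


def kv_blocks(num_tokens: int, block_size: int = 16) -> int:
    """Number of fixed-size KV cache blocks needed for num_tokens."""
    return math.ceil(num_tokens / block_size)


def decode_prompt(
    is_qwen_vl: bool, original_ids: Sequence[int], prefill_expanded_ids: Sequence[int]
) -> List[int]:
    """Qwen VL: send the unexpanded prompt (decode re-expands it with
    multi_modal_data). Others: reuse the prefill worker's expanded prompt."""
    return list(original_ids) if is_qwen_vl else list(prefill_expanded_ids)


original = [101, 102, IMAGE_PAD, 103]
grids = [(1, 28, 28)]
prefill_ids = expand_image_pads(original, grids)
decode_ids = expand_image_pads(decode_prompt(True, original, prefill_ids), grids)
print(len(prefill_ids), decode_ids == prefill_ids, kv_blocks(len(decode_ids)))  # 199 True 13
```

Because the decode worker re-expands the original prompt with the same `image_grid_thw`, it arrives at the same token count, and therefore the same number of KV blocks, as prefill.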


Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com> Co-authored-by: Krishnan Prashanth <kprashanth@nvidia.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com> Co-authored-by: Krishnan Prashanth <kprashanth@nvidia.com> Co-authored-by: Anant Sharma <anants@nvidia.com>

dagil-nvidia · 2026-01-27T00:56:42Z

Auto-linked to DIS-1220

KrishnanPrash and others added 4 commits January 3, 2026 22:48

fix: pass Qwen image_grid_thw on decode

9caac4c

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

fix: pass Qwen mm fields on disagg decode

e76366e

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

Removing unrelated script

29e15d2

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

Update Decode Prompt for Qwen models

b6fffc9

indrajit96 requested review from GuanLuo, KrishnanPrash and rmccorm4 January 8, 2026 17:59

indrajit96 requested review from a team as code owners January 8, 2026 17:59

pull-request-size bot added the size/M label Jan 8, 2026

Merge branch 'main' into kprashanth/qwen-epd-vllm

a45a8ab

copy-pr-bot bot temporarily deployed to GITLAB January 8, 2026 18:00 Inactive

indrajit96 changed the title ~~Fix decode worker in vllm for qwen_vl models~~ fix: Fix decode worker in vllm for qwen_vl models Jan 8, 2026

github-actions bot added the fix label Jan 8, 2026

copy-pr-bot bot temporarily deployed to GITLAB January 8, 2026 18:00 Inactive

indrajit96 requested a review from furionw January 8, 2026 18:02

coderabbitai bot reviewed Jan 8, 2026


components/src/dynamo/vllm/multimodal_utils/model.py (resolved)

rmccorm4 reviewed Jan 8, 2026


components/src/dynamo/vllm/multimodal_handlers/worker_handler.py (resolved)

GuanLuo reviewed Jan 8, 2026


components/src/dynamo/vllm/multimodal_utils/model.py (outdated, resolved)

rmccorm4 added backend::vllm Relates to the vllm backend multimodal labels Jan 8, 2026

Fix prefix caching on decode worker

0084f57

copy-pr-bot bot temporarily deployed to GITLAB January 9, 2026 01:38 Inactive

copy-pr-bot bot temporarily deployed to GITLAB January 9, 2026 01:39 Inactive

indrajit96 requested review from GuanLuo and rmccorm4 January 9, 2026 02:11

Merge branch 'main' into kprashanth/qwen-epd-vllm

776854b

copy-pr-bot bot temporarily deployed to GITLAB January 9, 2026 02:12 Inactive

copy-pr-bot bot temporarily deployed to GITLAB January 9, 2026 02:13 Inactive

GuanLuo approved these changes Jan 9, 2026


indrajit96 merged commit 5cd8005 into main Jan 9, 2026
27 of 28 checks passed

indrajit96 deleted the kprashanth/qwen-epd-vllm branch January 9, 2026 20:51

indrajit96 mentioned this pull request Jan 9, 2026

fix: Fix decode worker in vllm for qwen_vl models (#5281) #5327

Merged

indrajit96 merged 7 commits into main from kprashanth/qwen-epd-vllm


5 participants
