
[Doc] Update ViT CUDA graph doc for mixed (image+video) inputs#40355

Merged
Isotr0py merged 1 commit into vllm-project:main from shen-shanshan:doc
Apr 21, 2026
Conversation

@shen-shanshan
Contributor

@shen-shanshan shen-shanshan commented Apr 20, 2026

Purpose

Following #35963 and #38061, which added ViT CUDA graph support for image and video inference on Qwen3-VL respectively, we have now verified that it is also compatible with mixed (image+video) inputs within a single prompt.

Since multi-modal inputs are grouped and batched in _execute_mm_encoder() via group_and_batch_mm_kwargs(), each call to model.embed_multimodal() or encoder_cudagraph_manager.execute() contains only a single modality.

# vllm/vllm/v1/worker/gpu_model_runner.py
def _execute_mm_encoder(...):
    # ...
    for modality, num_items, mm_kwargs_batch in group_and_batch_mm_kwargs(...):
        # ...
        if enable_vit_cuda_graph:
            batch_outputs = self.encoder_cudagraph_manager.execute(mm_kwargs_batch)
        else:
            batch_outputs = model.embed_multimodal(**mm_kwargs_batch)


# vllm/vllm/multimodal/utils.py
def group_and_batch_mm_kwargs(mm_kwargs, ...):
    for modality, group in groupby(mm_kwargs, key=lambda x: x[0]):
        ...
        for num_items, mm_kwargs_batch in group_and_batch_mm_items(...):
            yield modality, num_items, mm_kwargs_batch

Thus, mixed inputs are split into separate per-modality ViT forward passes, which makes them compatible with our CUDA graph implementation.
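To make the grouping behavior concrete, here is a minimal, self-contained sketch. It mirrors the itertools.groupby pattern quoted above; the item structure and dict contents are simplified stand-ins, not the actual vLLM types:

```python
from itertools import groupby

# Hypothetical stand-ins for vLLM's (modality, kwargs) pairs; a mixed
# prompt contributes both image and video items, ordered so that items
# of the same modality are consecutive.
mm_kwargs = [
    ("image", {"pixel_values": "img_0"}),
    ("image", {"pixel_values": "img_1"}),
    ("video", {"pixel_values_videos": "vid_0"}),
]

def group_and_batch_mm_kwargs(items):
    """Yield (modality, num_items, batch) per consecutive modality run.

    itertools.groupby only merges *consecutive* items sharing a key, so
    every yielded batch is single-modality by construction.
    """
    for modality, group in groupby(items, key=lambda x: x[0]):
        batch = [kwargs for _, kwargs in group]
        yield modality, len(batch), batch

for modality, num_items, batch in group_and_batch_mm_kwargs(mm_kwargs):
    print(modality, num_items)
# image 2
# video 1
```

Each yielded batch then goes to exactly one encoder call (CUDA graph or eager), which is why mixed prompts need no special handling.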

Test Plan

Based on #40335.

# Pass compilation_config to EngineArgs in run_qwen3_vl()
# compilation_config={
#     "cudagraph_mm_encoder": True,
#     "encoder_cudagraph_token_budgets": [512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4864],
#     "encoder_cudagraph_max_vision_items_per_batch": 8,
#     "encoder_cudagraph_max_frames_per_batch": 64,
# }
python examples/offline_inference/vision_language.py -m qwen3_vl --modality "image+video"
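For reference, the commented-out config above would look roughly like this inside run_qwen3_vl(). The field names are copied verbatim from the test plan; the surrounding EngineArgs call is an illustrative sketch, not the exact diff applied for testing:

```python
# Hypothetical sketch of the edit described in the test plan: the
# compilation_config dict handed to EngineArgs in run_qwen3_vl().
compilation_config = {
    # Enable CUDA graph capture for the multi-modal (ViT) encoder.
    "cudagraph_mm_encoder": True,
    # Token counts for which encoder CUDA graphs are captured
    # (per the flag name; a batch runs under the matching graph size).
    "encoder_cudagraph_token_budgets": [
        512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4864,
    ],
    "encoder_cudagraph_max_vision_items_per_batch": 8,
    "encoder_cudagraph_max_frames_per_batch": 64,
}

# Illustrative usage (model/engine arguments elided):
# engine_args = EngineArgs(model=..., compilation_config=compilation_config)
```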

Test Result

--------------------------------------------------
The image shows a baby girl sitting on a bed, wearing glasses and reading a book. She is focused on the book, turning the pages with her small hands. The room appears to be a bedroom, with a crib and some clothes visible in the background.

In the video, the baby girl continues to read the book, occasionally looking up and smiling. She seems to be enjoying her reading time, and her glasses add a touch of charm to her appearance. The video captures a sweet and innocent moment of a child engaging in a quiet activity.
--------------------------------------------------
The image shows a baby girl sitting on a bed, wearing glasses, and reading a book.

The video shows the same baby girl continuing to read the book, turning the pages, and occasionally looking up. She appears to be very focused on the book and is enjoying her reading time.
--------------------------------------------------
The image shows a baby girl sitting on a bed, wearing glasses, and reading a book. She is focused on the book, turning the pages with her small hands. The background includes a crib and some clothes, suggesting a cozy and comfortable setting.

In the video, the baby girl continues to read the book, occasionally looking up and smiling. She seems to be enjoying the activity and is fully engaged in the story. The video captures the innocence and curiosity of a young child exploring the world of books.
--------------------------------------------------
The image shows a baby sitting on a bed, wearing glasses, and reading a book. The baby is dressed in a light blue shirt and pink pants. The background includes a crib and some clothes on the bed.

In the video, the baby continues to read the book, turning the pages and occasionally looking up. The baby seems to be enjoying the activity and is focused on the book. The video captures the baby's concentration and curiosity as they explore the book.
--------------------------------------------------

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: shen-shanshan <467638484@qq.com>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify
Contributor

mergify Bot commented Apr 20, 2026

Documentation preview: https://vllm--40355.org.readthedocs.build/en/40355/

@mergify mergify Bot added documentation Improvements or additions to documentation nvidia labels Apr 20, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the documentation for CUDA graphs in multimodal models to reflect that video inference support is no longer experimental and that mixed image and video inputs per prompt are now supported. Examples and notes recommending the limitation of multi-modal inputs have been removed accordingly. I have no feedback to provide as there were no review comments.

@Isotr0py Isotr0py enabled auto-merge (squash) April 21, 2026 02:29
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to Ready in NVIDIA Apr 21, 2026
@Isotr0py Isotr0py merged commit 8097591 into vllm-project:main Apr 21, 2026
13 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Apr 21, 2026
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request Apr 22, 2026
…project#40355)

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 23, 2026
…project#40355)

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…project#40355)

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…project#40355)

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Adrian <info@zzit.ch>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
…project#40355)

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

Labels

documentation Improvements or additions to documentation nvidia ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done


2 participants