[Core][MM] Use non-blocking CPU-GPU copy of multimodal data by lgeiger · Pull Request #28141 · vllm-project/vllm

lgeiger · 2025-11-05T17:05:24Z

Purpose

On main, group_mm_kwargs_by_modality currently causes an unnecessary CPU/GPU sync. This PR makes the host-to-device copy of the multimodal tensors non-blocking to avoid it. It looks like this was missed in #25654. /cc @DarkLight1337
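As a rough illustration of the pattern (not vLLM's code), here is a minimal sketch of a hypothetical helper, to_device_non_blocking, that walks a nested dict of multimodal kwargs and moves each tensor with non_blocking=True. The helper name and the dict layout are assumptions for illustration; this does not reproduce group_mm_kwargs_by_modality or its signature. The key point it encodes is that a host-to-device copy only avoids blocking the CPU thread when the source tensor is in pinned memory.

```python
import torch


def to_device_non_blocking(mm_kwargs: dict, device: torch.device) -> dict:
    """Hypothetical helper: return a copy of a (possibly nested) kwargs dict
    with every tensor moved to `device` using non_blocking=True.

    A blocking host-to-device copy makes the CPU wait for the GPU stream.
    With non_blocking=True and a pinned source, the copy is only enqueued.
    """
    pin = device.type == "cuda"
    out = {}
    for key, value in mm_kwargs.items():
        if isinstance(value, dict):
            # Recurse into nested groups of fields.
            out[key] = to_device_non_blocking(value, device)
        elif isinstance(value, torch.Tensor):
            if pin and value.device.type == "cpu" and not value.is_pinned():
                # pin_memory() makes a page-locked host copy; ideally the data
                # would be pinned once upstream rather than on every transfer.
                value = value.pin_memory()
            # For a pinned source and a CUDA target this enqueues the copy and
            # returns without waiting; for a CPU target it is a no-op.
            out[key] = value.to(device, non_blocking=True)
        else:
            # Non-tensor metadata (e.g. a modality string) passes through as-is.
            out[key] = value
    return out
```

For example, to_device_non_blocking({"pixel_values": torch.rand(2, 3, 4, 4), "modality": "image"}, torch.device("cuda")) would return the same keys with the tensor on the GPU, without the CPU waiting for the transfer to finish.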

Test Plan

VLLM_TORCH_PROFILER_DIR="vllm_profile" vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 --limit-mm-per-prompt.video 0 --max-model-len 10000

vllm bench serve --backend openai-chat --model Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmarena-ai/VisionArena-Chat --hf-split train --num-prompts 5 --profile
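The profiler run above shows the sync in the trace. As a lighter-weight complement, PyTorch's torch.cuda.set_sync_debug_mode can be set to "warn" so that synchronizing CUDA operations emit a warning. The sketch below wraps that in a hypothetical count_sync_warnings helper; filtering on the substring "synchroniz" is an assumption about the warning text, and which operations PyTorch reports may vary between versions.

```python
import warnings

import torch


def count_sync_warnings(fn) -> int:
    """Hypothetical helper: run fn() with CUDA sync debug mode set to "warn"
    and return how many synchronization warnings it raised.

    Without CUDA, fn() still runs but nothing can be checked, so 0 is returned.
    """
    if not torch.cuda.is_available():
        fn()
        return 0
    previous = torch.cuda.get_sync_debug_mode()
    torch.cuda.set_sync_debug_mode("warn")
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # keep repeated warnings visible
            fn()
    finally:
        # Restore whatever mode was active before.
        torch.cuda.set_sync_debug_mode(previous)
    # Assumption: PyTorch's sync warnings mention "synchronizing" in the message.
    return sum("synchroniz" in str(w.message) for w in caught)


if torch.cuda.is_available():
    host = torch.rand(256, 256)
    pinned = host.pin_memory()
    print("blocking copy:", count_sync_warnings(lambda: host.to("cuda")))
    print("pinned, non_blocking:",
          count_sync_warnings(lambda: pinned.to("cuda", non_blocking=True)))
```

On a CUDA machine one would expect the first count to be nonzero and the second to be zero; treat that as an expectation to verify on your PyTorch version, not a measured result.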

Test Result

Before:
[profiler trace screenshot not preserved]

After:
[profiler trace screenshot not preserved]

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

gemini-code-assist

Code Review

This pull request introduces a performance optimization by enabling non-blocking CPU-to-GPU copies for multimodal data. The change correctly adds non_blocking=True to the torch.Tensor.to() call within group_mm_kwargs_by_modality, which prevents unnecessary CPU-GPU synchronization when using pinned memory. This is a valuable improvement that aligns the behavior of the new data path with the legacy one. The code change is correct and the performance gains are demonstrated by the provided profiler results. The pull request is well-justified and ready for merging.
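Why an asynchronous copy is still safe comes down to stream ordering: work launched on the same CUDA stream after the copy only runs once the copy has finished, so the host only has to wait when it actually reads results back. A minimal sketch, with a hypothetical encode_on_device function and a plain matrix multiply standing in for a vision encoder (both are illustrative assumptions, not vLLM code):

```python
import torch


def encode_on_device(pixel_values: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a vision encoder: copy pixel_values to the
    weight's device and apply a linear projection, without a host sync."""
    device = weight.device
    if device.type == "cuda":
        # A pinned source is what lets non_blocking=True avoid a sync.
        pixel_values = pixel_values.pin_memory()
    # On CUDA this is enqueued on the current stream and the host continues.
    x = pixel_values.to(device, non_blocking=True)
    # Launched on the same stream, so it runs only after the copy completes.
    return x.flatten(1) @ weight


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weight = torch.randn(48, 8, device=device)  # 48 = 3 * 4 * 4 flattened pixels
features = encode_on_device(torch.rand(2, 3, 4, 4), weight)
# Copying back to the host is the point where the CPU finally waits for the GPU.
print(features.cpu().shape)  # torch.Size([2, 8])
```

The design choice mirrors the PR: defer any synchronization to the moment the host genuinely needs the data, instead of paying for it on every input transfer.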

DarkLight1337

Good catch


[Core][MM] Use non-blocking CPU-GPU copy of multimodal data

262417a

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

lgeiger requested review from DarkLight1337, NickLucche and ywang96 as code owners November 5, 2025 17:05

gemini-code-assist bot reviewed Nov 5, 2025


mergify bot added the multi-modality Related to multi-modality (#4194) label Nov 5, 2025

lgeiger mentioned this pull request Nov 5, 2025

[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU #28168

Merged

ywang96 approved these changes Nov 6, 2025


ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025

Merge branch 'main' into mm-gpu-sync

614ac80

ywang96 enabled auto-merge (squash) November 6, 2025 02:03

DarkLight1337 approved these changes Nov 6, 2025


ywang96 merged commit 80679f1 into vllm-project:main Nov 6, 2025
47 checks passed

lgeiger deleted the mm-gpu-sync branch November 6, 2025 15:20

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (vllm-project#28141)

a87183c

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (vllm-project#28141)

fea78d2

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

ywang96 merged 2 commits into vllm-project:main from lgeiger:mm-gpu-sync

3 participants
