[Core] Use individual MM items in P0/P1 cache and model runner by DarkLight1337 · Pull Request #22570 · vllm-project/vllm

DarkLight1337 · 2025-08-09T15:50:31Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Follow-up to #22198 and #22457, in preparation for moving the processing cache from P0 to P1.

Key changes:

MultiModalKwargsItem can now contain empty data.
The P0/P1 cache now accepts a list of MultiModalKwargsItem, and returns a new list of MultiModalKwargsItems.
EngineCoreRequest, Request, NewRequestData, CachedRequestState now use mm_kwargs: list[MultiModalKwargsItem] instead of mm_inputs: list[MultiModalKwargs]. (cc @wangxiyuan please update vllm/vllm-ascend accordingly after this PR)
Reworked merge_and_sort_multimodal_metadata into argsort_mm_positions, and group_mm_inputs_by_modality into group_mm_kwargs_by_modality, with new semantics that enable more code reuse.
Added support for a pin_memory argument when merging MultiModalFieldElems (unused for now; see the comment inside group_mm_kwargs_by_modality).
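The reworked helpers listed above can be illustrated with a simplified, hypothetical sketch; it is not vLLM's code. It assumes that placeholder positions are stored as a mapping from modality to a list of integer start offsets, that sorting returns (modality, item index) pairs in prompt order (the role described for argsort_mm_positions), and that grouping only batches consecutive items sharing a modality (the role described for group_mm_kwargs_by_modality). The names argsort_positions and group_consecutive_by_modality are illustrative stand-ins.

```python
from itertools import groupby
from typing import Any, Iterator


def argsort_positions(mm_positions: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Return (modality, item_index) pairs sorted by placeholder offset.

    Hypothetical simplification: offsets are plain ints here.
    """
    flat = [
        (modality, idx, offset)
        for modality, offsets in mm_positions.items()
        for idx, offset in enumerate(offsets)
    ]
    flat.sort(key=lambda entry: entry[2])  # order by position in the prompt
    return [(modality, idx) for modality, idx, _ in flat]


def group_consecutive_by_modality(
    items: list[tuple[str, Any]],
) -> Iterator[tuple[str, int, list[Any]]]:
    """Yield (modality, num_items, batch) for each run of same-modality items.

    Only consecutive items are grouped, so prompt order is preserved and
    no item is dropped when modalities are interleaved.
    """
    for modality, group in groupby(items, key=lambda item: item[0]):
        batch = [data for _, data in group]
        yield modality, len(batch), batch


positions = {"image": [5, 40], "audio": [20]}
order = argsort_positions(positions)
print(order)  # [('image', 0), ('audio', 0), ('image', 1)]

items = [(modality, f"{modality}-{idx}") for modality, idx in order]
print(list(group_consecutive_by_modality(items)))
# [('image', 1, ['image-0']), ('audio', 1, ['audio-0']), ('image', 1, ['image-1'])]
```

In this sketch, an interleaved image, audio, image sequence yields three batches rather than two: grouping only consecutive runs gives up some batching in exchange for keeping outputs aligned with prompt order.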

Test Plan

Test Result

(Optional) Documentation Update

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

github-actions · 2025-08-09T15:50:38Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request is a significant refactoring of how multimodal inputs are handled, moving from MultiModalKwargs per request to a list of MultiModalKwargsItem. This change is aimed at improving the design for caching and processing of multimodal data. The changes are extensive, touching many files in the core engine, workers, and tests. The tests have been updated to reflect the new logic, which is a positive sign. However, I've identified a critical issue in the new MultiModalKwargsItem.__init__ method that can lead to runtime errors with empty inputs. Additionally, there's a potential data loss bug in gpu_model_runner.py when handling raw multimodal inputs with mixed modalities, which could silently drop data. These issues should be addressed to ensure the correctness of the new implementation.
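The empty-input concern can be illustrated with a hypothetical sketch, not vLLM's actual class. Suppose an item is a UserDict of field elements whose modality is derived from its elements: a constructor that insists on exactly one modality would fail for an item with empty data. The sketch below instead allows zero or one modality and reports None for an empty item. FieldElem and KwargsItem are illustrative stand-ins for MultiModalFieldElem and MultiModalKwargsItem.

```python
from collections import UserDict
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class FieldElem:
    """Hypothetical stand-in for MultiModalFieldElem."""
    modality: str
    key: str
    data: Any


class KwargsItem(UserDict):
    """Hypothetical stand-in for MultiModalKwargsItem that tolerates empty data."""

    def __init__(self, data: Optional[Mapping[str, FieldElem]] = None) -> None:
        # Use None as the default to avoid a shared mutable default argument.
        super().__init__(data or {})
        modalities = {elem.modality for elem in self.data.values()}
        # An empty item has zero modalities, which is allowed; mixing is not.
        if len(modalities) > 1:
            raise ValueError(f"Found different modalities={modalities}")
        # next() with a default avoids StopIteration when the set is empty.
        self._modality: Optional[str] = next(iter(modalities), None)

    @property
    def modality(self) -> Optional[str]:
        return self._modality


empty = KwargsItem()
print(empty.modality)  # None

image = KwargsItem({"pixel_values": FieldElem("image", "pixel_values", [0.1, 0.2])})
print(image.modality)  # image
```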

vllm/multimodal/inputs.py

vllm/v1/worker/gpu_model_runner.py

vllm/v1/worker/tpu_model_runner.py


DarkLight1337 · 2025-08-09T16:05:46Z

vllm/multimodal/inputs.py


-            def _expect_same_shape(tensor: torch.Tensor):
-                return tensor.shape[:self.dim] + tensor.shape[self.dim + 1:]
+            dim = self.dim + (self.dim < 0) * len(batch[0].shape)


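The extra self.dim < 0 check allows a negative dim to be passed to this field: a negative dim is converted to the equivalent non-negative index (by adding the number of dimensions) before it is used to slice shapes.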
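To see why the normalization matters, the sketch below works on plain shape tuples instead of torch tensors, an assumption made to keep it self-contained. With dim=-1, the removed helper's slice shape[:dim] + shape[dim + 1:] becomes shape[:-1] + shape[0:], which does not drop the concatenation axis at all; converting dim to its non-negative equivalent first avoids this. The names normalize_dim, shape_without_dim and concat_shape are hypothetical.

```python
def normalize_dim(dim: int, ndim: int) -> int:
    # Same arithmetic as the diff: (dim < 0) is a bool, so it adds either 0 or ndim.
    return dim + (dim < 0) * ndim


def shape_without_dim(shape: tuple[int, ...], dim: int) -> tuple[int, ...]:
    # Shape with the concatenation axis removed; only correct when dim >= 0.
    return shape[:dim] + shape[dim + 1:]


def concat_shape(shapes: list[tuple[int, ...]], dim: int) -> tuple[int, ...]:
    """Return the shape produced by concatenating tensors of these shapes along dim."""
    dim = normalize_dim(dim, len(shapes[0]))
    expected = shape_without_dim(shapes[0], dim)
    for shape in shapes[1:]:
        if shape_without_dim(shape, dim) != expected:
            raise ValueError(f"Incompatible shapes for dim={dim}: {shapes}")
    total = sum(shape[dim] for shape in shapes)
    return shapes[0][:dim] + (total,) + shapes[0][dim + 1:]


# Without normalization, dim=-1 keeps every axis instead of dropping the last one.
print(shape_without_dim((2, 3, 4), -1))  # (2, 3, 2, 3, 4)
print(concat_shape([(2, 3, 4), (2, 3, 5)], dim=-1))  # (2, 3, 9)
```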

DarkLight1337 · 2025-08-09T16:07:10Z

vllm/v1/worker/gpu_input_batch.py

    def num_tokens(self) -> int:
        return self.num_prompt_tokens + len(self.output_token_ids)

+    # Temporary back-compatibility for plugins that define model runner


This fallback is determined by https://github.com/vllm-project/vllm/pull/22570/files#diff-629bb642993061658312f62ddfdfc2fabe3bf7a335eee5451e7cde5b23fbc2bbL335
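A temporary back-compatibility shim of this kind could look like the minimal sketch below, assuming a dataclass-style request state. The class name CachedRequestStateSketch, the list-of-lists legacy format returned by mm_inputs, and the warning text are all hypothetical; the real shim in the linked diff may wrap items differently.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CachedRequestStateSketch:
    num_prompt_tokens: int
    output_token_ids: list[int] = field(default_factory=list)
    # New-style field: one entry per multimodal item (item type left as Any here).
    mm_kwargs: list[Any] = field(default_factory=list)

    @property
    def num_tokens(self) -> int:
        return self.num_prompt_tokens + len(self.output_token_ids)

    # Temporary back-compatibility for plugins that define model runner
    @property
    def mm_inputs(self) -> list[list[Any]]:
        warnings.warn(
            "`mm_inputs` is superseded by `mm_kwargs`",
            DeprecationWarning,
            stacklevel=2,
        )
        # Hypothetical legacy format: wrap each item in its own single-item batch.
        return [[item] for item in self.mm_kwargs]
```

Deriving the legacy view on demand keeps mm_kwargs as the only stored field, so there is no second copy that can drift out of sync.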


Isotr0py

Overall LGTM, I just left some nits.

vllm/multimodal/inputs.py

vllm/multimodal/utils.py

vllm/multimodal/inputs.py


DarkLight1337 · 2025-08-10T05:14:36Z

Added the ready label just to check CI; please don't merge yet, as this is pending discussion with @ywang96 @WoosukKwon.

vllm/v1/worker/gpu_input_batch.py

huachenheli

Mark MultiModalKwargs class as deprecated?

DarkLight1337 · 2025-08-11T03:45:48Z

> Mark MultiModalKwargs class as deprecated?

It is still used by BaseMultiModalProcessor to remain compatible with V0.


mergify · 2025-08-13T13:41:59Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


### What this PR does / why we need it?
1. update `CachedRequestState` as `NewRequestData` changed in vllm-project/vllm#22570
2. drop maintenance of vllm v0.10.0 in the branch main

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@92ff41a

Signed-off-by: MengqingCao <cmq0113@163.com>

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Diego-Castan <diego.castan@ibm.com>


[Core] Use individual MM items in P0/P1 cache and model runner

ec347bf

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested a review from Isotr0py August 9, 2025 15:50

DarkLight1337 requested review from WoosukKwon, njhill, robertgshaw2-redhat and ywang96 as code owners August 9, 2025 15:50

DarkLight1337 added this to Multi-modality Core Aug 9, 2025

DarkLight1337 requested review from alexm-redhat and comaniac as code owners August 9, 2025 15:50

DarkLight1337 moved this to In Progress in Multi-modality Core Aug 9, 2025

mergify bot added the multi-modality (Related to multi-modality, #4194), v1, and tpu (Related to Google TPUs) labels Aug 9, 2025

gemini-code-assist bot reviewed Aug 9, 2025


vllm/multimodal/inputs.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (resolved)

vllm/v1/worker/tpu_model_runner.py (resolved)

DarkLight1337 added 2 commits August 9, 2025 15:56

Address comment

c4da5dc

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Address comment

3a36adb

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 commented Aug 9, 2025


DarkLight1337 added 2 commits August 9, 2025 16:10

Assertion

d5b74ad

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Merge branch 'main' into mm-cache-item

bd05abb

Isotr0py approved these changes Aug 10, 2025


vllm/multimodal/inputs.py (resolved)

vllm/multimodal/utils.py (outdated, resolved)

vllm/multimodal/inputs.py (resolved)

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 10, 2025

Address comment; add back-compat

426061d

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

huachenheli reviewed Aug 11, 2025


vllm/v1/worker/gpu_input_batch.py (resolved)

huachenheli reviewed Aug 11, 2025


DarkLight1337 added 2 commits August 11, 2025 12:56

Merge branch 'main' into mm-cache-item

9c4406c

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Update type annotations and message

193f9e8

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 added 4 commits August 13, 2025 08:32

Merge branch 'main' into mm-cache-item

938c9d0

Fix

ca9b1c1

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Fix wrong types

9ce07f9

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Avoid in-place updates

8e32d7e

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

mergify bot added the needs-rebase label Aug 13, 2025

Merge branch 'main' into mm-cache-item

42fbdf8

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

mergify bot removed the needs-rebase label Aug 13, 2025

vllm-bot merged commit 19b927e into vllm-project:main Aug 13, 2025
5 of 13 checks passed

github-project-automation bot moved this from In Progress to Done in Multi-modality Core Aug 13, 2025

DarkLight1337 deleted the mm-cache-item branch August 13, 2025 14:18

wwl2755-google mentioned this pull request Aug 13, 2025

[Multi-modal] Fix upstream refactor vllm-project/tpu-inference#474

Merged

MengqingCao mentioned this pull request Aug 14, 2025

[Quickfix] update CachedRequestState as NewRequestData changed vllm-project/vllm-ascend#2367

Merged

yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025

[Core] Use individual MM items in P0/P1 cache and model runner (vllm-…

d39a1b8

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025

[Core] Use individual MM items in P0/P1 cache and model runner (vllm-…

8dd4c5a

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[Core] Use individual MM items in P0/P1 cache and model runner (vllm-…

e5bb088

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[Core] Use individual MM items in P0/P1 cache and model runner (vllm-…

b1bdc34

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Xiao Yu <xiao.yu@amd.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[Core] Use individual MM items in P0/P1 cache and model runner (vllm-…

8570052

…project#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

4 participants
