[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec by sfeng33 · Pull Request #23779 · vllm-project/vllm

sfeng33 · 2025-08-27T22:40:18Z

Purpose

This PR refactors multimodal input handling in the V1 engine to use a unified MultiModalFeatureSpec data structure, as part of the broader effort to abstract out vLLM's input processing pipeline (#22880).
Partially fixes #23872.

Why This Change:

This refactor addresses the fragmented input processing issue where multimodal data (images, audio, video) was passed through multiple separate fields (mm_kwargs, mm_hashes, mm_placeholders).

Changes

Introduced MultiModalFeatureSpec: A unified dataclass that encapsulates all multimodal-related data (data, modality, identifier, position) into a single structure.
Simplified EngineCoreRequest: Replaced three separate multimodal fields with a single mm_features field and updated its references.
Note: To keep the PR small, only the core engine and processor references are updated. TODO comments mark the scheduler and model runner for migration in follow-up PRs.
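
To illustrate the consolidation described above, here is a minimal sketch, not the code from this PR. The four field names come from the description (data, modality, identifier, position). The `PlaceholderRange` stand-in, the `consolidate` helper, the explicit `modalities` argument, and the mapping of each old field to a new one are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PlaceholderRange:
    """Simplified stand-in (assumption): span of placeholder tokens in the prompt."""
    offset: int
    length: int


@dataclass
class MultiModalFeatureSpec:
    """Illustrative only: one record per multimodal item in a request."""
    data: Optional[Any]         # processed inputs for the item (assumed: from mm_kwargs)
    modality: str               # e.g. "image", "audio", "video"
    identifier: str             # per-item identifier (assumed: from mm_hashes)
    position: PlaceholderRange  # where the item sits in the prompt (assumed: from mm_placeholders)


def consolidate(mm_kwargs, mm_hashes, mm_placeholders, modalities):
    """Hypothetical helper: zip parallel per-item lists into one list of specs."""
    lengths = {len(mm_kwargs), len(mm_hashes), len(mm_placeholders), len(modalities)}
    if len(lengths) != 1:
        raise ValueError("parallel multimodal fields must have the same length")
    return [
        MultiModalFeatureSpec(data=kw, modality=mod, identifier=h, position=pos)
        for kw, h, pos, mod in zip(mm_kwargs, mm_hashes, mm_placeholders, modalities)
    ]


# Usage: three parallel lists collapse into a single mm_features list.
mm_features = consolidate(
    mm_kwargs=[{"pixel_values": [0.1, 0.2]}],
    mm_hashes=["abc123"],
    mm_placeholders=[PlaceholderRange(offset=5, length=576)],
    modalities=["image"],
)
```

Keeping each item's fields in one record means a consumer can no longer read one list at an index that is out of step with the others, which is the fragmentation the description refers to.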

Test Plan

python -m vllm.entrypoints.openai.api_server \
    --model llava-hf/llava-1.5-7b-hf 

curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava-hf/llava-1.5-7b-hf",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 100,
    "temperature": 0
  }'
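
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_payload` and `send_chat_request` are made up for this example.

```python
import json
import urllib.request

MODEL = "llava-hf/llava-1.5-7b-hf"
IMAGE_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/transformers/tasks/car.jpg"
)


def build_chat_payload(model, image_url, prompt, max_tokens=100, temperature=0):
    """Build the same JSON body as the curl command in the test plan."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions endpoint and return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())


payload = build_chat_payload(MODEL, IMAGE_URL, "What do you see in this image?")
# With the server running, send it with:
#   result = send_chat_request(payload)
```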

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 · 2025-08-28T22:45:16Z

PTAL @ywang96 @DarkLight1337

vllm/v1/request.py

DarkLight1337

Sounds good, thanks for the cleanup!

sfeng33 · 2025-08-29T03:24:50Z

Thanks for the review!

DarkLight1337 · 2025-08-29T07:05:05Z

The test fails, PTAL

sfeng33 · 2025-08-29T07:12:31Z

> The test fails, PTAL

Should be fixed now, will monitor the CI.

tests/tokenization/test_detokenize.py

sfeng33 added 3 commits August 27, 2025 13:56

Use feature spec in engines

a2cb7cc

Signed-off-by: sfeng33 <4florafeng@gmail.com>

Fix test and format

a8e34ea

Signed-off-by: sfeng33 <4florafeng@gmail.com>

comment

3104584

Signed-off-by: sfeng33 <4florafeng@gmail.com>

mergify bot added the multi-modality (Related to multi-modality (#4194)) and v1 labels Aug 27, 2025

pre-commit

857af3a

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 changed the title ~~[Multimodal] Consolidate mm inputs in MultiModalFeatureSpec in engines~~ [Multimodal] Consolidate mm inputs into MultiModalFeatureSpec Aug 27, 2025

sfeng33 added 2 commits August 27, 2025 23:37

pre-commit

0b1b826

Signed-off-by: sfeng33 <4florafeng@gmail.com>

comment

d87616f

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 marked this pull request as ready for review August 28, 2025 20:24

sfeng33 requested review from DarkLight1337, WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners August 28, 2025 20:24

DarkLight1337 reviewed Aug 29, 2025

vllm/v1/request.py (resolved)

DarkLight1337 approved these changes Aug 29, 2025

DarkLight1337 enabled auto-merge (squash) August 29, 2025 03:23

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 29, 2025

Fix tests

17e4e00

Signed-off-by: sfeng33 <4florafeng@gmail.com>

auto-merge was automatically disabled August 29, 2025 07:05
Head branch was pushed to by a user without write access

Fix test

5f17340

Signed-off-by: sfeng33 <4florafeng@gmail.com>

DarkLight1337 reviewed Aug 29, 2025

tests/tokenization/test_detokenize.py (outdated, resolved)

Fix tests

67979e6

Signed-off-by: sfeng33 <4florafeng@gmail.com>

DarkLight1337 merged commit 69f4635 into vllm-project:main Aug 29, 2025
37 of 38 checks passed

kyuyeunk mentioned this pull request Aug 29, 2025

[Bugfix] Fix PlaceholderRange import error vllm-project/tpu-inference#613

Merged

sfeng33 deleted the renderer branch September 1, 2025 02:30

eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025

[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (vllm-p…

c36ccaa

…roject#23779) Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 mentioned this pull request Sep 10, 2025

[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec #24548

Merged

Shaoting-Feng mentioned this pull request Sep 12, 2025

[Core] Update mm_hashes to the mm_feature format for compatibility with vLLM LMCache/LMCache#1582

Merged

qthequartermasterman mentioned this pull request Sep 16, 2025

Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs #24987

Merged

5 tasks

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (vllm-p…

d5da9f0

…roject#23779) Signed-off-by: sfeng33 <4florafeng@gmail.com>

DarkLight1337 merged 9 commits into vllm-project:main from sfeng33:renderer
