
[Model] Move multimodal_cpu_fields definition to field config#30181

Merged
DarkLight1337 merged 3 commits into vllm-project:main from DarkLight1337:move-cpu-fields
Dec 6, 2025

Conversation

@DarkLight1337 (Member) commented Dec 6, 2025

Purpose

Redesign of #28168, we now define the CPU fields in the field config where they really belong.

To avoid mixins, I had to make MultiModalFieldConfig kw_only=True and update the serialization accordingly to use dicts instead of tuples. This adds roughly 10 serialized bytes per item, which is negligible compared to the tensor data.

Since GLM4V uses the field config from the Qwen2-VL model, I also updated it to support CPU fields.

cc @lgeiger

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 6, 2025
@DarkLight1337 DarkLight1337 added the multi-modality Related to multi-modality (#4194) label Dec 6, 2025
@DarkLight1337 DarkLight1337 moved this to In Progress in Multi-modality Core Dec 6, 2025
@mergify mergify bot added qwen Related to Qwen models v1 tpu Related to Google TPUs labels Dec 6, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the mechanism for specifying CPU-only multimodal fields by introducing a keep_on_cpu flag in MultiModalFieldConfig and deprecating the old multimodal_cpu_fields attribute. The changes are well-implemented and consistently applied across model definitions and the core multimodal input processing logic. This improves the API by making the configuration more explicit and localized. I have one suggestion to improve a developer-facing error message.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 6, 2025 09:03
@DarkLight1337 DarkLight1337 merged commit 671427e into vllm-project:main Dec 6, 2025
62 checks passed
@DarkLight1337 DarkLight1337 deleted the move-cpu-fields branch December 6, 2025 13:40
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-modality Core Dec 6, 2025
Comment on lines +426 to +427
if device is not None and self.keep_on_cpu:
device = "cpu"
Contributor

Should we instead not call _nested_tensors_h2d at all if the tensor should stay on CPU? Just wondering since we set non_blocking=True in the copy which can lead to problems when transferring to CPU. Might be not an issue in practice since the tensors always are already on CPU, but might be nicer to be safe here.
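The reviewer's suggestion, skipping the transfer entirely rather than rewriting the target device, can be sketched as below. move_to_device and place_field are hypothetical names standing in for vLLM's _nested_tensors_h2d and its caller; the dict return is a mock so the sketch runs without a GPU.

```python
def move_to_device(tensor, device, non_blocking=True):
    # Stand-in for a real tensor.to(device, non_blocking=non_blocking) call.
    # A non-blocking copy toward CPU can return before the destination data
    # is valid, which is why avoiding the call altogether is safer for
    # fields that must stay on CPU.
    return {"data": tensor, "device": device}

def place_field(tensor, device, keep_on_cpu):
    if keep_on_cpu or device is None:
        # No transfer at all: the tensor is assumed to already be on CPU,
        # so we never enter the non_blocking copy path.
        return tensor
    return move_to_device(tensor, device)

t = [1.0, 2.0]
assert place_field(t, "cuda:0", keep_on_cpu=True) is t          # untouched
assert place_field(t, "cuda:0", keep_on_cpu=False)["device"] == "cuda:0"
```

As the reply below notes, this matters mainly in the (currently hypothetical) case where the source tensors are on GPU and would need a device-to-host copy.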

Member Author

That is true, we haven't handled the case where the original tensors are on GPU and we want to potentially move them to CPU (though I don't expect that to be needed any time soon). Feel free to open a PR if you have time!

@lgeiger (Contributor) commented Dec 8, 2025

Nice! Thanks for adding it to MultiModalFieldConfig. I thought about doing that in the first place as well, thanks for updating. I think this is a better API.

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Dec 15, 2025
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
1. fix vllm-project/vllm#27938
2. fix vllm-project/vllm#27145
pooling models now support chunked prefill and prefix caching
3. fix vllm-project/vllm#30181
define the CPU fields in the field config where they really belong
4. fix vllm-project/vllm#28168
define the CPU fields in the field config where they really belong
5. fix vllm-project/vllm#30201
some module renames
6. fix vllm-project/vllm#29067
fusedmoe module refactor
7. fix vllm-project/vllm#29066
fusedmoe module refactor
8. fix vllm-project/vllm#29624
### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…-project#30181)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

multi-modality: Related to multi-modality (#4194)
qwen: Related to Qwen models
ready: ONLY add when PR is ready to merge/full CI is needed
tpu: Related to Google TPUs
v1

Projects

Status: Done


3 participants