[Model] Move multimodal_cpu_fields definition to field config (#30181)
Merged by DarkLight1337: 3 commits into vllm-project:main
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request refactors the mechanism for specifying CPU-only multimodal fields by introducing a keep_on_cpu flag in MultiModalFieldConfig and deprecating the old multimodal_cpu_fields attribute. The changes are well-implemented and consistently applied across model definitions and the core multimodal input processing logic. This improves the API by making the configuration more explicit and localized. I have one suggestion to improve a developer-facing error message.
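As a toy illustration of the refactor described above (hypothetical class and attribute names, not the actual vLLM API), the change moves a per-model set of CPU-only field names onto the field config itself:

```python
class OldStyleProcessor:
    # Before: CPU-only fields were listed in a separate class attribute,
    # detached from the field definitions themselves.
    multimodal_cpu_fields = {"image_grid_thw"}


class FieldConfig:
    # After: the flag lives on the field config, next to the field it
    # describes, so per-model bookkeeping sets are no longer needed.
    def __init__(self, name: str, keep_on_cpu: bool = False):
        self.name = name
        self.keep_on_cpu = keep_on_cpu


new_style = {
    "image_grid_thw": FieldConfig("image_grid_thw", keep_on_cpu=True),
    "pixel_values": FieldConfig("pixel_values"),
}

# The set of CPU-only fields can now be derived from the configs.
cpu_fields = {name for name, cfg in new_style.items() if cfg.keep_on_cpu}
```

Deriving the set from per-field flags keeps the declaration local to each field, which is what makes the configuration "more explicit and localized".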
```python
if device is not None and self.keep_on_cpu:
    device = "cpu"
```
Should we instead not call `_nested_tensors_h2d` at all if the tensor should stay on CPU? Just wondering, since we set `non_blocking=True` in the copy, which can lead to problems when transferring to CPU. It might not be an issue in practice, since the tensors are always already on CPU, but it might be nicer to be safe here.
That is true; we haven't handled the case where the original tensors are on GPU and we want to move them to CPU (though I don't expect that to be needed any time soon). Feel free to open a PR if you have time!
Nice! Thanks for adding it to
### What this PR does / why we need it?
### Does this PR introduce _any_ user-facing change?
1. fix vllm-project/vllm#27938
2. fix vllm-project/vllm#27145: pooling models now support chunked prefill and prefix caching
3. fix vllm-project/vllm#30181: define the CPU fields in the field config where they really belong
4. fix vllm-project/vllm#28168: define the CPU fields in the field config where they really belong
5. fix vllm-project/vllm#30201: some module renames
6. fix vllm-project/vllm#29067: FusedMoE module refactor
7. fix vllm-project/vllm#29066: FusedMoE module refactor
8. fix vllm-project/vllm#29624
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: wangli <wangli858794774@gmail.com>
Purpose
Redesign of #28168: we now define the CPU fields in the field config where they really belong.

To avoid mixins, I had to make `MultiModalFieldConfig` `kw_only=True` and update the serialization accordingly to use dicts instead of tuples. This adds roughly 10 serialized bytes per item, which is negligible compared to the tensor data.

Since GLM4V uses the field config from the Qwen2-VL model, I also updated it to support CPU fields.
cc @lgeiger
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.