[Bugfix] Fix getting vision features in Transformer Multimodal backend#32933
Conversation
Signed-off-by: raushan <raushan@huggingface.co>
Code Review
The pull request effectively addresses the compatibility issue with the transformers library's v5 release, where the self.model.get_image_features method now returns a tuple or dict instead of a single tensor. The added logic correctly extracts the vision embeddings from these new output formats, ensuring the multimodal backend continues to function as expected. The changes are concise and directly resolve the reported bug.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Nice
(vllm-project#32933) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: 陈建华 <1647430658@qq.com>
Makes sure that the Transformers multimodal backend keeps working after the v5 release.
PR huggingface/transformers#42564 changed the output of `self.model.get_image_features` to a `tuple | dict` format. Previously we expected the output to always be a single tensor, or a list of tensors for non-homogeneous image sizes. The default output format currently depends on `model.config.return_dict`, so I added a simple check that handles both formats.
cc @hmellor
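The normalization described above can be sketched roughly as follows. This is a hypothetical standalone helper, not the actual vLLM patch: the function name `extract_image_features` and the assumption that the embeddings sit at the first position of a tuple (or the first value of a dict-like output) are illustrative, based on the PR description.

```python
def extract_image_features(output):
    """Normalize `get_image_features` output across transformers versions.

    Hypothetical sketch: pre-v5 returned a tensor (or a list of tensors
    for non-homogeneous image sizes); v5 may return a tuple or a
    dict-like output depending on `model.config.return_dict`.
    """
    # v5 with return_dict=False: tuple output; assume the vision
    # embeddings are the first element.
    if isinstance(output, tuple):
        return output[0]
    # v5 with return_dict=True: dict-like output (ModelOutput subclasses
    # also pass this check); take the first value.
    if isinstance(output, dict):
        return next(iter(output.values()))
    # pre-v5 behavior: already a tensor or a list of tensors.
    return output
```

Downstream code can then always treat the result as a tensor (or list of tensors), regardless of the transformers version or `return_dict` setting.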