Migrate Phi4 inputs to TensorSchema by bbeckca · Pull Request #23471 · vllm-project/vllm

bbeckca · 2025-08-23T23:42:36Z

Purpose

This PR migrates Phi4MMImagePixelInputs, Phi4MMAudioFeatureInputs, Phi4MMAudioEmbeddingInputs, and Phi4MMImageEmbeddingInputs from TypedDict-based definitions to structured TensorSchema models with runtime shape validation. This brings them in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
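To illustrate what "runtime shape validation" adds over a TypedDict, here is a minimal, self-contained sketch. It does not use vLLM's actual TensorSchema API: `FakeTensor`, `check_shape`, `ImagePixelInputsSketch`, the field names, and the shape annotations are all hypothetical stand-ins chosen for illustration. The idea shown is that integer dimensions must match exactly, while symbolic dimensions (strings) must agree across all fields of one input object.

```python
from typing import Dict, Sequence, Tuple, Union

# An int is a fixed size; a str is a symbolic name that must bind consistently.
Dim = Union[int, str]


class FakeTensor:
    """Hypothetical stand-in for a tensor; only the shape matters here."""

    def __init__(self, *shape: int) -> None:
        self.shape: Tuple[int, ...] = tuple(shape)


def check_shape(name: str, tensor: FakeTensor, expected: Sequence[Dim],
                bindings: Dict[str, int]) -> None:
    """Validate rank, fixed dims, and symbolic dims for one field."""
    if len(tensor.shape) != len(expected):
        raise ValueError(
            f"{name} has rank {len(tensor.shape)} but expected {len(expected)}")
    for axis, (actual, dim) in enumerate(zip(tensor.shape, expected)):
        if isinstance(dim, int):
            if actual != dim:
                raise ValueError(
                    f"{name} dim[{axis}] is {actual} but expected {dim}")
        else:
            # The first field to use a symbol binds it; later fields must agree.
            bound = bindings.setdefault(dim, actual)
            if bound != actual:
                raise ValueError(
                    f"{name} dim[{axis}] binds '{dim}' to {actual}, "
                    f"but it was already bound to {bound}")


class ImagePixelInputsSketch:
    """Hypothetical schema: shapes are assumptions, not the PR's annotations."""

    SHAPES: Dict[str, Tuple[Dim, ...]] = {
        "data": ("bn", 3, "h", "w"),
        "image_sizes": ("bn", 2),
    }

    def __init__(self, **fields: FakeTensor) -> None:
        bindings: Dict[str, int] = {}
        for name, expected in self.SHAPES.items():
            if name not in fields:
                raise ValueError(f"missing required field: {name}")
            check_shape(name, fields[name], expected, bindings)
        self.fields = fields
```

With a plain TypedDict, a mismatched `image_sizes` batch dimension would pass silently and fail somewhere downstream; in this sketch it fails at construction time with a message naming the offending field.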

Test Plan

Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.

Test Result

(venv) benjibeck@Benjis-MacBook-Pro vllm % python3 -m pytest tests/utils_/test_tensor_schema.py -v --log-cli-level=DEBUG
============================================================================================ test session starts =============================================================================================
platform darwin -- Python 3.9.6, pytest-8.4.1, pluggy-1.6.0 -- /Users/benjibeck/Projects/vllm/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/benjibeck/Projects/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 19 items                                                                                                                                                                                           

tests/utils_/test_tensor_schema.py::test_tensor_schema_valid_tensor PASSED                                                                                                                             [  5%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_optional_fields PASSED                                                                                                                          [ 10%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_constant_dim_failure PASSED                                                                                                                     [ 15%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_invalid_types_in_list PASSED                                                                                                                    [ 21%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_rank_mismatch PASSED                                                                                                                            [ 26%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_missing_required_field PASSED                                                                                                                   [ 31%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_symbolic_dim_mismatch PASSED                                                                                                                    [ 36%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_list_tensor_valid PASSED                                                                                                                        [ 42%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_variable_patch_counts_valid PASSED                                                                                                              [ 47%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_tuple_tensor_valid PASSED                                                                                                                       [ 52%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_inconsistent_shapes_in_list PASSED                                                                                                              [ 57%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_empty_list PASSED                                                                                                                               [ 63%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_validation_disabled_skips_shape_check PASSED                                                                                                    [ 68%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_valid_resolve_binding_dims PASSED                                                                                                          [ 73%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_invalid_resolve_binding_dims PASSED                                                                                                        [ 78%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim PASSED                                                                                                                [ 84%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim_mismatch_in_length PASSED                                                                                             [ 89%]
tests/utils_/test_tensor_schema.py::test_valid_tensor_schema_with_static_last_dim PASSED                                                                                                               [ 94%]
tests/utils_/test_tensor_schema.py::test_invalid_tensor_schema_with_static_last_dim PASSED                                                                                                             [100%]

gemini-code-assist

Code Review

This pull request successfully migrates Phi-4 multi-modal input definitions from TypedDict to the more structured TensorSchema, enhancing input validation and code clarity. The changes also include some useful bug fixes in error messages. My review identifies a few critical correctness issues related to the handling and parsing of image inputs, particularly image embeddings, in both phi4_multimodal.py and phi4mm.py. While some of these issues may predate this PR, addressing them is crucial for the correct functioning of the multi-modal capabilities.

gemini-code-assist · 2025-08-23T23:45:05Z

vllm/model_executor/models/phi4_multimodal.py

The usage of this Phi4MMImageEmbeddingInputs class in _process_image_input seems to have a few issues:

The type hint for image_input in _process_image_input is Phi4MMImagePixelInputs, but it should be Phi4MMImageInput to handle both pixel values and embeddings.

It accesses image_input["image_embeds"], but the field name in this class is data. This will cause a KeyError.

It uses self.visual.dtype, but self.visual is not defined in Phi4MultimodalForCausalLM. This will cause an AttributeError.

While these issues might not be introduced in this PR, they are related to the data structures being modified and affect correctness.

This might be worth reviewing, but it is beyond the scope of this diff and seems better served as a follow-up task.

gemini-code-assist · 2025-08-23T23:45:05Z

vllm/model_executor/models/phi4_multimodal.py

This function _parse_and_validate_image_input seems to only handle image_pixel_values. However, _parse_and_validate_multimodal_inputs calls it for both image_pixel_values and image_embeds. If only image_embeds are provided, this function will return None, and the embeddings will be ignored.

This function should be updated to handle image_embeds as well, similar to how _parse_and_validate_audio_input handles both features and embeddings. The return type hint should also be updated from Optional[Phi4MMImagePixelInputs] to Optional[Phi4MMImageInput].
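The suggestion above can be sketched as a parser that dispatches on whichever keyword argument is present. This is only an illustration under stated assumptions: `parse_image_input_sketch`, the `"type"` discriminator values, and the plain-dict return type are hypothetical and are not the code in phi4_multimodal.py; only the kwarg names `image_pixel_values` and `image_embeds` and the field name `data` come from the review comments.

```python
from typing import Any, Dict, Optional


def parse_image_input_sketch(**kwargs: Any) -> Optional[Dict[str, Any]]:
    """Return a tagged input dict for pixel values or embeddings, else None."""
    pixel_values = kwargs.pop("image_pixel_values", None)
    image_embeds = kwargs.pop("image_embeds", None)

    # Nothing image-related was passed, so there is nothing to validate.
    if pixel_values is None and image_embeds is None:
        return None

    # Prefer raw pixel values when both are given (an arbitrary choice here).
    if pixel_values is not None:
        return {"type": "pixel_values", "data": pixel_values}

    # Embeddings-only requests are no longer dropped.
    return {"type": "image_embeds", "data": image_embeds}
```

The point of the sketch is the final branch: a request carrying only `image_embeds` yields a populated result rather than `None`.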

gemini-code-assist · 2025-08-23T23:45:05Z

vllm/model_executor/models/phi4mm.py

There seems to be a naming inconsistency here. The function _parse_and_validate_image_input gets input_image_embeds from kwargs, but it seems to be treating them as pixel values and returns a Phi4MMImagePixelInputs object.

This inconsistency propagates. For example, _get_mm_fields_config in Phi4MMMultiModalProcessor expects input_image_embeds, but the HF processor is likely to produce image_pixel_values. This could lead to a KeyError at runtime.

It seems input_image_embeds should be renamed to image_pixel_values throughout the image processing path in this file for clarity and correctness.

I'm just fixing a typo. Similarly, this might be worth reviewing as a follow-up task, but I'm trying to keep the changes in this PR focused on the migration.

vllm/model_executor/models/phi4_multimodal.py

bbeckca · 2025-08-25T15:04:00Z

Observing failing MM test for:

models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Phi4MultimodalForCausalLM-microsoft/Phi-4-multimodal-instruct]
[2025-08-25T03:28:32Z] (EngineCore_0 pid=9135) ValueError: image_attention_mask has rank 4 but expected 3

Based on the existing schema, this seems to be an issue with the inputs:
[screenshot: schema]
[screenshot: tensor_schema]

Will find time to investigate further.
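For readers unfamiliar with this class of failure, the following sketch reproduces the form of the error above with a plain rank check. The helper `check_rank` and the concrete shape are hypothetical; the actual shape of `image_attention_mask` in the failing test is not given in this thread.

```python
from typing import Tuple


def check_rank(name: str, shape: Tuple[int, ...], expected_rank: int) -> None:
    """Raise if the number of dimensions differs from what the schema expects."""
    if len(shape) != expected_rank:
        raise ValueError(
            f"{name} has rank {len(shape)} but expected {expected_rank}")


# Hypothetical shape: an extra leading dimension makes the mask rank 4.
mask_shape = (1, 5, 32, 32)

try:
    check_rank("image_attention_mask", mask_shape, 3)
    message = ""
except ValueError as err:
    message = str(err)
```

A mismatch like this can be resolved either by reshaping the input before validation or by correcting the schema annotation, depending on which side reflects the intended contract.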


bbeckca force-pushed the phi4 branch from 0c8f038 to b24f567 Compare August 23, 2025 23:43

bbeckca changed the title ~~Migrate Phi4 multi-modal inputs to TensorSchema~~ Migrate Phi4 inputs to TensorSchema Aug 23, 2025

gemini-code-assist bot reviewed Aug 23, 2025


DarkLight1337 reviewed Aug 24, 2025


DarkLight1337 approved these changes Aug 25, 2025


DarkLight1337 enabled auto-merge (squash) August 25, 2025 02:20

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 25, 2025

This was referenced Aug 25, 2025

Migrate Qwen2 inputs to TensorSchema #23475

Merged

Migrate Llama4ImagePatchInputs to TensorSchema #22021

Merged

auto-merge was automatically disabled August 31, 2025 17:11
Head branch was pushed to by a user without write access

bbeckca force-pushed the phi4 branch from 47d4fdc to 206b70c Compare August 31, 2025 17:14

bbeckca added 4 commits August 31, 2025 13:08

Migrate Phi4 multi-modal inputs to TensorSchema

14f6c61

Signed-off-by: Benji Beck <benjibeck@meta.com>

Update annotation for image_sizes

494fefa

Signed-off-by: Benji Beck <benjibeck@meta.com>

Fix precommit

bb24fd0

Signed-off-by: Benji Beck <benjibeck@meta.com>

Fix annotations for Phi4MMImagePixelInputs.image_attention_mask and Phi4MMAudioFeatureInputs.data

67c6529

Signed-off-by: Benji Beck <benjibeck@meta.com>

bbeckca force-pushed the phi4 branch from 206b70c to 67c6529 Compare August 31, 2025 20:08

Merge branch 'main' into phi4

88b389f

DarkLight1337 merged commit 437c3ce into vllm-project:main Sep 1, 2025
41 checks passed

didier-durand pushed a commit to didier-durand/vllm that referenced this pull request Sep 1, 2025

Migrate Phi4 inputs to TensorSchema (vllm-project#23471)

b8cd588

Signed-off-by: Benji Beck <benjibeck@meta.com>

eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025

Migrate Phi4 inputs to TensorSchema (vllm-project#23471)

df9cb9f

Signed-off-by: Benji Beck <benjibeck@meta.com>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Migrate Phi4 inputs to TensorSchema (vllm-project#23471)

e7e88db

Signed-off-by: Benji Beck <benjibeck@meta.com>

DarkLight1337 merged 5 commits into vllm-project:main from bbeckca:phi4
