Migrate Phi4 inputs to TensorSchema#23471

Merged
DarkLight1337 merged 5 commits into vllm-project:main from bbeckca:phi4
Sep 1, 2025

@bbeckca bbeckca commented Aug 23, 2025

Purpose

This PR migrates Phi4MMImagePixelInputs, Phi4MMAudioFeatureInputs, Phi4MMAudioEmbeddingInputs, and Phi4MMImageEmbeddingInputs from TypedDict-based definitions to structured TensorSchema models with runtime shape validation. This brings them in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
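
To illustrate what runtime shape validation adds over a plain TypedDict, here is a minimal, self-contained sketch. It is not vLLM's TensorSchema implementation: `FakeTensor`, `validate_shape`, `validate_inputs`, and the example dimension names are all hypothetical, and only the general idea (rank checks, fixed dims, symbolic dims that must agree across fields) is taken from the test names listed under Test Result.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# int = fixed size, str = symbolic name bound on first use (assumed convention)
Dim = Union[int, str]


@dataclass
class FakeTensor:
    """Stand-in for a real tensor; only the shape matters in this sketch."""
    shape: Tuple[int, ...]


def validate_shape(name: str, tensor: FakeTensor, expected: Tuple[Dim, ...],
                   bindings: Dict[str, int]) -> None:
    """Check rank, fixed dims, and consistency of symbolic dims."""
    if len(tensor.shape) != len(expected):
        raise ValueError(
            f"{name} has rank {len(tensor.shape)} but expected {len(expected)}")
    for actual, dim in zip(tensor.shape, expected):
        if isinstance(dim, int):
            if actual != dim:
                raise ValueError(f"{name}: expected dim {dim}, got {actual}")
        else:
            # The first occurrence binds the symbol; later ones must match it.
            bound = bindings.setdefault(dim, actual)
            if bound != actual:
                raise ValueError(
                    f"{name}: dim '{dim}' is {actual} but was bound to {bound}")


def validate_inputs(fields: Dict[str, FakeTensor],
                    schema: Dict[str, Tuple[Dim, ...]]) -> Dict[str, int]:
    """Validate every field in the schema and return the resolved symbols."""
    bindings: Dict[str, int] = {}
    for name, expected in schema.items():
        validate_shape(name, fields[name], expected, bindings)
    return bindings


# Hypothetical schema: a batch of images plus one (height, width) pair per image.
SCHEMA = {"data": ("bn", 3, "h", "w"), "image_sizes": ("bn", 2)}
```

A TypedDict would accept any tensor under the right key; with a check like this, a mismatched batch dimension between `data` and `image_sizes` fails immediately with a message naming the offending field.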

Test Plan

Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.

Test Result

(venv) benjibeck@Benjis-MacBook-Pro vllm % python3 -m pytest tests/utils_/test_tensor_schema.py -v --log-cli-level=DEBUG
============================================================================================ test session starts =============================================================================================
platform darwin -- Python 3.9.6, pytest-8.4.1, pluggy-1.6.0 -- /Users/benjibeck/Projects/vllm/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/benjibeck/Projects/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 19 items                                                                                                                                                                                           

tests/utils_/test_tensor_schema.py::test_tensor_schema_valid_tensor PASSED                                                                                                                             [  5%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_optional_fields PASSED                                                                                                                          [ 10%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_constant_dim_failure PASSED                                                                                                                     [ 15%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_invalid_types_in_list PASSED                                                                                                                    [ 21%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_rank_mismatch PASSED                                                                                                                            [ 26%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_missing_required_field PASSED                                                                                                                   [ 31%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_symbolic_dim_mismatch PASSED                                                                                                                    [ 36%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_list_tensor_valid PASSED                                                                                                                        [ 42%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_variable_patch_counts_valid PASSED                                                                                                              [ 47%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_tuple_tensor_valid PASSED                                                                                                                       [ 52%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_inconsistent_shapes_in_list PASSED                                                                                                              [ 57%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_empty_list PASSED                                                                                                                               [ 63%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_validation_disabled_skips_shape_check PASSED                                                                                                    [ 68%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_valid_resolve_binding_dims PASSED                                                                                                          [ 73%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_invalid_resolve_binding_dims PASSED                                                                                                        [ 78%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim PASSED                                                                                                                [ 84%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim_mismatch_in_length PASSED                                                                                             [ 89%]
tests/utils_/test_tensor_schema.py::test_valid_tensor_schema_with_static_last_dim PASSED                                                                                                               [ 94%]
tests/utils_/test_tensor_schema.py::test_invalid_tensor_schema_with_static_last_dim PASSED                                                                                                             [100%]

@bbeckca bbeckca changed the title from "Migrate Phi4 multi-modal inputs to TensorSchema" to "Migrate Phi4 inputs to TensorSchema" Aug 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully migrates Phi-4 multi-modal input definitions from TypedDict to the more structured TensorSchema, enhancing input validation and code clarity. The changes also include some useful bug fixes in error messages. My review identifies a few critical correctness issues related to the handling and parsing of image inputs, particularly image embeddings, in both phi4_multimodal.py and phi4mm.py. While some of these issues may predate this PR, addressing them is crucial for the correct functioning of the multi-modal capabilities.

Comment on lines 655 to 669

critical

The usage of this Phi4MMImageEmbeddingInputs class in _process_image_input seems to have a few issues:

  1. The type hint for image_input in _process_image_input is Phi4MMImagePixelInputs, but it should be Phi4MMImageInput to handle both pixel values and embeddings.
  2. It accesses image_input["image_embeds"], but the field name in this class is data. This will cause a KeyError.
  3. It uses self.visual.dtype, but self.visual is not defined in Phi4MultimodalForCausalLM. This will cause an AttributeError.

While these issues might not be introduced in this PR, they are related to the data structures being modified and affect correctness.

@bbeckca (Contributor Author) replied:

This might be worth reviewing, but it is beyond the scope of this diff and seems better served as a follow-up task.

critical

This function _parse_and_validate_image_input seems to only handle image_pixel_values. However, _parse_and_validate_multimodal_inputs calls it for both image_pixel_values and image_embeds. If only image_embeds are provided, this function will return None, and the embeddings will be ignored.

This function should be updated to handle image_embeds as well, similar to how _parse_and_validate_audio_input handles both features and embeddings. The return type hint should also be updated from Optional[Phi4MMImagePixelInputs] to Optional[Phi4MMImageInput].
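
A minimal sketch of the handling suggested here, under stated assumptions: the keyword names `image_pixel_values` and `image_embeds` and the field name `data` come from this review, while the function name `parse_image_input`, the plain-dict return value, the `type` tag, and the precedence rule are hypothetical. The actual model code returns schema objects and may differ.

```python
from typing import Any, Dict, Optional


def parse_image_input(**kwargs: Any) -> Optional[Dict[str, Any]]:
    """Return a tagged image input for either pixel values or embeddings."""
    pixel_values = kwargs.pop("image_pixel_values", None)
    image_embeds = kwargs.pop("image_embeds", None)

    # Nothing image-related was passed: the caller skips image processing.
    if pixel_values is None and image_embeds is None:
        return None

    # Assumption for this sketch: pixel values win if both are supplied.
    if pixel_values is not None:
        return {"type": "pixel_values", "data": pixel_values}

    # Without this branch, embeddings-only requests would return None
    # and be silently ignored, which is the problem described above.
    return {"type": "image_embeds", "data": image_embeds}
```

Downstream code would then branch on the `type` tag and read `data` in both cases, rather than indexing a key that only one variant defines.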

@bbeckca (Contributor Author) replied:

Same here.

critical

There seems to be a naming inconsistency here. The function _parse_and_validate_image_input gets input_image_embeds from kwargs, but it seems to be treating them as pixel values and returns a Phi4MMImagePixelInputs object.

This inconsistency propagates. For example, _get_mm_fields_config in Phi4MMMultiModalProcessor expects input_image_embeds, but the HF processor is likely to produce image_pixel_values. This could lead to a KeyError at runtime.

It seems input_image_embeds should be renamed to image_pixel_values throughout the image processing path in this file for clarity and correctness.

@bbeckca (Contributor Author) replied:

I'm just fixing a typo here. Similarly, this might be worth reviewing as a follow-up task, but I'm trying to keep the changes in this PR focused on the migration.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 25, 2025 02:20
@github-actions github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 25, 2025

bbeckca commented Aug 25, 2025

Observing failing MM test for:

models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Phi4MultimodalForCausalLM-microsoft/Phi-4-multimodal-instruct]
[2025-08-25T03:28:32Z] (EngineCore_0 pid=9135) ValueError: image_attention_mask has rank 4 but expected 3

Based on existing schema, this seems to be an issue with the inputs:
[Screenshots attached: schema, tensor_schema]

Will find time to investigate further.
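
For readers unfamiliar with this class of failure, the sketch below reproduces the same message with plain nested lists. It is only an illustration: the helper names, the mask dimensions, and the idea that an extra leading batch dimension is responsible are assumptions, not a diagnosis of the failure reported above.

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [[0, 1], [1, 1]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def check_rank(name, nested, expected_rank):
    """Raise ValueError if the number of dimensions differs from the schema."""
    rank = len(shape_of(nested))
    if rank != expected_rank:
        raise ValueError(f"{name} has rank {rank} but expected {expected_rank}")


def flatten_leading(nested):
    """Merge the first two dimensions: (b, n, ...) -> (b * n, ...)."""
    return [item for group in nested for item in group]


# Hypothetical mask: 2 requests x 3 crops x 4 x 4, i.e. rank 4.
mask = [[[[1] * 4 for _ in range(4)] for _ in range(3)] for _ in range(2)]

try:
    check_rank("image_attention_mask", mask, 3)
except ValueError as err:
    print(err)

# After merging the batch and crop dimensions the rank-3 check passes.
check_rank("image_attention_mask", flatten_leading(mask), 3)
```

Whether the right fix is to reshape the input or to widen the schema depends on what the model actually expects, which this sketch does not settle.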

auto-merge was automatically disabled August 31, 2025 17:11

Head branch was pushed to by a user without write access

Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
…hi4MMAudioFeatureInputs.data

Signed-off-by: Benji Beck <benjibeck@meta.com>
@DarkLight1337 DarkLight1337 merged commit 437c3ce into vllm-project:main Sep 1, 2025
41 checks passed
didier-durand pushed a commit to didier-durand/vllm that referenced this pull request Sep 1, 2025
Signed-off-by: Benji Beck <benjibeck@meta.com>
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: Benji Beck <benjibeck@meta.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Benji Beck <benjibeck@meta.com>