-
-
Notifications
You must be signed in to change notification settings - Fork 12k
Migrate MiniCPMVImageInputs to TensorSchema #21939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR migrates MiniCPMVImageInputs to TensorSchema. I've provided comments on the TensorShape definitions and a missing check that could cause regressions for variable-sized inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TensorShape for pixel_values does not correctly handle variable image sizes, which could lead to validation failures for valid inputs. The height (h) and width (w) dimensions can differ between tensors in the pixel_values list. Without marking h and w as dynamic, TensorSchema validation will require all tensors in the list to have the same shape. This is a regression from the previous TypedDict-based implementation and would cause validation to fail for valid inputs with varying image sizes.
pixel_values: Annotated[
list[torch.Tensor],
TensorShape("bns", 3, "h", "w", dynamic_dims={"h", "w"}),
]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're fine to leave as is since this wasn't enforced and values aren't used in any symbolic cross field checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TensorShape for image_embeds does not account for a variable number of slices, which could lead to validation failures. The TensorShape does not seem to account for a variable number of slices (ns) when image_embeds is a list of tensors. Each tensor in the list corresponds to an image and has a shape of (num_slices, hidden_size). If num_slices can vary per image, this will cause validation to fail. To support this, ns should be marked as a dynamic dimension.
image_embeds: Annotated[
Union[torch.Tensor, list[torch.Tensor]],
TensorShape("bn", "ns", "hs", dynamic_dims={"ns"}),
]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't enforced previously, so I haven't added any validations. Feel free to let me know if we should make ns dynamic dim.
4fe5136 to
a6adcd8
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Paul Pak <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
Purpose
This PR migrates MiniCPMVImageInputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings it in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debug-ability across multi-modal models.
Test Plan
Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.
Test Result