[Misc] Various cleanups for MM input processing #29970
DarkLight1337 merged 7 commits into vllm-project:main
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Documentation preview: https://vllm--29970.org.readthedocs.build/en/29970/
💡 Codex Review
Here are some automated review suggestions for this pull request.
Code Review
This pull request introduces several valuable cleanups for multimodal inputs. The addition of the tensor2base64 utility function is a great improvement, simplifying code that serializes tensors to base64 and reducing duplication across documentation, examples, and tests. The removal of the unused tokenizer argument from parse_chat_messages and parse_chat_messages_futures cleans up the API nicely. I also appreciate the fix for the Terratorch test and the addition of missing type annotations.
I have one minor suggestion to improve the new utility function further. Overall, this is a solid set of changes that improves code quality and maintainability.
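To make the review's point concrete, the following is a minimal sketch of what a tensor-to-base64 helper and its inverse might look like. This is a hypothetical illustration, not the actual vLLM implementation: the real utility lives in `vllm.utils.serial_utils` and presumably operates on torch tensors, while NumPy is used here to keep the example self-contained.

```python
import base64
import io

import numpy as np


def tensor2base64_sketch(arr: np.ndarray) -> str:
    """Serialize an array to a base64 string (hypothetical sketch)."""
    buf = io.BytesIO()
    np.save(buf, arr)  # stand-in for the tensor serialization step
    buf.seek(0)
    binary_data = buf.read()
    return base64.b64encode(binary_data).decode("utf-8")


def base64_to_tensor_sketch(data: str) -> np.ndarray:
    """Inverse helper, useful for round-tripping in tests (also hypothetical)."""
    return np.load(io.BytesIO(base64.b64decode(data)))


# Round-trip example: the decoded array matches the original exactly.
embeds = np.arange(6, dtype=np.float32).reshape(2, 3)
encoded = tensor2base64_sketch(embeds)
decoded = base64_to_tensor_sketch(encoded)
assert np.array_equal(embeds, decoded)
```

Centralizing this pattern in one function is exactly the deduplication benefit the review describes: docs, examples, and tests can call the helper instead of repeating the buffer-and-encode boilerplate.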
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
/gemini review
Code Review
This pull request introduces several cleanups and fixes related to multi-modal inputs. A new convenience function tensor2base64 is added to simplify tensor serialization, and it's consistently used across documentation, examples, and tests. A significant refactoring removes the unused tokenizer argument from MultiModalItemTracker and related functions, which cleans up the API. The PR also includes fixes for incorrect type annotations in multi-modal data parsers and corrects a test that was improperly marked as asynchronous. Overall, these changes improve code quality, consistency, and correctness. The implementation looks solid, and I have no further suggestions.
```python
buf.seek(0)
binary_data = buf.read()

return base64.b64encode(binary_data).decode("utf-8")
```
Should we also use pybase64 here, as introduced in #19984, or do you think it's not worth it for this case?
Let's do that in a separate PR
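For context, pybase64 is a faster C-accelerated drop-in for the standard library's `base64`, exposing the same `b64encode`/`b64decode` signatures. A follow-up along the lines discussed above might look like this sketch, which assumes pybase64 is treated as an optional dependency (the helper name is hypothetical):

```python
import base64

try:
    import pybase64  # faster C implementation, API-compatible
    _b64encode = pybase64.b64encode
except ImportError:
    _b64encode = base64.b64encode  # standard-library fallback


def encode_bytes(binary_data: bytes) -> str:
    # Output is identical either way; only encoding speed differs.
    return _b64encode(binary_data).decode("utf-8")


print(encode_bytes(b"hello"))  # → aGVsbG8=
```

Because the two encoders produce byte-identical output, swapping one for the other is purely a performance change, which is why it can safely be deferred to a separate PR.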
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose

- Add a `vllm.utils.serial_utils.tensor2base64` convenience function
- `tokenizer=None` in `_preprocess_chat`
- Remove the `tokenizer` argument from the `MultiModalItemTracker` classes, and by extension the `parse_chat_messages` functions

Test Plan
Test Result