[WIP] Multimodal model support for V1 TPU #12133
mgoin wants to merge 27 commits into vllm-project:tpu_v1 from
Signed-off-by: mgoin <mgoin@redhat.com>
cc @bvrockwell - FYI
dea6afd to c6f526c
This pull request has merge conflicts that must be resolved before it can be merged.
1392a46 to 39c4a4c
cc @yaochengji could you please take a look?
Based on and requires #11936
Currently only focused on usability and correctness, not performance.
This does not pre-compile the encoder forward pass, so passing the model an image, video, or audio input with a previously unseen shape forces compilation at runtime.
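To make the runtime-compilation caveat concrete, here is a toy sketch in plain Python of a compile cache keyed by input shape. It is not the code in this PR and does not use vLLM or XLA APIs; `ShapeKeyedCompileCache`, `precompile`, and `run` are hypothetical names chosen for illustration, and the "compiled" function is a placeholder. It only shows why an unseen shape pays the compile cost at request time unless that shape was compiled ahead of time.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Shape = Tuple[int, ...]


class ShapeKeyedCompileCache:
    """Toy stand-in for a compiler that specializes on input shape (hypothetical)."""

    def __init__(self) -> None:
        self._compiled: Dict[Shape, Callable[[List[int]], int]] = {}
        self.compile_count = 0

    def _compile(self, shape: Shape) -> Callable[[List[int]], int]:
        # Placeholder for the expensive step (tracing and compiling a graph).
        self.compile_count += 1
        return lambda data: len(data)

    def precompile(self, shapes: Iterable[Shape]) -> None:
        # Pay the compile cost up front for shapes known in advance.
        for shape in shapes:
            if shape not in self._compiled:
                self._compiled[shape] = self._compile(shape)

    def run(self, shape: Shape, data: List[int]) -> int:
        # An unseen shape falls through to compilation at request time.
        if shape not in self._compiled:
            self._compiled[shape] = self._compile(shape)
        return self._compiled[shape](data)


cache = ShapeKeyedCompileCache()
cache.run((3, 224, 224), [0, 0, 0, 0])  # first shape: compiles
cache.run((3, 224, 224), [0, 0, 0, 0])  # same shape: cache hit, no compile
cache.run((3, 336, 336), [0, 0, 0, 0])  # new shape: compiles again at runtime

warm = ShapeKeyedCompileCache()
warm.precompile([(3, 224, 224), (3, 336, 336)])
compiles_before_serving = warm.compile_count
warm.run((3, 336, 336), [0, 0, 0, 0])   # already compiled: no runtime compile
```

In this sketch the first cache compiles twice while serving, whereas the pre-warmed cache compiles nothing after `precompile` returns. Pre-warming only helps when the set of input shapes is known or bounded ahead of time.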
Tested Examples
Image:
Audio: