
Support multimodal models with vLLM #3670

Closed
mudler opened this issue Sep 26, 2024 · 4 comments · Fixed by #3729
Labels: enhancement (New feature or request), roadmap

Comments

@mudler
Owner

mudler commented Sep 26, 2024

Is your feature request related to a problem? Please describe.
Many models are now becoming multimodal, that is, they can accept images, videos, or audio during inference. The llama.cpp project currently provides multimodal support, and we do as well by building on it; however, there are models that aren't supported yet (for instance #3535 and #3669; see also ggerganov/llama.cpp#9455).

Describe the solution you'd like
LocalAI to support vLLM multimodal capabilities

Describe alternatives you've considered

Additional context
See #3535 and #3669, tangentially related to: #2318 #3602

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py
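For reference, the linked examples boil down to vLLM's offline multimodal API, roughly as in the minimal sketch below. The model name and the `<image>` placeholder token are illustrative; each model expects its own placeholder in the text prompt:

```python
# Minimal sketch of vLLM offline image-to-text inference, modeled on the
# linked examples. Model name and "<image>" placeholder are illustrative;
# the exact placeholder token is model-specific.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")

image = Image.open("example.jpg")
outputs = llm.generate(
    {
        # Text prompt containing the model's image placeholder token
        "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
        # Attach the image alongside the prompt
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```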

@mudler mudler added enhancement New feature or request roadmap labels Sep 26, 2024
@3unnycheung

agree

@SuperPat45

SuperPat45 commented Sep 26, 2024

I am very interested in support for vision models in LocalAI, particularly Llama-3.2-11B-Vision and Pixtral-12b.

@mudler
Owner Author

mudler commented Oct 4, 2024

#3729 should cover most of the models and also adds video understanding. Model configuration files need to specify the placeholders the model uses for image/video tags in the text prompt; I'm going to experiment with this once it's in master and update the model gallery with a few examples.
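For illustration, such a configuration might look roughly like the sketch below. The name/backend/parameters/template keys are existing LocalAI config fields; the placeholder keys are assumptions, not the final schema from #3729:

```yaml
# Hypothetical LocalAI model config for a vLLM multimodal backend.
# The *_placeholder keys are assumptions pending the final #3729 schema.
name: pixtral
backend: vllm
parameters:
  model: mistralai/Pixtral-12B-2409
image_placeholder: "[IMG]"   # hypothetical key: tag injected for attached images
video_placeholder: "[VID]"   # hypothetical key: tag injected for attached videos
template:
  chat: |
    {{.Input}}
```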

mudler added a commit that referenced this issue Oct 4, 2024
* feat(vllm): add support for image-to-text

Related to #3670

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): add support for video-to-text

Closes: #2318

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): support CPU installations

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): add bnb

Signed-off-by: Ettore Di Giacinto <[email protected]>

* chore: add docs reference

Signed-off-by: Ettore Di Giacinto <[email protected]>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <[email protected]>

---------

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
@AlexM4H

AlexM4H commented Oct 5, 2024

Great news. However, I'm missing two Docker images: master-cublas-cuda12-ffmpeg and master-aio-gpu-nvidia-cuda-12.

siddimore pushed a commit to siddimore/LocalAI that referenced this issue Oct 6, 2024