Support for VLM #669

### Feature request

VL embeddings extend text embeddings by adding embeddings for vision documents (images at the moment, possibly videos in the future) and potentially other document types as well (e.g. audio).

### Motivation

I understand from the name of this project ("Text Embeddings Inference") that this feature might be out of scope, but TGI ("Text Generation Inference") supports, for example, Qwen2.5-VL, which is a VLM (https://huggingface.co/docs/text-generation-inference/supported_models).

Embedding and re-ranking VL models based on Qwen2-VL are already available:
- https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1
- https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

Adding support for VL-embeddings in TEI would be excellent for serverless inference.

### Your contribution

I could add support for Qwen2.5 as a model, but Candle does not yet support Conv3d. There is already a PR for Qwen2.5-VL in Candle: https://github.com/huggingface/candle/pull/2995
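
One possible way around the missing Conv3d, offered as an assumption rather than something this issue or the Candle PR proposes: in the Hugging Face Transformers implementation of Qwen2-VL, the vision patch embedding appears to apply Conv3d with a stride equal to its kernel size to inputs that are already cut into single patches. In that special case each convolution yields exactly one output position, so it should reduce to a linear projection of the flattened patch. The pure-Python sketch below checks that equivalence on random data. All function names and the toy shapes are hypothetical and are not part of TEI or Candle.

```python
import random


def random_tensor(shape, rng):
    """Build a nested list of the given shape filled with random floats."""
    if not shape:
        return rng.uniform(-1.0, 1.0)
    return [random_tensor(shape[1:], rng) for _ in range(shape[0])]


def flatten(nested):
    """Flatten a nested list in row-major order."""
    if isinstance(nested, list):
        return [x for item in nested for x in flatten(item)]
    return [nested]


def conv3d_kernel_equals_stride(patch, weight):
    """Naive Conv3d for an input whose size equals the kernel size.

    patch:  [C][T][H][W]     one patch
    weight: [O][C][T][H][W]  one kernel per output channel
    Returns a list of O values (a single output position per channel).
    """
    out = []
    for kernel in weight:
        acc = 0.0
        for c, volume in enumerate(patch):
            for t, plane in enumerate(volume):
                for h, row in enumerate(plane):
                    for w, value in enumerate(row):
                        acc += value * kernel[c][t][h][w]
        out.append(acc)
    return out


def linear_patch_embed(patch, weight):
    """Same computation expressed as a matrix-vector product on flat data."""
    x = flatten(patch)
    return [sum(a * b for a, b in zip(x, flatten(kernel))) for kernel in weight]


# Toy shapes (assumed for illustration, far smaller than a real model).
rng = random.Random(0)
C, T, P, O = 3, 2, 4, 5
patch = random_tensor([C, T, P, P], rng)
weight = random_tensor([O, C, T, P, P], rng)

conv_out = conv3d_kernel_equals_stride(patch, weight)
linear_out = linear_patch_embed(patch, weight)
max_diff = max(abs(a - b) for a, b in zip(conv_out, linear_out))
print("max difference:", max_diff)
```

If this equivalence holds for the target checkpoint, the Conv3d weight could in principle be reshaped to `[O, C*T*H*W]` at load time and applied with an ordinary matrix multiplication. Whether that matches the approach taken in the linked Candle PR is not something this sketch verifies.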
