diff --git a/README.md b/README.md
index 7712990e..76dcf8c7 100644
--- a/README.md
+++ b/README.md
@@ -110,7 +110,7 @@ Below are some examples of the currently supported models:
 ### Docker

 ```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
@@ -369,13 +369,13 @@ cd models

 # Make sure you have git-lfs installed (https://git-lfs.com)
 git lfs install
-git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
+git clone https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

 # Set the models directory as the volume path
 volume=$PWD

 # Mount the models directory inside the container with a volume and set the model ID
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/gte-base-en-v1.5
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/Qwen3-Embedding-0.6B
 ```

 ### Using Re-rankers models
@@ -458,7 +458,7 @@ found [here](https://github.com/huggingface/text-embeddings-inference/blob/main/
 You can use the gRPC API by adding the `-grpc` tag to any TEI Docker image. For example:

 ```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7-grpc --model-id $model
@@ -494,7 +494,7 @@ cargo install --path router -F metal
 You can now launch Text Embeddings Inference on CPU with:

 ```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B

 text-embeddings-router --model-id $model --port 8080
 ```
@@ -532,7 +532,7 @@ cargo install --path router -F candle-cuda -F http --no-default-features
 You can now launch Text Embeddings Inference on GPU with:

 ```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B

 text-embeddings-router --model-id $model --port 8080
 ```
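For reference, a quick way to exercise the `-grpc` image configured above is `grpcurl`; this is a sketch that assumes the container is running locally on port 8080 and uses the `tei.v1.Embed/Embed` method from TEI's gRPC API:

```shell
# Sketch: send one embedding request to the gRPC container started above.
# Assumes grpcurl is installed and the server exposes tei.v1.Embed/Embed.
grpcurl -d '{"inputs": "What is Deep Learning?"}' -plaintext 0.0.0.0:8080 tei.v1.Embed/Embed
```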
diff --git a/docs/source/en/intel_container.md b/docs/source/en/intel_container.md
index f0fae218..e08b786f 100644
--- a/docs/source/en/intel_container.md
+++ b/docs/source/en/intel_container.md
@@ -35,7 +35,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_cpu_ipe
 To deploy your model on an Intel® CPU, use the following command:

 ```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
 volume=$PWD/data

 docker run -p 8080:80 -v $volume:/data tei_cpu_ipex --model-id $model
@@ -58,7 +58,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_xpu_ipe
 To deploy your model on an Intel® XPU, use the following command:

 ```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
 volume=$PWD/data

 docker run -p 8080:80 -v $volume:/data --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path tei_xpu_ipex --model-id $model --dtype float16
@@ -81,7 +81,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_hpu
 To deploy your model on an Intel® HPU (Gaudi), use the following command:

 ```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
 volume=$PWD/data

 docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e MAX_WARMUP_SEQUENCE_LENGTH=512 tei_hpu --model-id $model --dtype bfloat16
diff --git a/docs/source/en/local_cpu.md b/docs/source/en/local_cpu.md
index 504d3548..2d5254cf 100644
--- a/docs/source/en/local_cpu.md
+++ b/docs/source/en/local_cpu.md
@@ -47,10 +47,9 @@ cargo install --path router -F metal
 Once the installation is successfully complete, you can launch Text Embeddings Inference on CPU with the following command:

 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
 ```
diff --git a/docs/source/en/local_gpu.md b/docs/source/en/local_gpu.md
index 7b76300a..7af94df8 100644
--- a/docs/source/en/local_gpu.md
+++ b/docs/source/en/local_gpu.md
@@ -58,8 +58,7 @@ cargo install --path router -F candle-cuda -F http --no-default-features
 You can now launch Text Embeddings Inference on GPU with:

 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --dtype float16 --port 8080
 ```
diff --git a/docs/source/en/local_metal.md b/docs/source/en/local_metal.md
index 0fd7f8ac..b9e110b2 100644
--- a/docs/source/en/local_metal.md
+++ b/docs/source/en/local_metal.md
@@ -38,10 +38,9 @@ cargo install --path router -F metal
 Once the installation is successfully complete, you can launch Text Embeddings Inference with Metal with the following command:

 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
 ```

 Now you are ready to use `text-embeddings-inference` locally on your machine.
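As a smoke test for any of the local launches above, the router's HTTP API can be probed directly; a minimal sketch, assuming the router is listening on port 8080 as in the commands:

```shell
# Liveness probe: returns 200 when the model is loaded and ready.
curl -i 127.0.0.1:8080/health

# Model metadata: useful to confirm which model ID the router actually loaded.
curl 127.0.0.1:8080/info
```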
diff --git a/docs/source/en/quick_tour.md b/docs/source/en/quick_tour.md
index 16973b5c..56ee05a1 100644
--- a/docs/source/en/quick_tour.md
+++ b/docs/source/en/quick_tour.md
@@ -28,10 +28,10 @@ Next, install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/

 ## Deploy

-Next it's time to deploy your model. Let's say you want to use [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5). Here's how you can do this:
+Next it's time to deploy your model. Let's say you want to use [`Qwen/Qwen3-Embedding-0.6B`](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B). Here's how you can do this:

 ```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
 volume=$PWD/data

 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
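Once the quick-tour container is up, embeddings can be requested over the HTTP API; for example, assuming port 8080 as above:

```shell
# POST a single input to the /embed route; the response is a JSON array
# containing one embedding vector per input.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```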
diff --git a/docs/source/en/supported_models.md b/docs/source/en/supported_models.md
index f73efadd..3888c087 100644
--- a/docs/source/en/supported_models.md
+++ b/docs/source/en/supported_models.md
@@ -21,21 +21,24 @@ We are continually expanding our support for other model types and plan to inclu
 ## Supported embeddings models

 Text Embeddings Inference currently supports Nomic, BERT, CamemBERT, XLM-RoBERTa models with absolute positions, JinaBERT
-model with Alibi positions and Mistral, Alibaba GTE, Qwen2 models with Rope positions, MPNet, and ModernBERT.
+model with Alibi positions, Mistral, Alibaba GTE, Qwen2, and Qwen3 models with Rope positions, MPNet, and ModernBERT.

 Below are some examples of the currently supported models:

 | MTEB Rank | Model Size          | Model Type  | Model ID                                                                                           |
 |-----------|---------------------|-------------|----------------------------------------------------------------------------------------------------|
-| 3         | 7B (Very Expensive) | Qwen2       | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct)               |
-| 11        | 1.5B (Expensive)    | Qwen2       | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)           |
-| 14        | 7B (Very Expensive) | Mistral     | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R)                         |
-| 20        | 0.3B                | Bert        | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1)                                     |
-| 31        | 0.5B                | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0)   |
-| 37        | 0.3B                | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0)   |
-| 49        | 0.5B                | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct)   |
+| 2         | 8B (Very Expensive) | Qwen3       | [Qwen/Qwen3-Embedding-8B](https://hf.co/Qwen/Qwen3-Embedding-8B)                                   |
+| 4         | 0.6B                | Qwen3       | [Qwen/Qwen3-Embedding-0.6B](https://hf.co/Qwen/Qwen3-Embedding-0.6B)                               |
+| 6         | 7B (Very Expensive) | Qwen2       | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct)               |
+| 7         | 0.5B                | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct)   |
+| 14        | 1.5B (Expensive)    | Qwen2       | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)           |
+| 17        | 7B (Very Expensive) | Mistral     | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R)                         |
+| 34        | 0.5B                | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0)   |
+| 40        | 0.3B                | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0)   |
+| 51        | 0.3B                | Bert        | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1)                                     |
 | N/A       | 0.4B                | Alibaba GTE | [Alibaba-NLP/gte-large-en-v1.5](https://hf.co/Alibaba-NLP/gte-large-en-v1.5)                       |
-| N/A       | 0.4B                | ModernBERT  | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large)                         |
+| N/A       | 0.4B                | ModernBERT  | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large)                         |
+| N/A       | 0.3B                | NomicBert   | [nomic-ai/nomic-embed-text-v2-moe](https://hf.co/nomic-ai/nomic-embed-text-v2-moe)                 |
 | N/A       | 0.1B                | NomicBert   | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1)                         |
 | N/A       | 0.1B                | NomicBert   | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5)                     |
 | N/A       | 0.1B                | JinaBERT    | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en)               |
@@ -56,6 +59,7 @@ Below are some examples of the currently supported models:
 | Re-Ranking         | XLM-RoBERTa | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large)                                       |
 | Re-Ranking         | XLM-RoBERTa | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)                                         |
 | Re-Ranking         | GTE         | [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) |
+| Re-Ranking         | ModernBERT  | [Alibaba-NLP/gte-reranker-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base)     |
 | Sentiment Analysis | RoBERTa     | [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions)                     |

 ## Supported hardware
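For the re-ranker models listed above, the `/rerank` route scores query/text pairs; a sketch assuming a re-ranker such as `BAAI/bge-reranker-large` was deployed in place of an embedding model:

```shell
# Rank two candidate passages against a query; the response lists each
# text's index with a relevance score, sorted by score.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```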