My vLLM version is the latest, v0.5.3.post1.
First, I launch an embedding server as below:
python3 -m vllm.entrypoints.openai.api_server --model Salesforce/SFR-Embedding-Mistral --dtype bfloat16 --enforce-eager --max-model-len 8192
Salesforce/SFR-Embedding-Mistral is an embedding model with the same architecture as intfloat/e5-mistral.
Then I use https://github.com/vllm-project/vllm/blob/main/examples/openai_embedding_client.py to test the online embedding result.
It returns a vector of length 8192, not the 4096 of MistralModel's hidden size.
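For reference, this is essentially the call the example script makes; a minimal sketch, assuming the server above is reachable at http://localhost:8000/v1 with the placeholder API key "EMPTY":

from openai import OpenAI

# talk to the local vLLM OpenAI-compatible server started above
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.embeddings.create(
    model="Salesforce/SFR-Embedding-Mistral",
    input=["Hello my name is"],
)
print(len(response.data[0].embedding))  # prints 8192 here instead of the expected 4096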
I also ran two other tests:
a. tests/entrypoints/openai/test_embedding.py: all three tests pass, and the embedding size is exactly 4096.
b. examples/offline_inference_embedding.py: the embedding size is also exactly 4096.
Can you have a look at what's going wrong with openai_embedding_client.py? Thanks.
I just checked OpenAI's Python library: it encodes the float data as "base64" by default if encoding_format is not given (see here). So in openai_embedding_client.py the returned embedding was encoded as "base64" instead of "float", hence the 8192 dimensions. If we add encoding_format="float", the returned dimension is 4096. Will add a fix soon.
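In the meantime, a minimal sketch of the workaround, assuming client and model are already set up as in the example script:

# pass encoding_format explicitly so the SDK does not fall back to base64
responses = client.embeddings.create(
    input=["Hello my name is"],
    model=model,
    encoding_format="float",
)
print(len(responses.data[0].embedding))  # 4096, matching the model's hidden size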
Setting encoding_format="float" indeed resolves the issue. However, maybe there is still a bug with base64 in the vLLM server? Since base64 is the default encoding_format used by the OpenAI Python API, it should still return the correct size, I would guess. The reason it's 8192 is that every second element is 0.
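To see what the server actually returns for base64, here is a rough diagnostic sketch (not from the issue) that calls the /v1/embeddings endpoint directly, assuming the same local server as above:

import base64

import numpy as np
import requests

# query the endpoint directly so no client-side post-processing gets in the way
payload = {
    "model": "Salesforce/SFR-Embedding-Mistral",
    "input": ["Hello my name is"],
    "encoding_format": "base64",
}
out = requests.post("http://localhost:8000/v1/embeddings", json=payload).json()
raw = base64.b64decode(out["data"][0]["embedding"])
print(len(raw))                                  # number of bytes the server sent back
print(np.frombuffer(raw, dtype=np.float32)[:8])  # how the OpenAI SDK would decode it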
Your current environment
Collecting environment information...
PyTorch version: 2.3.1+cu121
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
GPU 2: NVIDIA A40
GPU 3: NVIDIA A40
Nvidia driver version: 535.161.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] onnx==1.14.1
[pip3] onnxruntime==1.18.1
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
vLLM Version: 0.5.3