[Bug]: openai_embedding_client returns len 8192 embedding not 4096 #6744

Closed
ehuaa opened this issue Jul 24, 2024 · 3 comments · Fixed by #6755
Labels
bug Something isn't working

Comments

@ehuaa

ehuaa commented Jul 24, 2024

Your current environment

Collecting environment information...
PyTorch version: 2.3.1+cu121

GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
GPU 2: NVIDIA A40
GPU 3: NVIDIA A40

Nvidia driver version: 535.161.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] onnx==1.14.1
[pip3] onnxruntime==1.18.1
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1

vLLM Version: 0.5.3

🐛 Describe the bug

My vLLM version is the latest, v0.5.3.post1.
First, I launch an embedding server as below:
python3 -m vllm.entrypoints.openai.api_server --model Salesforce/SFR-Embedding-Mistral --dtype bfloat16 --enforce-eager --max-model-len 8192
Salesforce/SFR-Embedding-Mistral is an embedding model with the same architecture as intfloat/e5-mistral.

Then I use https://github.com/vllm-project/vllm/blob/main/examples/openai_embedding_client.py to test the online embedding result,
and it returns an embedding of length 8192, not 4096, which is MistralModel's hidden size.
I also ran two other tests:
a. Ran tests/entrypoints/openai/test_embedding.py: all three tests pass, and the embedding size is exactly 4096.
b. Ran examples/offline_inference_embedding.py: the embedding size is also exactly 4096.

Can you have a look at what's going wrong with openai_embedding_client.py? Thanks.
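
For reference, a minimal sketch of the request the example client makes, assuming the OpenAI Python client is pointed at the local vLLM server started above (the base URL, API key, and input text are placeholders, not the exact contents of openai_embedding_client.py):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="Salesforce/SFR-Embedding-Mistral",
    input=["Hello my name is"],
)

# Expected 4096 (MistralModel's hidden size), but 8192 is observed.
print(len(response.data[0].embedding))
```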

@ehuaa ehuaa added the bug Something isn't working label Jul 24, 2024
@CatherineSue
Contributor

CatherineSue commented Jul 24, 2024

Just checked OpenAI's Python lib: it encodes the float data as "base64" by default when encoding_format is not given (see here). So in openai_embedding_client.py the encoding of the returned embedding became "base64" instead of "float", hence the 8192 dimensions. If we add encoding_format="float", the returned dimensions will be 4096. Will add a fix soon.
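
A minimal sketch of that workaround, assuming the same client setup as in the reproduction above; the only change is requesting floats explicitly:

```python
# Explicitly ask for raw floats instead of the library's default base64 encoding.
response = client.embeddings.create(
    model="Salesforce/SFR-Embedding-Mistral",
    input=["Hello my name is"],
    encoding_format="float",
)

print(len(response.data[0].embedding))  # 4096, as expected
```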

@hibukipanim

Setting encoding_format="float" indeed resolves the issue. However, maybe there is still a bug with base64 in the vLLM server? Since it's the default encoding_format used by the OpenAI Python API, it should still return the correct size, I guess? The reason it's 8192 is that every second element is 0.
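
A hedged sketch of how to check that observation, assuming the same client setup as above and the library default (no encoding_format passed):

```python
# With no encoding_format, the OpenAI client requests base64 under the hood
# and decodes it back into a list of floats.
response = client.embeddings.create(
    model="Salesforce/SFR-Embedding-Mistral",
    input=["Hello my name is"],
)

emb = response.data[0].embedding
print(len(emb))                        # 8192 instead of 4096
print(all(x == 0 for x in emb[1::2]))  # True, per the observation above
```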

@HollowMan6
Contributor


@hibukipanim This should hopefully be fixed by #7855.
