fix for empty string input to openai/text-embedding-3-large#2634
fix for empty string input to openai/text-embedding-3-large#2634ayush1298 wants to merge 4 commits intoembeddings-benchmark:mainfrom
Conversation
|
@KennethEnevoldsen I have added fix that should based on what you explained. Also, to test it as we are calling |
KennethEnevoldsen
left a comment
There was a problem hiding this comment.
I addded some suggestions. Do give it another try and try to give me a debug trace if it doesn't work.
|
|
||
| return np.array(all_embeddings) | ||
| all_embeddings = np.array(all_embeddings) | ||
| final_embeddings = np.zeros((len(sentences), self._embed_dim), dtype=np.float32) |
There was a problem hiding this comment.
this already fill out the empty texts.
You can then simply fill in the text_embeddings using:
mask = [i for i, t in enumerate(text) if t.strip()]
final_emb[mask, :] = text_embHere is a small sample to show the idea:
import numpy as np
matrix1 = np.zeros((5, 4))
matrix2 = np.random.rand(3, 4)
matrix1[[0, 2, 4],:] = matrix2There was a problem hiding this comment.
Didn't get these one. Current implementation was 1st creating whole vector of zeros, and then only setting text embeddings, at positions where non-empty text is present. I think you are suggesting same only.
@KennethEnevoldsen Getting the following output when running the above testing code: So, code is running but still not able to handle empty strings. I have tried to put some logging statements at start of |
|
Thanks for taking the time on this @ayush1298. I built upon you earlier pr. and made a fix: #2676 |
Thanks @KennethEnevoldsen |
fix for #1650.
I dont have OpenAI API key to test whether its working correctly or not. @KennethEnevoldsen Can you please test it, when you get time.
Code Quality
make lintto maintain consistent style.Documentation
Testing
make test-with-coverage.make testormake test-with-coverageto ensure no existing functionality is broken.