Add jina, uae, stella models #1319
KennethEnevoldsen merged 19 commits into embeddings-benchmark:main from
Conversation
Co-authored-by: Wang Bo <bo.wang@jina.ai>
The rest looks good to me; I need to run some checks to make sure the different task adapters (especially retrieval) behave as expected.
I have the results and will paste them soon (currently creating a table to make comparison easier; for jina they are the same). @bwanglzu Thank you very much!
Results Summary:

- Classification
- Clustering
- PairClassification
- Reranking
- Retrieval
- STS
- Summarization
Perfect, thanks @Samoed! It seems our reported score on Emotion is lower than what we actually get (lol). Do you mind sharing your script so that I can run a few more experiments? BTW, some of our reported scores might come from a smaller context length such as 512; I don't recall on which datasets we evaluated with a 512 context length, but I believe it was most of the MTEB tasks except LongEmbed.
What do you mean? I think that mteb, when using SentenceTransformer, uses the full context length.
I mean that when we submit scores, there is a small chance they were submitted by different people on the team using slightly different max sequence lengths; sometimes, to speed up evaluation, we use 512, and sometimes we use the full context length, which is 8192.
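For reference, a minimal evaluation sketch along the lines of what is being discussed; the model name, task, output folder, and the 512-token cap are illustrative assumptions, not the exact script used by either party:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Illustrative model choice; any of the models added in this PR could be used here.
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

# Optionally cap the sequence length to speed up evaluation, as discussed above;
# the model's full context length is 8192 tokens.
model.max_seq_length = 512

# Run a single task and write results to a local folder (both are placeholders).
tasks = mteb.get_tasks(tasks=["EmotionClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/jina-embeddings-v2-base-en")
```

Note that a cap like the one above would explain small score differences between submissions evaluated at 512 versus 8192 tokens.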
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
# Conflicts:
#	mteb/models/overview.py
KennethEnevoldsen left a comment
Only a minor thing; otherwise all good.
@KennethEnevoldsen Is this PR ready for merge?
I tested a few more benchmarks and the results are consistent. Thanks @Samoed!
Checklist
- `make test`
- `make lint`

Adding a model checklist
`mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`
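A minimal sketch of how this checklist item can be verified; the model name and revision below are placeholders, not a prescription from this PR:

```python
import mteb

# Placeholder identifiers; substitute the model added in this PR and, if needed, a commit hash.
model_name = "jinaai/jina-embeddings-v2-base-en"
revision = None

# Check that the metadata is registered and the model can be instantiated.
meta = mteb.get_model_meta(model_name, revision)
print(meta)

model = mteb.get_model(model_name, revision)
```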