feat: add jina-v3 into model list #1318
bwanglzu wants to merge 2 commits into embeddings-benchmark:main
    *args,
    **kwargs: Any,
) -> np.ndarray:
    return super().encode(
I'm also working on this in #1319, and this implementation isn't quite right because the tasks MTEB generates don't align with Jina's tasks.
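To illustrate the kind of mismatch meant here, a minimal sketch of translating MTEB task types into Jina task names. The pairings and the helper `resolve_jina_task` are assumptions for illustration only, not the mapping used in this PR or in #1319:

```python
from typing import Optional

# Hypothetical pairing of MTEB task types with jina-v3 task names.
# Retrieval needs two entries because queries and passages use different tasks.
JINA_TASK_BY_MTEB_TYPE = {
    "Retrieval": {"query": "retrieval.query", "passage": "retrieval.passage"},
    "Classification": "classification",
    "Clustering": "separation",
    "STS": "text-matching",
}


def resolve_jina_task(mteb_task_type: str, prompt_type: Optional[str] = None) -> Optional[str]:
    """Return the assumed Jina task for an MTEB task type, or None if unmapped."""
    entry = JINA_TASK_BY_MTEB_TYPE.get(mteb_task_type)
    if entry is None:
        # No aligned Jina task: the caller has to decide on a fallback.
        return None
    if isinstance(entry, dict):
        # Default to the passage side when no prompt type is given.
        return entry.get(prompt_type or "passage")
    return entry
```

Any MTEB task type missing from the table returns `None`, which is exactly the case a wrapper would have to handle explicitly.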
Okay @Samoed, I wasn't aware you were working on it. I can imagine, because I'm still testing the implementation (marked as draft). What should we do now? Shall I take over the Jina models?
Btw, the reason I'm adding it here is that I want to run a full MMTEB evaluation of the Jina models.
Yes, I wanted to as well. I will provide my test results in an hour.
    ),
    name=MODEL_NAME,
    languages=XLMR_LANGUAGES,
    open_source=True,  # CC-BY-NC-4.0
In #1316 we updated the metadata for all models.
It might be nice to add:
max_tokens=...,
embed_dim=...,
n_parameters=...,
memory_usage=...,
license="cc-by-nc-4.0",
reference=..., # you will need to pull from the PR to add this
similarity_fn_name="cosine",
framework=[...],
use_instuctions=...,
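
As a rough sketch of how the suggested fields could be tracked, assuming a stand-in dataclass rather than MTEB's actual `ModelMeta` (the class `ModelMetaSketch`, the helper `missing_fields`, and the placeholder model name are hypothetical; unknown values are left unset rather than guessed):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelMetaSketch:
    """Hypothetical stand-in holding the metadata fields listed above."""

    name: str
    license: Optional[str] = None
    similarity_fn_name: Optional[str] = None
    max_tokens: Optional[int] = None
    embed_dim: Optional[int] = None
    n_parameters: Optional[int] = None
    memory_usage: Optional[float] = None
    reference: Optional[str] = None
    framework: List[str] = field(default_factory=list)
    use_instuctions: Optional[bool] = None  # spelling kept as in the comment above

    def missing_fields(self) -> List[str]:
        """List the fields that still need a value, in declaration order."""
        return [key for key, value in vars(self).items() if value is None or value == []]


# Only the two values stated in the review comment are filled in.
meta = ModelMetaSketch(
    name="placeholder-model-name",  # hypothetical; the PR uses MODEL_NAME
    license="cc-by-nc-4.0",
    similarity_fn_name="cosine",
)
```

Calling `meta.missing_fields()` then gives a quick list of what still has to be filled in before the metadata is complete.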
@Samoed @KennethEnevoldsen I'll close my PR and focus on reviewing and testing #1319, as it is already in a better state.
Checklist
make test
make lint
Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name} command
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test
make lint
Adding a model checklist
mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)