feat: add jina-v3 into model list by bwanglzu · Pull Request #1318 · embeddings-benchmark/mteb

bwanglzu · 2024-10-24T11:49:27Z

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
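To illustrate the subsampling item in the checklist above, here is a minimal, library-free sketch of what stratified subsampling does: it shrinks a dataset while keeping label proportions roughly intact. This is not the mteb implementation of self.stratified_subsampling(); the function name `stratified_subsample`, the `label_key` parameter, and the dict-per-example layout are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return roughly n_samples examples, preserving label proportions.

    Hypothetical helper for illustration only; mteb ships its own method.
    """
    if len(examples) <= n_samples:
        # Small datasets are returned unchanged.
        return list(examples)

    rng = random.Random(seed)

    # Group examples by label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    total = len(examples)
    sampled = []
    for label in sorted(by_label, key=str):  # fixed order keeps runs reproducible
        group = by_label[label]
        # Each label gets a share proportional to its frequency, at least one.
        k = min(len(group), max(1, round(n_samples * len(group) / total)))
        sampled.extend(rng.sample(group, k))

    # Rounding can overshoot slightly, so shuffle and trim to the cap.
    rng.shuffle(sampled)
    return sampled[:n_samples]
```

Because of rounding, the result can contain slightly fewer than `n_samples` examples; the cap is an upper bound, not an exact size.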

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.
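The loading check in the model checklist above could be scripted roughly as follows. This is only a sketch: `check_model_loads` is a hypothetical helper, the loaders are passed in as arguments so the snippet does not depend on mteb being installed, and the assumptions that the returned metadata exposes a `name` attribute and that the loaded model exposes `encode()` are mine, not taken from this PR.

```python
from typing import Any, Callable, List, Optional


def check_model_loads(
    model_name: str,
    revision: Optional[str],
    get_model: Callable[..., Any],
    get_model_meta: Callable[..., Any],
) -> List[str]:
    """Return a list of problems; an empty list means both loaders worked."""
    problems: List[str] = []

    # Step 1: the metadata lookup should succeed and report the same name.
    try:
        meta = get_model_meta(model_name, revision)
    except Exception as exc:
        problems.append(f"get_model_meta failed: {exc!r}")
    else:
        if getattr(meta, "name", None) != model_name:
            problems.append("metadata name does not match the requested model")

    # Step 2: the model itself should load and expose an encode() method.
    try:
        model = get_model(model_name, revision)
    except Exception as exc:
        problems.append(f"get_model failed: {exc!r}")
    else:
        if not callable(getattr(model, "encode", None)):
            problems.append("loaded model has no callable encode()")

    return problems
```

With mteb installed, the call would presumably look like `check_model_loads(model_name, revision, mteb.get_model, mteb.get_model_meta)`, using the two functions named in the checklist.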

Samoed · 2024-10-24T11:59:22Z

mteb/models/jina_models.py

+            *args,
+            **kwargs: Any,
+        ) -> np.ndarray:
+            return super().encode(


Samoed: I'm also working on this in #1319, and this implementation isn't quite right because MTEB will generate tasks that don't align with the Jina tasks.

bwanglzu: Okay, @Samoed, I wasn't aware you were working on it. I can imagine, because I'm still testing the implementation (marked as draft). What should we do now? Shall I take over the Jina models?

bwanglzu: By the way, the reason I'm adding it here is that I want to run a full MMTEB evaluation of the Jina models.

Samoed: Yes, I wanted that too. I will provide my test results in an hour.
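The task mismatch raised in this thread can be pictured with a small lookup. jina-embeddings-v3 accepts a fixed set of task adapter names, while MTEB describes tasks by type (Retrieval, Clustering, and so on), so a wrapper has to translate one into the other. The sketch below is an assumption about how that translation might look, not the mapping used in this PR or in #1319; `MTEB_TYPE_TO_JINA_TASK` and `jina_task_for` are hypothetical names.

```python
from typing import Optional

# Task adapter names accepted by jina-embeddings-v3.
JINA_V3_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
}

# Hypothetical mapping from MTEB task types to Jina adapters (an assumption).
MTEB_TYPE_TO_JINA_TASK = {
    "Classification": "classification",
    "Clustering": "separation",
    "STS": "text-matching",
}


def jina_task_for(task_type: str, is_query: bool = False) -> Optional[str]:
    """Pick a Jina adapter for an MTEB task type, or None if there is no match."""
    if task_type == "Retrieval":
        # Retrieval uses different adapters for queries and for passages.
        return "retrieval.query" if is_query else "retrieval.passage"
    return MTEB_TYPE_TO_JINA_TASK.get(task_type)
```

Returning `None` for unmapped types leaves the caller free to encode without a task adapter rather than passing a name the model does not recognize.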

KennethEnevoldsen · 2024-10-24T12:01:07Z

mteb/models/jina_models.py

+    ),
+    name=MODEL_NAME,
+    languages=XLMR_LANGUAGES,
+    open_source=True,  # CC-BY-NC-4.0


In #1316 we update the metadata for all models.

It might be nice to add:

    max_tokens=...,
    embed_dim=...,
    n_parameters=...,
    memory_usage=...,
    license="cc-by-nc-4.0",
    reference=...,  # you will need to pull from the PR to add this
    similarity_fn_name="cosine",
    framework=[...],
    use_instuctions=...,

bwanglzu · 2024-10-24T12:13:08Z

@Samoed @KennethEnevoldsen I'll close my PR and focus on reviewing and testing #1319, as it is already in a better state.

bwanglzu added 2 commits October 24, 2024 13:48

feat: add jina-v3 into model list

b60c714

feat: add jina model to overview

9302603

Samoed reviewed Oct 24, 2024


KennethEnevoldsen reviewed Oct 24, 2024


bwanglzu closed this Oct 24, 2024

bwanglzu deleted the feat-add-jina-v3 branch October 24, 2024 12:14

bwanglzu wants to merge 2 commits into embeddings-benchmark:main from bwanglzu:feat-add-jina-v3
