feat: add jina-v3 into model list#1318

Closed
bwanglzu wants to merge 2 commits into embeddings-benchmark:main from bwanglzu:feat-add-jina-v3

Conversation

@bwanglzu

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.
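
The loading check from the model checklist can be sketched as a small helper. This is a minimal sketch, not part of the PR: `check_model_loads` and its `mteb_module` parameter are hypothetical names introduced here so the helper can be exercised without downloading a model, and it assumes `mteb` is installed when no stand-in is passed.

```python
from typing import Any, Optional, Tuple


def check_model_loads(
    model_name: str,
    revision: Optional[str] = None,
    mteb_module: Any = None,  # hypothetical hook: pass a stand-in object for testing
) -> Tuple[Any, Any]:
    """Call the two loaders named in the checklist and return (meta, model)."""
    if mteb_module is None:
        # Imported lazily so the helper can be defined without mteb installed.
        import mteb as mteb_module

    # Both calls come from the checklist; either one raising means the
    # model is not registered (or not loadable) under this name and revision.
    meta = mteb_module.get_model_meta(model_name, revision)
    model = mteb_module.get_model(model_name, revision)
    return meta, model
```

Calling it with the model name and revision from the ModelMeta entry should return both objects without raising.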

*args,
**kwargs: Any,
) -> np.ndarray:
return super().encode(
Member

I'm also working on this in #1319, and this implementation isn't quite right because MTEB will generate tasks that don't align with Jina tasks.
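
To illustrate the mismatch described above: jina-embeddings-v3 expects one of a small set of task adapter names, while MTEB organises tasks by type. The sketch below is an assumption for illustration only, not the mapping used in this PR or in #1319; `MTEB_TO_JINA_TASK` and `resolve_jina_task` are hypothetical names.

```python
from typing import Optional

# Task adapter names documented for jina-embeddings-v3.
JINA_V3_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
}

# Hypothetical mapping from MTEB task types to Jina adapters (an assumption).
MTEB_TO_JINA_TASK = {
    "Retrieval": "retrieval.passage",  # queries are handled separately below
    "Reranking": "separation",
    "Clustering": "separation",
    "Classification": "classification",
    "STS": "text-matching",
}


def resolve_jina_task(mteb_task_type: str, is_query: bool = False) -> Optional[str]:
    """Return a Jina adapter name, or None when no adapter clearly applies."""
    if mteb_task_type == "Retrieval":
        # Retrieval is asymmetric: queries and passages use different adapters.
        return "retrieval.query" if is_query else "retrieval.passage"
    return MTEB_TO_JINA_TASK.get(mteb_task_type)
```

Returning `None` for unmapped task types leaves the caller free to encode without an adapter rather than guessing one.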

Author

Okay @Samoed, I wasn't aware you're working on it. I can imagine, because I'm still testing the implementation (marked as draft). What should we do now? Shall I take over the Jina models?

Author

btw the reason I'm adding it here is that I want to give a full MMTEB evaluation of the Jina models

Member

Yes, I wanted to as well. I will provide my test results in an hour.

),
name=MODEL_NAME,
languages=XLMR_LANGUAGES,
open_source=True, # CC-BY-NC-4.0
Contributor

In #1316 we update the metadata for all models.

It might be nice to add:

    max_tokens=...,
    embed_dim=...,
    n_parameters=...,
    memory_usage=...,
    license="cc-by-nc-4.0",
    reference=..., # you will need to pull from the PR to add this
    similarity_fn_name="cosine",
    framework=[...],
    use_instuctions=...,

Author

clear

@bwanglzu
Author

@Samoed @KennethEnevoldsen I'll close my PR and focus on reviewing and testing #1319, as it is already in a better state.

@bwanglzu bwanglzu closed this Oct 24, 2024
@bwanglzu bwanglzu deleted the feat-add-jina-v3 branch October 24, 2024 12:14
3 participants