[v2] fix contriever (add similarity_fn_name to ST wrapper) #1749
Conversation
@isaac-chung Model loading gives …
Just as an idea: I believe the …
Yes, I wanted to integrate …
```python
class SentenceTransformerWrapperDotSimilarity(SentenceTransformerWrapper):
    def similarity(self, embedding1: np.ndarray, embedding2: np.ndarray) -> float:
        return dot_distance(embedding1, embedding2)
```
Isn't it possible to have the model define its own similarity function? Any reason for us to create a custom wrapper?
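To illustrate the suggestion above, here is a minimal sketch of letting the model's metadata declare its similarity function so the wrapper dispatches on it, instead of subclassing per function. All names here (`ModelMeta`, `SentenceTransformerWrapper`, the registry) are simplified stand-ins for illustration, not the actual mteb classes:

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class ModelMeta:
    # The model declares which similarity function it should be scored with.
    similarity_fn_name: str = "cosine"


# Registry of similarity functions keyed by name (hypothetical).
SIMILARITY_FNS: dict[str, Callable[[np.ndarray, np.ndarray], float]] = {
    "cosine": lambda a, b: float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))),
    "dot": lambda a, b: float(np.dot(a, b)),
}


class SentenceTransformerWrapper:
    def __init__(self, meta: ModelMeta):
        self.meta = meta

    def similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        # Dispatch on the model-declared name; no per-function subclass needed.
        return SIMILARITY_FNS[self.meta.similarity_fn_name](a, b)


wrapper = SentenceTransformerWrapper(ModelMeta(similarity_fn_name="dot"))
print(wrapper.similarity(np.array([1.0, 2.0]), np.array([3.0, 4.0])))  # → 11.0
```

With this shape, adding a new scoring function only means registering one entry, which avoids the scalability concern raised later in the thread.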
(If this is a one-off, I would just move it next to the special case.)
Have we tried defining a ModelMeta for the contriever model with the dot similarity function? I thought that was the same way we implemented ColBERT with @sam-hey, but with max-sim.
For ColBERT models we created a different wrapper with max-sim.
> it's a bit surprising that ModelMeta.similarity_fn_name isn't being utilized.
We would love to switch to that one and would encourage a PR for this. It was only recently added, so it hasn't been integrated throughout.
Originally posted by @KennethEnevoldsen in #1592 (comment)
I believe ModelMeta was never fully utilized. Instead of reverting to the previous version—which I agree was a better approach—we might be able to resolve these issues by properly implementing ModelMeta now.
I'd like to understand why, and perhaps convince us to move/revert back to what we had before, as to me, it worked better than needing to define separate wrappers per scoring function, which isn't scalable.
I've updated the approach so that the similarity function can now be directly passed to the wrapper.
Previously, users had to specify which similarity function to use, which was a hidden feature and independent of the models.
```python
evaluation.run(
    model,
    score_function="max_sim",
)
```
I've updated it again, and now it uses a string `similarity_fn_name` as stated in ModelMeta. I suggest leaving it as is for now and integrating ModelMeta parameters in a separate PR, as we're currently duplicating model names and revisions for every model.
Thanks all. I felt we could have improved the docs for the previous 'hidden' feature rather than removing it.
I agree that we can improve the ModelMeta param integration in a separate PR.
The previous implementation required users to explicitly pass the scoring function, which could lead to issues with results that were hard to reproduce or match. This new approach reduces user input.
isaac-chung
left a comment
Thanks for iterating! Feel free to skip the model loading test while I work on a fix.
Perhaps we could repurpose #1759 a bit now for the model meta integration.

Closes #1731
Checklist
- make test
- make lint