[v2] fix contriever (add similarity_fn_name to ST wrapper) #1749
Conversation
@isaac-chung Model loading gives …
Just as an idea: I believe the …
Yes, I wanted to integrate …
```python
class SentenceTransformerWrapperDotSimilarity(SentenceTransformerWrapper):
    def similarity(self, embedding1: np.ndarray, embedding2: np.ndarray) -> float:
        return dot_distance(embedding1, embedding2)
```
Isn't it possible to have the model define its own similarity function? Any reason for us to create a custom wrapper?
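To illustrate the suggestion above, here is a minimal sketch of letting the model's metadata declare its similarity function so the wrapper dispatches on it, instead of subclassing per function. All names here (`ModelMeta`, `SentenceTransformerWrapper`, the registry) are simplified stand-ins for illustration, not the actual mteb classes:

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class ModelMeta:
    # The model declares which similarity function it should be scored with.
    similarity_fn_name: str = "cosine"


# Registry of similarity functions keyed by name (hypothetical).
SIMILARITY_FNS: dict[str, Callable[[np.ndarray, np.ndarray], float]] = {
    "cosine": lambda a, b: float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))),
    "dot": lambda a, b: float(np.dot(a, b)),
}


class SentenceTransformerWrapper:
    def __init__(self, meta: ModelMeta):
        self.meta = meta

    def similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        # Dispatch on the model-declared name; no per-function subclass needed.
        return SIMILARITY_FNS[self.meta.similarity_fn_name](a, b)


wrapper = SentenceTransformerWrapper(ModelMeta(similarity_fn_name="dot"))
print(wrapper.similarity(np.array([1.0, 2.0]), np.array([3.0, 4.0])))  # → 11.0
```

With this shape, adding a new scoring function only means registering one entry, which avoids the scalability concern raised later in the thread.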
(If this is a one-off, I would just move it next to the special case.)
Have we tried defining a ModelMeta for the contriever model with the dot similarity function? I thought that was the same way we implemented ColBERT with @sam-hey, but with max-sim.
For ColBERT models we created a different wrapper with max-sim.
> it's a bit surprising that ModelMeta.similarity_fn_name isn't being utilized.
We would love to switch to that one and would encourage a PR for this. It was only recently added, so it hasn't been integrated throughout.
Originally posted by @KennethEnevoldsen in #1592 (comment)
I believe ModelMeta was never fully utilized. Instead of reverting to the previous version—which I agree was a better approach—we might be able to resolve these issues by properly implementing ModelMeta now.
I'd like to understand why, and perhaps convince us to move/revert back to what we had before, as to me, it worked better than needing to define separate wrappers per scoring function, which isn't scalable.
I've updated the approach so that the similarity function can now be directly passed to the wrapper.
Previously, users had to specify which similarity function to use, which was a hidden feature and independent of the models.
```python
evaluation.run(
    model,
    score_function="max_sim",
)
```
I've updated it again, and now it uses a string `similarity_fn_name` as stated in ModelMeta. I suggest leaving it as is for now and integrating ModelMeta parameters in a separate PR, as we're currently duplicating model names and revisions for every model.
Thanks all. I felt we could have improved the docs for the previous 'hidden' feature rather than removing it.
I agree that we can improve the ModelMeta param integration in a separate PR.
The previous implementation required users to explicitly pass the scoring function, which could lead to issues with results that were hard to reproduce or match. This new approach reduces user input.
isaac-chung
left a comment
Thanks for iterating! Feel free to skip the model loading test while I work on a fix.
Perhaps we could repurpose #1759 a bit now for the model meta integration.

Closes #1731
Checklist
- make test
- make lint