Add Gemma-Embeddings-v0.8 Retrieval Results #59
KennethEnevoldsen merged 17 commits into embeddings-benchmark:main from
Conversation
Thanks for the PR @nicholasmonath. It seems that multiple fields, such as the MTEB version, are not specified, as shown by the tests. It also seems that the model_meta is not filled out.
Could you provide the script you used to run MTEB? It seems a bit unusual that the original results didn't include the MTEB version and evaluation time.
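A quick way to see which fields a sanitized result file is missing is to diff its top-level keys against the expected set. This is a minimal sketch; the exact field names required by the repo's tests are an assumption here:

```python
# Illustrative required fields; the precise set checked by the results
# repo's tests is an assumption for this sketch.
REQUIRED = {"mteb_version", "task_name", "scores"}

def missing_fields(result: dict) -> set:
    """Return the required top-level fields absent from a parsed result JSON."""
    return REQUIRED - result.keys()

# An over-aggressive sanitizer that also dropped mteb_version shows up as:
sanitized = {"task_name": "ArguAna", "scores": {}}
print(missing_fields(sanitized))  # → {'mteb_version'}
```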
Hi @KennethEnevoldsen and @Samoed. Thank you for your comments. We wrote a sanitizer to remove sensitive info such as timings, but we realized that it was overly aggressive and removed even necessary fields. We are working on updating the pull request. We will add back the MTEB version that we used (we noticed that it is actually an older version, 1.0.3) and the model_meta. However, we are still required to omit the evaluation time due to the sensitivity of the infrastructure that we use.
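To illustrate the narrower behavior such a sanitizer needs (a hypothetical sketch; the actual sanitizer and the exact field names in the result files are not shown in this thread), a recursive pass can drop only the sensitive keys while leaving fields like mteb_version and the scores untouched:

```python
import json

def sanitize(obj, drop_keys=("evaluation_time", "kg_co2_emissions")):
    """Recursively drop only the sensitive keys; everything else
    (mteb_version, model_meta, scores) passes through unchanged."""
    if isinstance(obj, dict):
        return {k: sanitize(v, drop_keys) for k, v in obj.items()
                if k not in drop_keys}
    if isinstance(obj, list):
        return [sanitize(v, drop_keys) for v in obj]
    return obj

# Hypothetical result fragment for demonstration.
result = {
    "mteb_version": "1.21.7",
    "evaluation_time": 1234.5,
    "scores": {"test": [{"ndcg_at_10": 0.63, "evaluation_time": 12.0}]},
}
print(json.dumps(sanitize(result)))
# → {"mteb_version": "1.21.7", "scores": {"test": [{"ndcg_at_10": 0.63}]}}
```

Because the drop list is an explicit allow-nothing-else-out filter, adding a new sensitive field later is a one-line change.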
Excluding runtime and CO2 emissions is fine; however, 1.0.3 is quite an old version. I would strongly recommend running it on the latest version of mteb. The scores should be approximately the same (minor differences are expected, since the seed changed in older versions of the code, along with other code changes). We also standardized the result format in later versions of MTEB. If your model is prompt-based, newer versions of the benchmark allow you to integrate that as well.
Hi @KennethEnevoldsen, thank you for your comments and time reviewing this PR. We have now updated our MTEB version to 1.21.7. The latest files now have this version, and we have only sanitized the evaluation time. Please let us know if you have any questions or concerns.
KennethEnevoldsen left a comment
Thanks for the update - there are a few issues remaining
results/google__Gemma-Embeddings-v0.8/d6813d20532a97ea8e30fc285397d5105316511f/ArguAna.json
results/google__Gemma-Embeddings-v0.8/d6813d20532a97ea8e30fc285397d5105316511f/model_meta.json
…5397d5105316511f/model_meta.json Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Thank you so much @KennethEnevoldsen! Sorry about those remaining issues. I believe I have now resolved them all. Please let me know if there is anything else.
Hi @KennethEnevoldsen, thank you again for all of your help with this pull request. I wanted to check in on when the results will appear on the leaderboard. I thought they might appear after today's update, but I don't see them added. Please let me know if there is anything more you need from my side. Thanks very much!
For them to appear on the current leaderboard you will have to update paths.json (see the snippet in results.py). If adding new models, also add their names to results.py. (We are close to having a new leaderboard ready, where this will no longer be necessary.)
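As a hedged sketch of how paths.json could be refreshed, the mapping can be regenerated by globbing the results directory. The results/&lt;model&gt;/&lt;revision&gt;/&lt;Task&gt;.json layout is inferred from the file paths in this PR; the actual snippet in results.py may differ:

```python
import json
from pathlib import Path

def build_paths(results_dir) -> dict:
    """Map each model directory name to its result-file names, assuming
    a results/<model>/<revision>/<Task>.json layout (inferred, not
    confirmed, from the paths in this PR)."""
    paths: dict = {}
    for task_file in sorted(Path(results_dir).glob("*/*/*.json")):
        model = task_file.parent.parent.name
        paths.setdefault(model, []).append(task_file.name)
    return paths

# To refresh the checked-in file (illustrative; the repo's own script
# in results.py is the source of truth):
# Path("paths.json").write_text(json.dumps(build_paths("results"), indent=2))
```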
Thank you for your quick reply and the information, @KennethEnevoldsen! I have updated paths.json here: #69. Note that it looks like MODELS in results.py is automatically pulled from this line: https://github.com/embeddings-benchmark/results/blob/main/results.py#L295, so I did not modify that file.
Add Gemma-Embeddings-v0.8 results on the retrieval subset of tasks in MTEB.