feat: first batch of results for the MTEB(Medical) benchmark #55
I haven't updated |
@Samoed I see that you are reviewing the related PR (embeddings-benchmark/mteb#1436), will you have the time to take this PR as well?
Samoed
left a comment
The results look good. If you want your model to appear on the leaderboard, you'll need to generate a paths.json. However, since a new version of the leaderboard is currently being developed, you might want to wait until it's finished.
I’m back this week! Since automatically generating the |
I will merge this in for now, and then we can resolve the inconsistencies in a separate issue.
As a follow-up to embeddings-benchmark/mteb#1459, this PR contains the results for a list of 15 open-source models in the new MTEB(Medical) benchmark.
The models included are:
name: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
revision: "bf3bf13ab40c3157080a7ab344c831b9ad18b5eb"
name: "BAAI/bge-small-en-v1.5"
revision: "5c38ec7c405ec4b44b94cc5a9bb96e735b38267a"
name: "BAAI/bge-base-en-v1.5"
revision: "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"
name: "BAAI/bge-large-en-v1.5"
revision: "d4aa6901d3a41ba39fb536a557fa166f842b0e09"
name: "intfloat/multilingual-e5-small"
revision: "fd1525a9fd15316a2d503bf26ab031a61d056e98"
name: "intfloat/multilingual-e5-base"
revision: "d13f1b27baf31030b7fd040960d60d909913633f"
name: "intfloat/multilingual-e5-large"
revision: "ab10c1a7f42e74530fe7ae5be82e6d4f11a719eb"
name: "Alibaba-NLP/gte-multilingual-base"
revision: "7fc06782350c1a83f88b15dd4b38ef853d3b8503"
name: "jinaai/jina-embeddings-v3"
revision: "215a6e121fa0183376388ac6b1ae230326bfeaed"
name: "Snowflake/snowflake-arctic-embed-m-v1.5"
revision: "97eab2e17fcb7ccb8bb94d6e547898fa1a6a0f47"
name: "mixedbread-ai/mxbai-embed-large-v1"
revision: "990580e27d329c7408b3741ecff85876e128e203"
name: "abhinand/MedEmbed-small-v0.1"
revision: "40a5850d046cfdb56154e332b4d7099b63e8d50e"
name: "abhinand/MedEmbed-base-v0.1"
revision: "7a90c50263f620dff743eb9794b89a42bfc5d765"
name: "abhinand/MedEmbed-large-v0.1"
revision: "e621837c7904456dc37d689f97e654424de62318"
name: "nvidia/NV-Embed-v2"  # Using the code in this PR
revision: "7604d305b621f14095a1aa23d351674c2859553a"
We also plan to add the following models once the inconsistencies are resolved, since we noticed strange results for them as well:
name: "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
revision: "3276994ba02b26841920728d1adcf115473c88e9"
name: "Alibaba-NLP/gte-Qwen2-7B-instruct"
revision: "e26182b2122f4435e8b3ebecbf363990f409b45b"
Finally, we added a bm25s baseline for the retrieval tasks, although there is an issue with clustering and reranking tasks at the moment.
My colleague @olivierr42 will take it from here since I will not be available next week.
Feel free to suggest other interesting models and we'll happily run them too 💪
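
For anyone re-running these models, here is a minimal sketch of how the name/revision pairs listed above could be kept as pinned entries and sanity-checked before a run. The `PinnedModel` class and the `MODELS` list are hypothetical names used only for illustration and are not part of mteb or of this PR; the only assumption is that each revision is a full 40-character commit hash, as in the list above.

```python
import re
from dataclasses import dataclass

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class PinnedModel:
    """Hypothetical helper: a model name pinned to one exact revision."""

    name: str      # Hub repository id, e.g. "BAAI/bge-small-en-v1.5"
    revision: str  # commit hash the results were produced with

    def __post_init__(self) -> None:
        # Reject branch names, tags, or truncated hashes so that a rerun
        # always refers to exactly the same weights.
        if not _SHA_RE.fullmatch(self.revision):
            raise ValueError(f"{self.name}: revision is not a full commit hash")


# Three entries copied from the list above; the others have the same shape.
MODELS = [
    PinnedModel("BAAI/bge-small-en-v1.5", "5c38ec7c405ec4b44b94cc5a9bb96e735b38267a"),
    PinnedModel("abhinand/MedEmbed-small-v0.1", "40a5850d046cfdb56154e332b4d7099b63e8d50e"),
    PinnedModel("nvidia/NV-Embed-v2", "7604d305b621f14095a1aa23d351674c2859553a"),
]

for model in MODELS:
    print(f"{model.name} @ {model.revision[:8]}")
```

Pinning the full hash rather than a branch name means a later push to a model repository cannot silently change which weights a result file refers to.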