Added multiple missing model results to RTEB #291
Fully run models include:
"sentence-transformers/static-retrieval-mrl-en-v1",
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

Unfinished (GPU OOM):
"Snowflake/snowflake-arctic-embed-l-v2.0",
"jinaai/jina-embeddings-v3",
"intfloat/multilingual-e5-large",
"intfloat/multilingual-e5-base",
"intfloat/multilingual-e5-small"

Got errors for:
"nvidia/NV-Embed-v2" (embeddings-benchmark/mteb#3287)
"Snowflake/snowflake-arctic-embed-m-v2.0" (the large version works; we load it via SentenceTransformers, haven't looked into it much)
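For context, a minimal sketch of how such a run can be launched with the `mteb` library; the task names are taken from the result tables below, while the output folder and batch size are illustrative placeholders:

```python
import mteb

# Any of the models listed above.
model = mteb.get_model("sentence-transformers/static-retrieval-mrl-en-v1")

# A few of the RTEB retrieval tasks that appear in the result tables below.
tasks = mteb.get_tasks(tasks=["FinQARetrieval", "HumanEvalRetrieval", "WikiSQLRetrieval"])

evaluation = mteb.MTEB(tasks=tasks)

# Lowering the encode batch size is the usual first knob to turn when a run hits GPU OOM.
evaluation.run(
    model,
    output_folder="results",
    encode_kwargs={"batch_size": 8},
)
```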
Model Results Comparison

Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large

Results for Snowflake/snowflake-arctic-embed-l-v2.0
| task_name | Snowflake/snowflake-arctic-embed-l-v2.0 | google/gemini-embedding-001 | Max result |
|---|---|---|---|
| DS1000Retrieval | 0.4266 | 0.6870 | 0.6897 |
| FinQARetrieval | 0.566 | 0.6464 | 0.8552 |
| FinanceBenchRetrieval | 0.7564 | 0.9157 | 0.9298 |
| HC3FinanceRetrieval | 0.5441 | 0.7758 | 0.8242 |
| HumanEvalRetrieval | 0.7153 | 0.9910 | 0.9945 |
| MBPPRetrieval | 0.8023 | 0.9416 | 0.9416 |
| WikiSQLRetrieval | 0.6971 | 0.8814 | 0.9375 |
| Average | 0.644 | 0.8341 | 0.8818 |
Results for intfloat/multilingual-e5-large-instruct
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.5522 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.8891 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.494 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.6856 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.4868 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.6685 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.4914 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.5173 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.4506 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.7967 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.807 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.5109 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.2759 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.9577 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.7424 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.641 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.5122 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.8635 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.6985 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.8202 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.8355 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8068 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.6593 | 0.8322 |
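The Average rows are taken only over the tasks a model actually has scores for (nan entries are skipped), so the 0.4103 for intfloat/multilingual-e5-large above covers just two tasks and is not directly comparable to averages over the full task set. A small sketch of that nan-aware averaging, using a subset of the rows above:

```python
import pandas as pd

# A few rows from the table above; NaN marks tasks without a stored result.
scores = pd.DataFrame(
    {
        "google/gemini-embedding-001": [0.7352, 0.9474, 0.6870],
        "intfloat/multilingual-e5-large": [0.5687, float("nan"), float("nan")],
    },
    index=["ChatDoctorRetrieval", "Code1Retrieval", "DS1000Retrieval"],
)

# pandas skips NaN by default, so each model is averaged only over its available tasks.
print(scores.mean())
```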
Results for intfloat/multilingual-e5-large
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large (revision 4dc6d853a804b9c8886ede6dda8a073b7dc08a81) | intfloat/multilingual-e5-large (revision ab10c1a7f42e74530fe7ae5be82e6d4f11a719eb) | Max result |
|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.2643 | 0.2643 | 0.4833 |
| ChatDoctorRetrieval | 0.7352 | nan | 0.5687 | 0.7390 |
| FreshStackRetrieval | 0.3979 | nan | 0.2519 | 0.4438 |
| Average | 0.5388 | 0.2643 | 0.3616 | 0.5553 |
Results for sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.3149 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.4164 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.096 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.5657 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.2485 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.39 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.327 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.3349 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.2298 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.416 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.6672 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.0413 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.1112 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.8172 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.4117 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.4272 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.2489 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.3949 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.3627 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.5363 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.376 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8158 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.3886 | 0.8322 |
Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.31 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.4892 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.1384 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.5572 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.2592 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.4212 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.3472 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.3493 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.251 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.5435 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.697 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.0577 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.1108 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.865 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.4437 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.4613 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.2926 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.447 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.4205 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.5649 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.4049 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8481 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.4218 | 0.8322 |
Results for sentence-transformers/static-retrieval-mrl-en-v1
| task_name | google/gemini-embedding-001 | sentence-transformers/static-retrieval-mrl-en-v1 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| Code1Retrieval | 0.9474 | 0.4225 | nan | 0.9474 |
| EnglishFinance1Retrieval | 0.7332 | 0.6136 | nan | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | 0.4827 | nan | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | 0.3735 | nan | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | 0.2642 | nan | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | 0.5122 | nan | 0.6603 |
| French1Retrieval | 0.8781 | 0.5801 | nan | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | 0.8251 | nan | 0.9332 |
| German1Retrieval | 0.9761 | 0.5041 | nan | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | 0.1 | nan | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | 0.4691 | nan | 0.7405 |
| JapaneseCode1Retrieval | 0.8650 | 0.2711 | nan | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | 0.1082 | nan | 0.9228 |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.1078 | 0.6675 | 0.7058 |
| Average | 0.8001 | 0.4025 | 0.6675 | 0.8327 |
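The comparison tables above can be assembled by reading the per-task JSON files that mteb writes to its output folder. A rough sketch of that collection step; the folder layout and JSON keys ("scores" mapping splits to lists of dicts with a "main_score" field) are assumptions based on recent mteb versions, not something stated in this PR, so adjust the parsing to the actual output format:

```python
import json
from pathlib import Path

def collect_main_scores(results_dir: str) -> dict[str, float]:
    """Collect per-task main scores from an mteb output folder (assumed layout)."""
    scores: dict[str, float] = {}
    for path in Path(results_dir).rglob("*.json"):
        data = json.loads(path.read_text())
        task_scores = data.get("scores")
        if not isinstance(task_scores, dict):
            continue  # skip non-task files such as model metadata
        for split_entries in task_scores.values():
            if split_entries:
                scores[data.get("task_name", path.stem)] = split_entries[0].get("main_score")
                break
    return scores

# Illustrative path; mteb typically nests results under a per-model folder.
print(collect_main_scores("results/sentence-transformers__static-retrieval-mrl-en-v1"))
```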