Added multiple missing model results to RTEB #291

Merged: Samoed merged 1 commit into main from more-rteb-results on Oct 7, 2025

@KennethEnevoldsen
Contributor

Fully run models:
"sentence-transformers/static-retrieval-mrl-en-v1",
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",

Unfinished (GPU OOM):
"Snowflake/snowflake-arctic-embed-l-v2.0",
"jinaai/jina-embeddings-v3",
"intfloat/multilingual-e5-large",
"intfloat/multilingual-e5-base",
"intfloat/multilingual-e5-small",

Got errors for:
"nvidia/NV-Embed-v2" (embeddings-benchmark/mteb#3287)
"Snowflake/snowflake-arctic-embed-m-v2.0" (the large version works; we load it using sentence-transformers, haven't looked into it much)


github-actions bot commented Oct 7, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Snowflake/snowflake-arctic-embed-l-v2.0, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-retrieval-mrl-en-v1
Tasks: AILACasedocs, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for Snowflake/snowflake-arctic-embed-l-v2.0

| task_name | Snowflake/snowflake-arctic-embed-l-v2.0 | google/gemini-embedding-001 | Max result |
|---|---|---|---|
| DS1000Retrieval | 0.4266 | 0.6870 | 0.6897 |
| FinQARetrieval | 0.566 | 0.6464 | 0.8552 |
| FinanceBenchRetrieval | 0.7564 | 0.9157 | 0.9298 |
| HC3FinanceRetrieval | 0.5441 | 0.7758 | 0.8242 |
| HumanEvalRetrieval | 0.7153 | 0.9910 | 0.9945 |
| MBPPRetrieval | 0.8023 | 0.9416 | 0.9416 |
| WikiSQLRetrieval | 0.6971 | 0.8814 | 0.9375 |
| Average | 0.644 | 0.8341 | 0.8818 |

Results for intfloat/multilingual-e5-large-instruct

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.5522 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.8891 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.494 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.6856 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.4868 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.6685 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.4914 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.5173 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.4506 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.7967 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.807 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.5109 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.2759 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.9577 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.7424 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.641 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.5122 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.8635 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.6985 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.8202 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.8355 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8068 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.6593 | 0.8322 |

Results for intfloat/multilingual-e5-large

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| Revisions | | 4dc6d853a804b9c8886ede6dda8a073b7dc08a81 | ab10c1a7f42e74530fe7ae5be82e6d4f11a719eb | |
| AILACasedocs | 0.4833 | 0.2643 | 0.2643 | 0.4833 |
| ChatDoctorRetrieval | 0.7352 | nan | 0.5687 | 0.7390 |
| FreshStackRetrieval | 0.3979 | nan | 0.2519 | 0.4438 |
| Average | 0.5388 | 0.2643 | 0.3616 | 0.5553 |
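
Note on the Average row: it appears to be a mean over only the tasks a model has a score for, with nan entries skipped, so averages in different columns can cover different task sets and are not directly comparable. The following is a minimal Python sketch that reproduces the averages of the table above under that assumption. The helper names and the shortened column labels (`gemini`, `e5_first`, `e5_second`) are made up for illustration and are not part of mteb or of the bot that produced this comment.

```python
import math

# Scores copied from the intfloat/multilingual-e5-large table above.
# Column labels are hypothetical shorthand: e5_first / e5_second are the two
# intfloat/multilingual-e5-large columns, in the order they appear.
scores = {
    "AILACasedocs": {"gemini": 0.4833, "e5_first": 0.2643, "e5_second": 0.2643},
    "ChatDoctorRetrieval": {"gemini": 0.7352, "e5_first": math.nan, "e5_second": 0.5687},
    "FreshStackRetrieval": {"gemini": 0.3979, "e5_first": math.nan, "e5_second": 0.2519},
}


def nan_skipping_mean(values):
    """Mean of the non-nan values; nan if every value is missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def column_average(table, column):
    """Average one column over all tasks, ignoring tasks without a score."""
    return nan_skipping_mean(row[column] for row in table.values())


for column in ("gemini", "e5_first", "e5_second"):
    print(column, round(column_average(scores, column), 4))
```

Under this reading, the 0.2643 average in the first intfloat/multilingual-e5-large column reflects a single task, while the 0.5388 for google/gemini-embedding-001 covers all three.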

Results for sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.3149 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.4164 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.096 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.5657 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.2485 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.39 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.327 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.3349 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.2298 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.416 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.6672 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.0413 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.1112 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.8172 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.4117 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.4272 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.2489 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.3949 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.3627 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.5363 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.376 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8158 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.3886 | 0.8322 |

Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result |
|---|---|---|---|---|
| ChatDoctorRetrieval | 0.7352 | 0.5687 | 0.31 | 0.7390 |
| Code1Retrieval | 0.9474 | nan | 0.4892 | 0.9474 |
| DS1000Retrieval | 0.6870 | nan | 0.1384 | 0.6897 |
| EnglishFinance1Retrieval | 0.7332 | nan | 0.5572 | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | nan | 0.2592 | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | nan | 0.4212 | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | nan | 0.3472 | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | nan | 0.3493 | 0.6603 |
| FinQARetrieval | 0.6464 | nan | 0.251 | 0.8552 |
| FinanceBenchRetrieval | 0.9157 | nan | 0.5435 | 0.9298 |
| French1Retrieval | 0.8781 | nan | 0.697 | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | nan | 0.0577 | 0.9332 |
| FreshStackRetrieval | 0.3979 | 0.2519 | 0.1108 | 0.4438 |
| German1Retrieval | 0.9761 | nan | 0.865 | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | nan | 0.4437 | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | nan | 0.4613 | 0.7405 |
| HC3FinanceRetrieval | 0.7758 | nan | 0.2926 | 0.8242 |
| HumanEvalRetrieval | 0.9910 | nan | 0.447 | 0.9945 |
| JapaneseCode1Retrieval | 0.8650 | nan | 0.4205 | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | nan | 0.5649 | 0.9228 |
| MBPPRetrieval | 0.9416 | nan | 0.4049 | 0.9416 |
| WikiSQLRetrieval | 0.8814 | nan | 0.8481 | 0.9375 |
| Average | 0.7941 | 0.4103 | 0.4218 | 0.8322 |

Results for sentence-transformers/static-retrieval-mrl-en-v1

| task_name | google/gemini-embedding-001 | sentence-transformers/static-retrieval-mrl-en-v1 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| Code1Retrieval | 0.9474 | 0.4225 | nan | 0.9474 |
| EnglishFinance1Retrieval | 0.7332 | 0.6136 | nan | 0.8188 |
| EnglishFinance2Retrieval | 0.6740 | 0.4827 | nan | 0.8851 |
| EnglishFinance3Retrieval | 0.8330 | 0.3735 | nan | 0.8330 |
| EnglishFinance4Retrieval | 0.5757 | 0.2642 | nan | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6338 | 0.5122 | nan | 0.6603 |
| French1Retrieval | 0.8781 | 0.5801 | nan | 0.8884 |
| FrenchLegal1Retrieval | 0.8696 | 0.8251 | nan | 0.9332 |
| German1Retrieval | 0.9761 | 0.5041 | nan | 0.9771 |
| GermanHealthcare1Retrieval | 0.8742 | 0.1 | nan | 0.8810 |
| GermanLegal1Retrieval | 0.7149 | 0.4691 | nan | 0.7405 |
| JapaneseCode1Retrieval | 0.8650 | 0.2711 | nan | 0.8650 |
| JapaneseLegal1Retrieval | 0.9228 | 0.1082 | nan | 0.9228 |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.1078 | 0.6675 | 0.7058 |
| Average | 0.8001 | 0.4025 | 0.6675 | 0.8327 |

@Samoed merged commit 50310b3 into main on Oct 7, 2025
3 checks passed
@Samoed deleted the more-rteb-results branch on December 24, 2025 08:10