
score bm25 on RTEB #289

Merged
KennethEnevoldsen merged 1 commit into main from rteb_bm25s on Oct 6, 2025
Conversation

@Samoed (Member) commented Oct 4, 2025

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here (a minimal usage sketch follows this checklist)
    • No, but there is an existing PR ___
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.
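As a rough sketch (not part of this PR), scoring the model through mteb would look something like the following, assuming the reference implementation is registered under the model name "bm25s" and using LegalQuAD, one of the tasks evaluated below, as an example:

```python
import mteb

# Model name "bm25s" is assumed here from this PR's branch name and results.
model = mteb.get_model("bm25s")

# One of the RTEB tasks scored in this PR, as an example.
tasks = mteb.get_tasks(tasks=["LegalQuAD"])

# Run the evaluation; result JSONs are written to the output folder.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```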

@github-actions bot commented Oct 4, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: bm25s
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for bm25s

| task_name | bm25s | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AILACasedocs | 0.2784 | 0.4833 | 0.2643 | 0.4833 |
| AILAStatutes | 0.2162 | 0.4877 | 0.2084 | 0.8509 |
| AppsRetrieval | 0.0476 | 0.9375 | 0.3255 | 0.9463 |
| CUREv1 | 0.1370 | 0.5957 | 0.5162 | 0.6289 |
| ChatDoctorRetrieval | 0.3176 | 0.7352 | nan | 0.7390 |
| Code1Retrieval | 0.4474 | 0.9474 | nan | 0.9474 |
| DS1000Retrieval | 0.4145 | 0.6870 | nan | 0.6897 |
| EnglishFinance1Retrieval | 0.7534 | 0.7332 | nan | 0.8188 |
| EnglishFinance2Retrieval | 0.7647 | 0.6740 | nan | 0.8851 |
| EnglishFinance3Retrieval | 0.4512 | 0.8330 | nan | 0.8330 |
| EnglishFinance4Retrieval | 0.3173 | 0.5757 | nan | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6875 | 0.6338 | nan | 0.6603 |
| FinQARetrieval | 0.7387 | 0.6464 | nan | 0.8552 |
| FinanceBenchRetrieval | 0.4668 | 0.9157 | nan | 0.9298 |
| French1Retrieval | 0.7832 | 0.8781 | nan | 0.8884 |
| FrenchLegal1Retrieval | 0.9490 | 0.8696 | nan | 0.9332 |
| FreshStackRetrieval | 0.2789 | 0.3979 | nan | 0.4438 |
| German1Retrieval | 0.8647 | 0.9761 | nan | 0.9771 |
| GermanHealthcare1Retrieval | 0.3725 | 0.8742 | nan | 0.8810 |
| GermanLegal1Retrieval | 0.6688 | 0.7149 | nan | 0.7405 |
| HC3FinanceRetrieval | 0.2898 | 0.7758 | nan | 0.8242 |
| HumanEvalRetrieval | 0.3847 | 0.9910 | nan | 0.9945 |
| JapaneseCode1Retrieval | 0.3386 | 0.8650 | nan | 0.8650 |
| JapaneseLegal1Retrieval | 0.1113 | 0.9228 | nan | 0.9228 |
| LegalQuAD | 0.7675 | 0.6553 | 0.4317 | 0.7224 |
| LegalSummarization | 0.6098 | 0.7122 | 0.6210 | 0.7921 |
| MBPPRetrieval | 0.1164 | 0.9416 | nan | 0.9416 |
| MIRACLRetrievalHardNegatives | 0.2541 | 0.7042 | 0.6675 | 0.7058 |
| WikiSQLRetrieval | 0.5216 | 0.8814 | nan | 0.9375 |
| Average | 0.4603 | 0.7602 | 0.4335 | 0.8082 |

The model has high performance on these tasks: EnglishHealthcare1Retrieval, FrenchLegal1Retrieval, LegalQuAD
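A plausible reason a lexical baseline wins on the legal tasks is exact matching of domain terminology that dense embeddings may blur. The reference implementation presumably wraps the bm25s library (suggested by the branch name rteb_bm25s); a minimal retrieval sketch with it, on an illustrative toy corpus, would be:

```python
import bm25s

# Toy corpus and query (illustrative only, not from the benchmark).
corpus = [
    "The statute of limitations for contract claims is three years.",
    "A feline is a cat that purrs when it is content.",
]

# Tokenize and index the corpus, then retrieve the top-k documents by BM25 score.
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(corpus))

query_tokens = bm25s.tokenize("limitations period for contract claims")
results, scores = retriever.retrieve(query_tokens, corpus=corpus, k=1)
print(results[0, 0], scores[0, 0])
```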


@Samoed (Member, Author) commented Oct 4, 2025

Wow, bm25 has the highest performance on 3 tasks

KennethEnevoldsen merged commit 944d94b into main on Oct 6, 2025
3 checks passed
@KennethEnevoldsen (Contributor)
loving a strong baseline ;)

KennethEnevoldsen deleted the rteb_bm25s branch on October 6, 2025, 12:00