
score bm25 on RTEB #289

Merged
KennethEnevoldsen merged 1 commit into main from rteb_bm25s on Oct 6, 2025
Conversation

@Samoed (Member) commented Oct 4, 2025

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here (a minimal usage sketch follows this checklist)
    • No, but there is an existing PR ___
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.
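As a rough sketch (not part of this PR), scoring the model through mteb would look something like the following, assuming the reference implementation is registered under the model name "bm25s" and using LegalQuAD, one of the tasks evaluated below, as an example:

```python
import mteb

# Model name "bm25s" is assumed here from this PR's branch name and results.
model = mteb.get_model("bm25s")

# One of the RTEB tasks scored in this PR, as an example.
tasks = mteb.get_tasks(tasks=["LegalQuAD"])

# Run the evaluation; result JSONs are written to the output folder.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```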

@github-actions bot commented Oct 4, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: bm25s
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for bm25s

| task_name | bm25s | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AILACasedocs | 0.2784 | 0.4833 | 0.2643 | 0.4833 |
| AILAStatutes | 0.2162 | 0.4877 | 0.2084 | 0.8509 |
| AppsRetrieval | 0.0476 | 0.9375 | 0.3255 | 0.9463 |
| CUREv1 | 0.1370 | 0.5957 | 0.5162 | 0.6289 |
| ChatDoctorRetrieval | 0.3176 | 0.7352 | nan | 0.7390 |
| Code1Retrieval | 0.4474 | 0.9474 | nan | 0.9474 |
| DS1000Retrieval | 0.4145 | 0.6870 | nan | 0.6897 |
| EnglishFinance1Retrieval | 0.7534 | 0.7332 | nan | 0.8188 |
| EnglishFinance2Retrieval | 0.7647 | 0.6740 | nan | 0.8851 |
| EnglishFinance3Retrieval | 0.4512 | 0.8330 | nan | 0.8330 |
| EnglishFinance4Retrieval | 0.3173 | 0.5757 | nan | 0.5997 |
| EnglishHealthcare1Retrieval | 0.6875 | 0.6338 | nan | 0.6603 |
| FinQARetrieval | 0.7387 | 0.6464 | nan | 0.8552 |
| FinanceBenchRetrieval | 0.4668 | 0.9157 | nan | 0.9298 |
| French1Retrieval | 0.7832 | 0.8781 | nan | 0.8884 |
| FrenchLegal1Retrieval | 0.9490 | 0.8696 | nan | 0.9332 |
| FreshStackRetrieval | 0.2789 | 0.3979 | nan | 0.4438 |
| German1Retrieval | 0.8647 | 0.9761 | nan | 0.9771 |
| GermanHealthcare1Retrieval | 0.3725 | 0.8742 | nan | 0.8810 |
| GermanLegal1Retrieval | 0.6688 | 0.7149 | nan | 0.7405 |
| HC3FinanceRetrieval | 0.2898 | 0.7758 | nan | 0.8242 |
| HumanEvalRetrieval | 0.3847 | 0.9910 | nan | 0.9945 |
| JapaneseCode1Retrieval | 0.3386 | 0.8650 | nan | 0.8650 |
| JapaneseLegal1Retrieval | 0.1113 | 0.9228 | nan | 0.9228 |
| LegalQuAD | 0.7675 | 0.6553 | 0.4317 | 0.7224 |
| LegalSummarization | 0.6098 | 0.7122 | 0.6210 | 0.7921 |
| MBPPRetrieval | 0.1164 | 0.9416 | nan | 0.9416 |
| MIRACLRetrievalHardNegatives | 0.2541 | 0.7042 | 0.6675 | 0.7058 |
| WikiSQLRetrieval | 0.5216 | 0.8814 | nan | 0.9375 |
| Average | 0.4603 | 0.7602 | 0.4335 | 0.8082 |

The model has high performance on these tasks: EnglishHealthcare1Retrieval, FrenchLegal1Retrieval, LegalQuAD
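A plausible reason a lexical baseline wins on the legal tasks is exact matching of domain terminology that dense embeddings may blur. The reference implementation presumably wraps the bm25s library (suggested by the branch name rteb_bm25s); a minimal retrieval sketch with it, on an illustrative toy corpus, would be:

```python
import bm25s

# Toy corpus and query (illustrative only, not from the benchmark).
corpus = [
    "The statute of limitations for contract claims is three years.",
    "A feline is a cat that purrs when it is content.",
]

# Tokenize and index the corpus, then retrieve the top-k documents by BM25 score.
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(corpus))

query_tokens = bm25s.tokenize("limitations period for contract claims")
results, scores = retriever.retrieve(query_tokens, corpus=corpus, k=1)
print(results[0, 0], scores[0, 0])
```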


@Samoed (Member, Author) commented Oct 4, 2025

Wow, bm25 has the highest performance on 3 tasks

KennethEnevoldsen merged commit 944d94b into main on Oct 6, 2025
3 checks passed
@KennethEnevoldsen (Contributor)
loving a strong baseline ;)

KennethEnevoldsen deleted the rteb_bm25s branch on October 6, 2025, 12:00