
add opensearch sparse encoding model results #239

Merged
Samoed merged 3 commits into embeddings-benchmark:main from zhichao-aws:main
Jul 21, 2025

Conversation

@zhichao-aws (Contributor) commented on Jul 21, 2025

Add OpenSearch sparse encoding model results, evaluated on the BEIR datasets.

mteb PR: embeddings-benchmark/mteb#2903
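
For context, here is a minimal sketch of how BEIR results like those reported below are typically produced with mteb's Python API. The exact invocation used for this submission is not shown in the PR; the task subset and output folder below are illustrative, and the model is assumed to resolve to the reference implementation added in embeddings-benchmark/mteb#2903.

```python
# Minimal sketch (assumes the reference implementation from
# embeddings-benchmark/mteb#2903 is available in the installed mteb version).
import mteb

# Resolve the reference implementation registered under mteb/models/.
model = mteb.get_model(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte"
)

# A small subset of the BEIR retrieval tasks listed in the results tables below.
tasks = mteb.get_tasks(tasks=["SciFact", "NFCorpus", "TRECCOVID"])

evaluation = mteb.MTEB(tasks=tasks)
# Writes one JSON result file per task; files like these are what this PR adds.
evaluation.run(model, output_folder="results")
```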

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/ (this can be as an API). Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill, opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini, opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill, opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
Tasks: ArguAna, CQADupstackAndroidRetrieval, CQADupstackEnglishRetrieval, CQADupstackGamingRetrieval, CQADupstackGisRetrieval, CQADupstackMathematicaRetrieval, CQADupstackPhysicsRetrieval, CQADupstackProgrammersRetrieval, CQADupstackRetrieval, CQADupstackStatsRetrieval, CQADupstackTexRetrieval, CQADupstackUnixRetrieval, CQADupstackWebmastersRetrieval, CQADupstackWordpressRetrieval, ClimateFEVER, DBPedia, FEVER, FiQA2018, HotpotQA, MSMARCO, NFCorpus, NQ, QuoraRetrieval, SCIDOCS, SciFact, TRECCOVID, Touche2020

Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Max result |
| --- | --- | --- | --- | --- |
| ArguAna | 0.86 | 0.54 | 0.5 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.45 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.41 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.53 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.35 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.25 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.4 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.36 | 0.66 |
| CQADupstackRetrieval | nan | 0.4 | 0.36 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.4 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.4 | 0.34 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.31 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.22 | 0.57 |
| DBPedia | nan | 0.41 | 0.42 | 0.53 |
| FEVER | nan | 0.83 | 0.82 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.36 | 0.80 |
| HotpotQA | nan | 0.71 | 0.67 | 0.88 |
| MSMARCO | nan | 0.44 | 0.41 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.53 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.84 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.17 | 0.35 |
| SciFact | nan | 0.7 | 0.71 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.69 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.43 | 0.70 |

Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini | Max result |
| --- | --- | --- | --- | --- |
| ArguAna | 0.86 | 0.54 | 0.48 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.43 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.39 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.52 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.35 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.25 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.39 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.35 | 0.66 |
| CQADupstackRetrieval | nan | 0.4 | 0.35 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.32 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.27 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.4 | 0.34 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.4 | 0.34 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.3 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.22 | 0.57 |
| DBPedia | nan | 0.41 | 0.41 | 0.53 |
| FEVER | nan | 0.83 | 0.81 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.34 | 0.80 |
| HotpotQA | nan | 0.71 | 0.67 | 0.88 |
| MSMARCO | nan | 0.44 | 0.4 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.51 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.83 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.16 | 0.35 |
| SciFact | nan | 0.7 | 0.7 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.71 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.42 | 0.70 |

Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Max result |
| --- | --- | --- | --- | --- |
| ArguAna | 0.86 | 0.54 | 0.52 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.45 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.41 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.54 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.36 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.27 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.4 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.37 | 0.66 |
| CQADupstackRetrieval | nan | 0.4 | 0.37 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.4 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.4 | 0.36 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.3 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.24 | 0.57 |
| DBPedia | nan | 0.41 | 0.42 | 0.53 |
| FEVER | nan | 0.83 | 0.84 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.36 | 0.80 |
| HotpotQA | nan | 0.71 | 0.69 | 0.88 |
| MSMARCO | nan | 0.44 | 0.42 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.54 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.86 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.16 | 0.35 |
| SciFact | nan | 0.7 | 0.71 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.72 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.44 | 0.70 |

Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte

| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Max result |
| --- | --- | --- | --- | --- |
| ArguAna | 0.86 | 0.54 | 0.52 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.46 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.44 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.55 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.36 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.26 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.4 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.38 | 0.66 |
| CQADupstackRetrieval | nan | 0.4 | 0.38 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.4 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.4 | 0.39 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.32 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.31 | 0.57 |
| DBPedia | nan | 0.41 | 0.45 | 0.53 |
| FEVER | nan | 0.83 | 0.86 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.41 | 0.80 |
| HotpotQA | nan | 0.71 | 0.72 | 0.88 |
| MSMARCO | nan | 0.44 | 0.43 | 0.48 |
| NFCorpus | nan | 0.34 | 0.36 | 0.56 |
| NQ | nan | 0.64 | 0.58 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.87 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.17 | 0.35 |
| SciFact | nan | 0.7 | 0.73 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.73 | 0.95 |
| Touche2020 | nan | 0.23 | 0.39 | 0.39 |
| Average | 0.64 | 0.46 | 0.46 | 0.70 |

Member

You need to specify the revision of the model, as in embeddings-benchmark/mteb#2919, instead of main.

Contributor Author

Got it. Using the latest commit id should be fine. I see you have created a PR in mteb to fix it.

So should I rename the dir in this PR to make the revision consistent with the one in mteb?

Member

Yes, that's correct

Contributor Author

done
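
For reference, a hedged sketch of the rename discussed above: moving a model's results from a directory named after main to one named after the pinned commit id. The results/<model>/<revision> layout and the commit id in this sketch are illustrative assumptions, not the actual paths or revision used in this PR.

```python
# Hedged sketch: rename a results folder from "main" to a pinned revision.
# The directory layout and the commit id below are illustrative placeholders.
from pathlib import Path

model_dir = Path(
    "results/opensearch-project__opensearch-neural-sparse-encoding-doc-v2-distill"
)
pinned_revision = "0123456789abcdef0123456789abcdef01234567"  # placeholder, not the real commit id

old_dir = model_dir / "main"
new_dir = model_dir / pinned_revision
if old_dir.is_dir() and not new_dir.exists():
    old_dir.rename(new_dir)  # same effect as `git mv` once the change is staged
```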

Samoed merged commit 5fcbf06 into embeddings-benchmark:main on Jul 21, 2025
3 checks passed