add opensearch sparse encoding model results #239

Samoed merged 3 commits into embeddings-benchmark:main from
Conversation
Model Results Comparison

Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large

Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Max result |
|---|---|---|---|---|
| ArguAna | 0.86 | 0.54 | 0.50 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.45 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.41 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.53 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.35 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.25 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.40 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.36 | 0.66 |
| CQADupstackRetrieval | nan | 0.40 | 0.36 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.40 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.40 | 0.34 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.31 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.22 | 0.57 |
| DBPedia | nan | 0.41 | 0.42 | 0.53 |
| FEVER | nan | 0.83 | 0.82 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.36 | 0.80 |
| HotpotQA | nan | 0.71 | 0.67 | 0.88 |
| MSMARCO | nan | 0.44 | 0.41 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.53 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.84 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.17 | 0.35 |
| SciFact | nan | 0.70 | 0.71 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.69 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.43 | 0.70 |
Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini | Max result |
|---|---|---|---|---|
| ArguAna | 0.86 | 0.54 | 0.48 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.43 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.39 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.52 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.35 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.25 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.39 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.35 | 0.66 |
| CQADupstackRetrieval | nan | 0.40 | 0.35 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.32 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.27 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.40 | 0.34 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.40 | 0.34 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.30 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.22 | 0.57 |
| DBPedia | nan | 0.41 | 0.41 | 0.53 |
| FEVER | nan | 0.83 | 0.81 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.34 | 0.80 |
| HotpotQA | nan | 0.71 | 0.67 | 0.88 |
| MSMARCO | nan | 0.44 | 0.40 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.51 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.83 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.16 | 0.35 |
| SciFact | nan | 0.70 | 0.70 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.71 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.42 | 0.70 |
Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Max result |
|---|---|---|---|---|
| ArguAna | 0.86 | 0.54 | 0.52 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.45 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.41 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.54 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.36 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.27 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.40 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.37 | 0.66 |
| CQADupstackRetrieval | nan | 0.40 | 0.37 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.40 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.40 | 0.36 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.30 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.24 | 0.57 |
| DBPedia | nan | 0.41 | 0.42 | 0.53 |
| FEVER | nan | 0.83 | 0.84 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.36 | 0.80 |
| HotpotQA | nan | 0.71 | 0.69 | 0.88 |
| MSMARCO | nan | 0.44 | 0.42 | 0.48 |
| NFCorpus | nan | 0.34 | 0.34 | 0.56 |
| NQ | nan | 0.64 | 0.54 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.86 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.16 | 0.35 |
| SciFact | nan | 0.70 | 0.71 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.72 | 0.95 |
| Touche2020 | nan | 0.23 | 0.29 | 0.39 |
| Average | 0.64 | 0.46 | 0.44 | 0.70 |
Results for opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
| task_name | google/gemini-embedding-001 | intfloat/multilingual-e5-large | opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Max result |
|---|---|---|---|---|
| ArguAna | 0.86 | 0.54 | 0.52 | 0.90 |
| CQADupstackAndroidRetrieval | nan | 0.49 | 0.46 | 0.74 |
| CQADupstackEnglishRetrieval | nan | 0.46 | 0.44 | 0.70 |
| CQADupstackGamingRetrieval | 0.71 | 0.59 | 0.55 | 0.79 |
| CQADupstackGisRetrieval | nan | 0.37 | 0.36 | 0.63 |
| CQADupstackMathematicaRetrieval | nan | 0.28 | 0.26 | 0.69 |
| CQADupstackPhysicsRetrieval | nan | 0.44 | 0.40 | 0.74 |
| CQADupstackProgrammersRetrieval | nan | 0.42 | 0.38 | 0.66 |
| CQADupstackRetrieval | nan | 0.40 | 0.38 | 0.68 |
| CQADupstackStatsRetrieval | nan | 0.32 | 0.33 | 0.62 |
| CQADupstackTexRetrieval | nan | 0.28 | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.54 | 0.40 | 0.36 | 0.72 |
| CQADupstackWebmastersRetrieval | nan | 0.40 | 0.39 | 0.68 |
| CQADupstackWordpressRetrieval | nan | 0.32 | 0.32 | 0.59 |
| ClimateFEVER | nan | 0.26 | 0.31 | 0.57 |
| DBPedia | nan | 0.41 | 0.45 | 0.53 |
| FEVER | nan | 0.83 | 0.86 | 0.96 |
| FiQA2018 | 0.62 | 0.44 | 0.41 | 0.80 |
| HotpotQA | nan | 0.71 | 0.72 | 0.88 |
| MSMARCO | nan | 0.44 | 0.43 | 0.48 |
| NFCorpus | nan | 0.34 | 0.36 | 0.56 |
| NQ | nan | 0.64 | 0.58 | 0.82 |
| QuoraRetrieval | nan | 0.89 | 0.87 | 0.92 |
| SCIDOCS | 0.25 | 0.17 | 0.17 | 0.35 |
| SciFact | nan | 0.70 | 0.73 | 0.87 |
| TRECCOVID | 0.86 | 0.71 | 0.73 | 0.95 |
| Touche2020 | nan | 0.23 | 0.39 | 0.39 |
| Average | 0.64 | 0.46 | 0.46 | 0.70 |
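For reference, a comparison table like the ones above can be assembled from the per-task JSON files in this repository. A minimal sketch, assuming the per-task result layout written by mteb (`scores -> test -> [0] -> main_score`, which is nDCG@10 for these BEIR retrieval tasks), pandas with tabulate installed, and `<revision>` as a placeholder for the pinned commit SHA:

```python
import json
from pathlib import Path

import pandas as pd

def main_score(path: Path) -> float:
    data = json.loads(path.read_text())
    # First test-split entry's main_score (nDCG@10 for BEIR retrieval tasks).
    return data["scores"]["test"][0]["main_score"]

def load_model_scores(results_dir: Path) -> pd.Series:
    # One JSON file per task; skip the model metadata file.
    scores = {
        p.stem: main_score(p)
        for p in results_dir.glob("*.json")
        if p.name != "model_meta.json"
    }
    return pd.Series(scores)

# <revision> is a placeholder, not a real commit SHA.
models = {
    "intfloat/multilingual-e5-large":
        Path("results/intfloat__multilingual-e5-large/<revision>"),
    "opensearch-neural-sparse-encoding-doc-v2-distill":
        Path("results/opensearch-project__opensearch-neural-sparse-encoding-doc-v2-distill/<revision>"),
}
table = pd.DataFrame({name: load_model_scores(d) for name, d in models.items()})
table.loc["Average"] = table.mean()
print(table.round(2).to_markdown())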
You need to specify the revision of the model, as in embeddings-benchmark/mteb#2919, instead of main.
Got it. Using the latest commit ID should be fine. I see you have created a PR in mteb to fix it.
So should I rename the dir in this PR to make the revision consistent with the one in mteb? (A sketch of the rename follows.)
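A minimal sketch of that rename, assuming hypothetical paths: the results in this PR sit under a `main` revision directory, and `<commit-sha-from-the-hub>` is a placeholder for the pinned commit SHA declared for the model in the mteb PR:

```python
from pathlib import Path

# Hypothetical: the results were originally written under revision "main";
# rename that directory to the pinned commit SHA so it matches the revision
# declared in mteb.
model_dir = Path(
    "results/opensearch-project__opensearch-neural-sparse-encoding-doc-v2-distill"
)
pinned_sha = "<commit-sha-from-the-hub>"  # placeholder, not a real revision

(model_dir / "main").rename(model_dir / pinned_sha)
```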
Add OpenSearch sparse encoding model results, evaluated on the BEIR datasets.
mteb PR: embeddings-benchmark/mteb#2903
Checklist
- Model is added to mteb/models/ or can be used as an API. Instructions on how to add a model can be found here.
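For completeness, a minimal sketch of how results like these are typically produced with mteb, assuming the model is registered there (see the mteb PR above) so it can be resolved by name; the task list is abbreviated to three of the BEIR tasks from the tables:

```python
import mteb

# Resolve the registered model by name.
model = mteb.get_model(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
)

# A subset of the BEIR retrieval tasks evaluated above.
tasks = mteb.get_tasks(tasks=["ArguAna", "SciFact", "TRECCOVID"])

# Run the evaluation and write per-task JSON files under results/.
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```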