fix table calculation #402

Merged: Samoed merged 1 commit into main from fix_table_calculation on Jan 15, 2026

Samoed (Member) commented Jan 15, 2026

Applying fix from embeddings-benchmark/mteb#3674

Samoed (Member, Author) commented Jan 15, 2026

Example table from #317, generated with validate_and_filter. The BelebeleRetrieval scores are the same now.


Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Alibaba-NLP/gte-multilingual-base, BAAI/bge-m3, Snowflake/snowflake-arctic-embed-l-v2.0, Snowflake/snowflake-arctic-embed-m-v2.0, ibm-granite/granite-embedding-107m-multilingual, ibm-granite/granite-embedding-278m-multilingual, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, minishlab/potion-multilingual-128M, sentence-transformers/LaBSE, sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1
Tasks: ArguAna-NL, ArguAna-NL.v2, BelebeleRetrieval, CovidDisinformationNLMultiLabelClassification, DutchBookReviewSentimentClassification, DutchBookReviewSentimentClassification.v2, DutchColaClassification, DutchGovernmentBiasClassification, DutchNewsArticlesClassification, DutchNewsArticlesClusteringP2P, DutchNewsArticlesClusteringS2S, DutchNewsArticlesRetrieval, DutchSarcasticHeadlinesClassification, IconclassClassification, IconclassClusteringS2S, LegalQANLRetrieval, MassiveIntentClassification, MassiveScenarioClassification, MultiEURLEXMultilabelClassification, MultiHateClassification, NFCorpus-NL, NFCorpus-NL.v2, OpenTenderClassification, OpenTenderClusteringP2P, OpenTenderClusteringS2S, OpenTenderRetrieval, SCIDOCS-NL, SCIDOCS-NL.v2, SIB200Classification, SIB200ClusteringS2S, SICK-NL-STS, SICKNLPairClassification, STSBenchmarkMultilingualSTS, SciFact-NL, SciFact-NL.v2, VABBClusteringP2P, VABBClusteringS2S, VABBMultiLabelClassification, VABBRetrieval, VaccinChatNLClassification, WebFAQRetrieval, WikipediaRerankingMultilingual, WikipediaRetrievalMultilingual, XLWICNLPairClassification, bBSARDNLRetrieval

Results for Alibaba-NLP/gte-multilingual-base

| task_name | Alibaba-NLP/gte-multilingual-base | Alibaba-NLP/gte-multilingual-base | Alibaba-NLP/gte-multilingual-base | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|---|---|
| Revisions | 7fc06782350c1a83f88b15dd4b38ef853d3b8503 | ca1791e0bcc104f6db161f27de1340241b13c5a4 | external | | | | | |
| ArguAna-NL.v2 | nan | 0.5283 | nan | nan | 0.4894 | | | False |
| BelebeleRetrieval | 0.8920 | 0.8920 | nan | 0.9649 | 0.9525 | 0.9858 | Qwen/Qwen3-Embedding-8B | False |
| CovidDisinformationNLMultiLabelClassification | nan | 0.4972 | nan | nan | 0.4970 | | | False |
| DutchBookReviewSentimentClassification | nan | 0.7695 | nan | nan | 0.7770 | 0.8821 | intfloat/multilingual-e5-large-instruct | False |
| DutchBookReviewSentimentClassification.v2 | nan | 0.7459 | nan | nan | 0.6256 | | | False |
| DutchColaClassification | nan | 0.5550 | nan | nan | 0.5676 | | | False |
| DutchGovernmentBiasClassification | nan | 0.6193 | nan | nan | 0.6193 | | | False |
| DutchNewsArticlesClassification | nan | 0.5300 | nan | nan | 0.5781 | | | False |
| DutchNewsArticlesClusteringP2P | nan | 0.3507 | nan | nan | 0.4045 | | | False |
| DutchNewsArticlesClusteringS2S | nan | 0.2790 | nan | nan | 0.2601 | | | False |
| DutchNewsArticlesRetrieval | nan | 0.8088 | nan | nan | 0.7459 | | | False |
| DutchSarcasticHeadlinesClassification | nan | 0.6578 | nan | nan | 0.7281 | | | False |
| IconclassClassification | nan | 0.5390 | nan | nan | 0.5134 | | | False |
| IconclassClusteringS2S | nan | 0.2595 | nan | nan | 0.2220 | | | False |
| LegalQANLRetrieval | nan | 0.6490 | nan | nan | 0.7748 | | | False |
| MassiveIntentClassification | 0.6193 | 0.6193 | 0.6636 | 0.8567 | 0.7045 | 0.8567 | google/gemini-embedding-001 | False |
| MassiveScenarioClassification | nan | 0.6985 | 0.7196 | 0.8976 | 0.7483 | 0.8976 | google/gemini-embedding-001 | False |
| MultiEURLEXMultilabelClassification | 0.0094 | 0.0072 | nan | 0.0594 | 0.0567 | 0.0733 | google/text-multilingual-embedding-002 | False |
| MultiHateClassification | 0.5824 | 0.5824 | nan | 0.7417 | 0.6261 | 0.8710 | tencent/KaLM-Embedding-Gemma3-12B-2511 | False |
| NFCorpus-NL.v2 | nan | 0.2947 | nan | nan | 0.2982 | | | False |
| OpenTenderClassification | nan | 0.4336 | nan | nan | 0.4193 | | | False |
| OpenTenderClusteringP2P | nan | 0.3364 | nan | nan | 0.2301 | | | False |
| OpenTenderClusteringS2S | nan | 0.2761 | nan | nan | 0.1617 | | | False |
| OpenTenderRetrieval | nan | 0.4219 | nan | nan | 0.3778 | | | False |
| SCIDOCS-NL.v2 | nan | 0.1584 | nan | nan | 0.1309 | | | False |
| SIB200Classification | nan | 0.6725 | nan | nan | 0.7491 | 0.8496 | intfloat/e5-mistral-7b-instruct | False |
| SIB200ClusteringS2S | 0.2489 | 0.2411 | nan | 0.4663 | 0.4166 | 0.6539 | nvidia/llama-embed-nemotron-8b | False |
| SICK-NL-STS | nan | 0.7582 | nan | nan | 0.7692 | | | False |
| SICKNLPairClassification | nan | 0.9294 | nan | nan | 0.9332 | | | False |
| STSBenchmarkMultilingualSTS | 0.8302 | 0.8302 | nan | nan | 0.8413 | 0.9552 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| SciFact-NL.v2 | nan | 0.6433 | nan | nan | 0.6840 | | | False |
| VABBClusteringP2P | nan | 0.4095 | nan | nan | 0.3437 | | | False |
| VABBClusteringS2S | nan | 0.3520 | nan | nan | 0.3071 | | | False |
| VABBMultiLabelClassification | nan | 0.5212 | nan | nan | 0.5233 | | | False |
| VABBRetrieval | nan | 0.7367 | nan | nan | 0.7036 | | | False |
| VaccinChatNLClassification | nan | 0.4881 | nan | nan | 0.5063 | | | False |
| WikipediaRerankingMultilingual | 0.8234 | 0.8237 | nan | 0.9267 | 0.9031 | 0.9267 | google/gemini-embedding-001 | False |
| WikipediaRetrievalMultilingual | 0.8398 | 0.8400 | nan | 0.9448 | 0.9182 | 0.9448 | google/gemini-embedding-001 | False |
| XLWICNLPairClassification | nan | 0.6274 | nan | nan | 0.6732 | | | False |
| bBSARDNLRetrieval | nan | 0.2009 | nan | nan | 0.2384 | | | False |
| Average | 0.6057 | 0.5396 | 0.6916 | 0.7323 | 0.5505 | 0.8088 | nan | - |
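
For readers checking the Average row by hand: the displayed values appear consistent with a per-column mean that skips `nan` entries. For example, the third Alibaba-NLP/gte-multilingual-base column (revision `external`) has only two scores, 0.6636 and 0.7196, and its Average is 0.6916. The sketch below reproduces that arithmetic. It is an illustration only, not the code changed in this PR; the helper name `nan_skipping_mean` is hypothetical, and the list is shortened to the two present scores plus a few `nan` placeholders.

```python
import math
from statistics import fmean


def nan_skipping_mean(scores):
    """Return the mean of the non-NaN scores, or NaN if every score is missing.

    Hypothetical helper for illustration; not taken from the PR.
    """
    present = [s for s in scores if not math.isnan(s)]
    return fmean(present) if present else math.nan


nan = math.nan

# Scores copied from the `external` revision column of the table;
# nan marks tasks with no result (most rows are omitted here for brevity).
external = [nan, nan, 0.6636, 0.7196, nan]

print(round(nan_skipping_mean(external), 4))  # 0.6916
```

Averaging only over the tasks each column actually has results for means that columns with different coverage are not directly comparable, which is worth keeping in mind when reading the Average row.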

Samoed merged commit 6d95050 into main on Jan 15, 2026
2 checks passed
Samoed deleted the fix_table_calculation branch on February 3, 2026, 14:30