Add potion-multilingual-128M results #202

Pringled · 2025-05-23T08:10:08Z

Hi!

We're releasing a new Model2Vec model soon: potion-multilingual-128M. We'd love to add it to MMTEB, and this PR adds the results. There is also a PR open on the main MTEB repo that adds the ModelMeta for this new model.

Checklist

My model has a model sheet, report or similar
My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR: feature: Added potion-multilingual-128M mteb#2717
The results submitted were obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that, for all results submitted, I have not trained on the dataset, including the training set. If I have, I have disclosed it clearly.

KennethEnevoldsen · 2025-05-23T09:27:03Z

Congratulations on the new model! Could you please run make pre-push in the repository?

KennethEnevoldsen · 2025-05-23T10:40:47Z

I made a table to get an overview and to check the numbers. Everything seems reasonable to me.

task_name	gemini-embedding-001	multilingual-e5-small	potion-multilingual-128M	static-similarity-mrl-multilingual-v1
AILAStatutes	0.49	0.19	0.17	0.17
AfriSentiClassification	0.54	0.42	0.40	0.40
AlloProfClusteringS2S.v2	0.56	0.31	0.27	0.21
AlloprofReranking	0.82	0.64	0.48	0.53
AmazonCounterfactualClassification	0.88	0.71	0.65	0.68
ArXivHierarchicalClusteringP2P	0.65	0.53	0.49	0.48
ArXivHierarchicalClusteringS2S	0.64	0.54	0.51	0.40
ArguAna	0.86	0.39	0.40	0.44
ArmenianParaphrasePC	0.97	0.94	0.93	0.92
BUCC.v2	0.99	0.97	0.76	0.97
BelebeleRetrieval	0.91	0.66	0.43	0.38
BibleNLPBitextMining	0.21	0.12	0.04	0.05
BigPatentClustering.v2	0.38	0.29	0.32	0.30
BiorxivClusteringP2P.v2	0.54	0.37	0.32	0.29
BornholmBitextMining	0.52	0.44	0.30	0.48
BrazilianToxicTweetsClassification	0.28	0.19	0.22	0.24
BulgarianStoreReviewSentimentClassfication	0.78	0.58	0.43	0.40
CEDRClassification	0.57	0.40	0.34	0.37
CLSClusteringP2P.v2	0.43	0.39	0.34	0.30
CSFDSKMovieReviewSentimentClassification	0.49	0.27	0.24	0.24
CTKFactsNLI	0.88	0.77	0.73	0.75
CataloniaTweetClassification	0.55	0.48	0.45	0.45
Core17InstructionRetrieval	0.08	-0.00	-0.01	-0.00
CovidRetrieval	0.79	0.73	0.49	0.46
CyrillicTurkicLangClassification	0.95	0.43	0.66	0.59
CzechProductReviewSentimentClassification	0.68	0.51	0.47	0.45
DBpediaClassification	0.95	0.87	0.84	0.84
DalajClassification	0.50	0.50	0.50	0.50
DiaBlaBitextMining	0.87	0.82	0.36	0.75
EstonianValenceClassification	0.54	0.40	0.33	0.27
FaroeseSTS	0.86	0.69	0.65	0.72
FilipinoShopeeReviewsClassification	0.48	0.33	0.28	0.29
FinParaSTS	0.29	0.20	0.10	0.17
FinancialPhrasebankClassification	0.89	0.78	0.62	0.52
FloresBitextMining	0.84	0.70	0.23	0.22
GermanSTSBenchmark	0.88	0.81	0.71	0.73
GreekLegalCodeClassification	0.44	0.33	0.17	0.28
GujaratiNewsClassification	0.92	0.74	0.77	0.50
HALClusteringS2S.v2	0.32	0.18	0.15	0.08
HagridRetrieval	0.99	0.99	0.94	0.97
IN22GenBitextMining	0.94	0.74	0.31	0.21
IndicCrosslingualSTS	0.63	0.41	0.15	0.08
IndicGenBenchFloresBitextMining	0.97	0.87	0.46	0.57
IndicLangClassification	0.88	0.17	0.43	0.59
IndonesianIdClickbaitClassification	0.67	0.58	0.55	0.52
IsiZuluNewsClassification	0.41	0.30	0.25	0.26
ItaCaseholdClassification	0.73	0.66	0.52	0.55
JSICK	0.85	0.82	0.76	0.79
KorHateSpeechMLClassification	0.18	0.09	0.07	0.04
KorSarcasmClassification	0.61	0.55	0.55	0.53
KurdishSentimentClassification	0.86	0.79	0.60	0.60
LEMBPasskeyRetrieval	0.39	0.38	0.35	1.00
LegalBenchCorporateLobbying	0.96	0.89	0.83	0.85
MIRACLRetrievalHardNegatives	0.70	0.60	0.19	0.19
MLQARetrieval	0.84	0.65	0.41	0.41
MacedonianTweetSentimentClassification	0.72	0.54	0.54	0.45
MalteseNewsClassification	0.37	0.23	0.14	0.05
MasakhaNEWSClassification	0.84	0.73	0.68	0.59
MasakhaNEWSClusteringS2S	0.57	0.38	0.37	0.31
MassiveIntentClassification	0.82	0.52	0.50	0.46
MedrxivClusteringP2P.v2	0.47	0.34	0.32	0.28
MultiEURLEXMultilabelClassification	0.05	0.04	0.03	0.03
MultiHateClassification	0.72	0.59	0.55	0.56
NTREXBitextMining	0.94	0.84	0.42	0.43
NepaliNewsClassification	0.98	0.90	0.92	0.75
News21InstructionRetrieval	0.10	0.01	-0.01	-0.00
NollySentiBitextMining	0.69	0.55	0.28	0.30
NordicLangClassification	0.86	0.72	0.50	0.51
NorwegianCourtsBitextMining	0.93	0.93	0.86	0.94
NusaParagraphEmotionClassification	0.56	0.38	0.35	0.35
NusaTranslationBitextMining	0.78	0.77	0.59	0.69
NusaX-senti	0.80	0.66	0.60	0.55
NusaXBitextMining	0.83	0.60	0.36	0.49
OdiaNewsClassification	0.92	0.78	0.61	0.34
OpusparcusPC	0.97	0.93	0.88	0.90
PAC	0.72	0.70	0.62	0.56
PawsXPairClassification	0.60	0.52	0.47	0.48
PlscClusteringP2P.v2	0.74	0.69	0.70	0.63
PoemSentimentClassification	0.60	0.46	0.41	0.32
PolEmo2.0-OUT	0.78	0.24	0.31	0.38
PpcPC	0.96	0.87	0.69	0.75
PunjabiNewsClassification	0.83	0.79	0.84	0.67
RTE3	0.90	0.87	0.87	0.89
Robust04InstructionRetrieval	-0.02	-0.08	-0.04	-0.03
RomaniBibleClustering	0.43	0.40	0.38	0.33
RuBQReranking	0.74	0.71	0.47	0.53
SCIDOCS	0.25	0.14	0.07	0.11
SIB200ClusteringS2S	0.42	0.24	0.14	0.06
SICK-R	0.83	0.79	0.61	0.68
SNLHierarchicalClusteringP2P	0.61	0.56	0.58	0.53
STS12	0.82	0.78	0.64	0.72
STS13	0.90	0.77	0.74	0.74
STS14	0.85	0.78	0.70	0.73
STS15	0.90	0.87	0.79	0.83
STS17	0.89	0.87	0.62	0.73
STS22.v2	0.72	0.66	0.60	0.57
STSB	0.85	0.80	0.71	0.72
STSBenchmark	0.89	0.84	0.72	0.80
STSES	0.82	0.79	0.71	0.72
ScalaClassification	0.52	0.51	0.50	0.50
SemRel24STS	0.73	0.60	0.59	0.52
SentimentAnalysisHindi	0.76	0.63	0.44	0.45
SinhalaNewsClassification	0.82	0.68	0.69	0.32
SiswatiNewsClassification	0.62	0.49	0.54	0.71
SlovakMovieReviewSentimentClassification	0.90	0.61	0.58	0.56
SpartQA	0.10	0.05	0.20	0.16
SprintDuplicateQuestions	0.97	0.94	0.91	0.92
StackExchangeClustering.v2	0.92	0.50	0.47	0.31
StackOverflowQA	0.97	0.82	0.46	0.50
StatcanDialogueDatasetRetrieval	0.51	0.10	0.03	0.03
SwahiliNewsClassification	0.66	0.61	0.60	0.51
SwednClusteringP2P	0.46	0.36	0.32	0.12
SwissJudgementClassification	0.58	0.54	0.54	0.54
T2Reranking	0.68	0.66	0.65	0.65
TERRa	0.64	0.58	0.50	0.59
TRECCOVID	0.86	0.72	0.42	0.35
Tatoeba	0.82	0.69	0.33	0.50
TempReasonL1	0.03	0.01	0.00	0.01
ToxicConversationsClassification	0.89	0.64	0.65	0.60
TswanaNewsClassification	0.53	0.40	0.32	0.32
TweetTopicSingleClassification	0.71	0.66	0.54	0.44
TwitterHjerneRetrieval	0.98	0.29	0.37	0.25
TwitterURLCorpus	0.87	0.86	0.75	0.83
VoyageMMarcoReranking	0.67	0.63	0.37	0.38
WebLINXCandidatesReranking	0.11	0.10	0.08	0.06
WikiCitiesClustering	0.92	0.75	0.66	0.63
WikiClusteringP2P.v2	0.28	0.25	0.23	0.20
WikipediaRerankingMultilingual	0.92	0.87	0.79	0.81
WikipediaRetrievalMultilingual	0.94	0.88	0.63	0.66
WinoGrande	0.61	0.37	0.42	0.48
XNLI	0.85	0.70	0.62	0.65
indonli	0.61	0.51	0.49	0.51
Average	0.68	0.56	0.47	0.47
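
For anyone who wants to repeat this kind of sanity check, here is a minimal sketch of how a tab-separated table like the one above could be parsed and summarised. It is not the script used to produce the table; the helper names (`load_scores`, `column_means`, `tasks_where_best`) are hypothetical, and `EXCERPT` holds only four rows copied from the table rather than the full results.

```python
import csv
import io
from statistics import mean

# Four rows copied from the table above (tab-separated). This is an
# illustrative excerpt, not the full set of results.
EXCERPT = (
    "task_name\tgemini-embedding-001\tmultilingual-e5-small\t"
    "potion-multilingual-128M\tstatic-similarity-mrl-multilingual-v1\n"
    "ArguAna\t0.86\t0.39\t0.40\t0.44\n"
    "BUCC.v2\t0.99\t0.97\t0.76\t0.97\n"
    "LEMBPasskeyRetrieval\t0.39\t0.38\t0.35\t1.00\n"
    "SpartQA\t0.10\t0.05\t0.20\t0.16\n"
)


def load_scores(tsv_text):
    """Parse the table into a nested dict: {model: {task: score}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    scores = {}
    for row in reader:
        task = row.pop("task_name")
        for model, value in row.items():
            scores.setdefault(model, {})[task] = float(value)
    return scores


def column_means(scores):
    """Mean score per model over the tasks that were loaded."""
    return {model: mean(by_task.values()) for model, by_task in scores.items()}


def tasks_where_best(scores, model):
    """Tasks on which `model` strictly beats every other model.

    Useful for spotting rows worth a second look, e.g. a small model
    outscoring a much larger one.
    """
    others = [m for m in scores if m != model]
    return [
        task
        for task, value in scores[model].items()
        if all(value > scores[other][task] for other in others)
    ]


if __name__ == "__main__":
    scores = load_scores(EXCERPT)
    print(column_means(scores))
    print(tasks_where_best(scores, "potion-multilingual-128M"))
```

Rows flagged by `tasks_where_best` are not necessarily wrong, but they are a reasonable place to start when checking a submission.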

Pringled · 2025-05-23T10:43:18Z

Thanks @KennethEnevoldsen! 😄

KennethEnevoldsen · 2025-05-23T10:45:45Z

No worries, we are starting to do this more to prevent suspicious submissions. I also just like to see the numbers :)

Pringled added 3 commits May 23, 2025 10:04

Added potion-multilingual-128M results

aff2b5c

Updated revision

bfc69c7

Removed external

549d643

ran prepush

854e9de

KennethEnevoldsen merged commit d3f6a29 into embeddings-benchmark:main May 23, 2025
2 checks passed
