Pringled commented May 23, 2025

Hi!

We're releasing a new Model2Vec model soon: potion-multilingual-128M. We'd love to add it to MMTEB, and this PR adds the results. There is also a PR open against the main MTEB repo that adds the ModelMeta for this model.

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on, e.g., Hugging Face
  • I solemnly swear that, for all results submitted, I have not trained on the dataset, including the training set. If I have, I have disclosed it clearly.

KennethEnevoldsen commented May 23, 2025

Congratulations on the new model! Could you please run make pre-push in the repository?

KennethEnevoldsen commented

I made a table to get an overview and check the numbers. Everything looks reasonable to me.

task_name gemini-embedding-001 multilingual-e5-small potion-multilingual-128M static-similarity-mrl-multilingual-v1
AILAStatutes 0.49 0.19 0.17 0.17
AfriSentiClassification 0.54 0.42 0.40 0.40
AlloProfClusteringS2S.v2 0.56 0.31 0.27 0.21
AlloprofReranking 0.82 0.64 0.48 0.53
AmazonCounterfactualClassification 0.88 0.71 0.65 0.68
ArXivHierarchicalClusteringP2P 0.65 0.53 0.49 0.48
ArXivHierarchicalClusteringS2S 0.64 0.54 0.51 0.40
ArguAna 0.86 0.39 0.40 0.44
ArmenianParaphrasePC 0.97 0.94 0.93 0.92
BUCC.v2 0.99 0.97 0.76 0.97
BelebeleRetrieval 0.91 0.66 0.43 0.38
BibleNLPBitextMining 0.21 0.12 0.04 0.05
BigPatentClustering.v2 0.38 0.29 0.32 0.30
BiorxivClusteringP2P.v2 0.54 0.37 0.32 0.29
BornholmBitextMining 0.52 0.44 0.30 0.48
BrazilianToxicTweetsClassification 0.28 0.19 0.22 0.24
BulgarianStoreReviewSentimentClassfication 0.78 0.58 0.43 0.40
CEDRClassification 0.57 0.40 0.34 0.37
CLSClusteringP2P.v2 0.43 0.39 0.34 0.30
CSFDSKMovieReviewSentimentClassification 0.49 0.27 0.24 0.24
CTKFactsNLI 0.88 0.77 0.73 0.75
CataloniaTweetClassification 0.55 0.48 0.45 0.45
Core17InstructionRetrieval 0.08 -0.00 -0.01 -0.00
CovidRetrieval 0.79 0.73 0.49 0.46
CyrillicTurkicLangClassification 0.95 0.43 0.66 0.59
CzechProductReviewSentimentClassification 0.68 0.51 0.47 0.45
DBpediaClassification 0.95 0.87 0.84 0.84
DalajClassification 0.50 0.50 0.50 0.50
DiaBlaBitextMining 0.87 0.82 0.36 0.75
EstonianValenceClassification 0.54 0.40 0.33 0.27
FaroeseSTS 0.86 0.69 0.65 0.72
FilipinoShopeeReviewsClassification 0.48 0.33 0.28 0.29
FinParaSTS 0.29 0.20 0.10 0.17
FinancialPhrasebankClassification 0.89 0.78 0.62 0.52
FloresBitextMining 0.84 0.70 0.23 0.22
GermanSTSBenchmark 0.88 0.81 0.71 0.73
GreekLegalCodeClassification 0.44 0.33 0.17 0.28
GujaratiNewsClassification 0.92 0.74 0.77 0.50
HALClusteringS2S.v2 0.32 0.18 0.15 0.08
HagridRetrieval 0.99 0.99 0.94 0.97
IN22GenBitextMining 0.94 0.74 0.31 0.21
IndicCrosslingualSTS 0.63 0.41 0.15 0.08
IndicGenBenchFloresBitextMining 0.97 0.87 0.46 0.57
IndicLangClassification 0.88 0.17 0.43 0.59
IndonesianIdClickbaitClassification 0.67 0.58 0.55 0.52
IsiZuluNewsClassification 0.41 0.30 0.25 0.26
ItaCaseholdClassification 0.73 0.66 0.52 0.55
JSICK 0.85 0.82 0.76 0.79
KorHateSpeechMLClassification 0.18 0.09 0.07 0.04
KorSarcasmClassification 0.61 0.55 0.55 0.53
KurdishSentimentClassification 0.86 0.79 0.60 0.60
LEMBPasskeyRetrieval 0.39 0.38 0.35 1.00
LegalBenchCorporateLobbying 0.96 0.89 0.83 0.85
MIRACLRetrievalHardNegatives 0.70 0.60 0.19 0.19
MLQARetrieval 0.84 0.65 0.41 0.41
MacedonianTweetSentimentClassification 0.72 0.54 0.54 0.45
MalteseNewsClassification 0.37 0.23 0.14 0.05
MasakhaNEWSClassification 0.84 0.73 0.68 0.59
MasakhaNEWSClusteringS2S 0.57 0.38 0.37 0.31
MassiveIntentClassification 0.82 0.52 0.50 0.46
MedrxivClusteringP2P.v2 0.47 0.34 0.32 0.28
MultiEURLEXMultilabelClassification 0.05 0.04 0.03 0.03
MultiHateClassification 0.72 0.59 0.55 0.56
NTREXBitextMining 0.94 0.84 0.42 0.43
NepaliNewsClassification 0.98 0.90 0.92 0.75
News21InstructionRetrieval 0.10 0.01 -0.01 -0.00
NollySentiBitextMining 0.69 0.55 0.28 0.30
NordicLangClassification 0.86 0.72 0.50 0.51
NorwegianCourtsBitextMining 0.93 0.93 0.86 0.94
NusaParagraphEmotionClassification 0.56 0.38 0.35 0.35
NusaTranslationBitextMining 0.78 0.77 0.59 0.69
NusaX-senti 0.80 0.66 0.60 0.55
NusaXBitextMining 0.83 0.60 0.36 0.49
OdiaNewsClassification 0.92 0.78 0.61 0.34
OpusparcusPC 0.97 0.93 0.88 0.90
PAC 0.72 0.70 0.62 0.56
PawsXPairClassification 0.60 0.52 0.47 0.48
PlscClusteringP2P.v2 0.74 0.69 0.70 0.63
PoemSentimentClassification 0.60 0.46 0.41 0.32
PolEmo2.0-OUT 0.78 0.24 0.31 0.38
PpcPC 0.96 0.87 0.69 0.75
PunjabiNewsClassification 0.83 0.79 0.84 0.67
RTE3 0.90 0.87 0.87 0.89
Robust04InstructionRetrieval -0.02 -0.08 -0.04 -0.03
RomaniBibleClustering 0.43 0.40 0.38 0.33
RuBQReranking 0.74 0.71 0.47 0.53
SCIDOCS 0.25 0.14 0.07 0.11
SIB200ClusteringS2S 0.42 0.24 0.14 0.06
SICK-R 0.83 0.79 0.61 0.68
SNLHierarchicalClusteringP2P 0.61 0.56 0.58 0.53
STS12 0.82 0.78 0.64 0.72
STS13 0.90 0.77 0.74 0.74
STS14 0.85 0.78 0.70 0.73
STS15 0.90 0.87 0.79 0.83
STS17 0.89 0.87 0.62 0.73
STS22.v2 0.72 0.66 0.60 0.57
STSB 0.85 0.80 0.71 0.72
STSBenchmark 0.89 0.84 0.72 0.80
STSES 0.82 0.79 0.71 0.72
ScalaClassification 0.52 0.51 0.50 0.50
SemRel24STS 0.73 0.60 0.59 0.52
SentimentAnalysisHindi 0.76 0.63 0.44 0.45
SinhalaNewsClassification 0.82 0.68 0.69 0.32
SiswatiNewsClassification 0.62 0.49 0.54 0.71
SlovakMovieReviewSentimentClassification 0.90 0.61 0.58 0.56
SpartQA 0.10 0.05 0.20 0.16
SprintDuplicateQuestions 0.97 0.94 0.91 0.92
StackExchangeClustering.v2 0.92 0.50 0.47 0.31
StackOverflowQA 0.97 0.82 0.46 0.50
StatcanDialogueDatasetRetrieval 0.51 0.10 0.03 0.03
SwahiliNewsClassification 0.66 0.61 0.60 0.51
SwednClusteringP2P 0.46 0.36 0.32 0.12
SwissJudgementClassification 0.58 0.54 0.54 0.54
T2Reranking 0.68 0.66 0.65 0.65
TERRa 0.64 0.58 0.50 0.59
TRECCOVID 0.86 0.72 0.42 0.35
Tatoeba 0.82 0.69 0.33 0.50
TempReasonL1 0.03 0.01 0.00 0.01
ToxicConversationsClassification 0.89 0.64 0.65 0.60
TswanaNewsClassification 0.53 0.40 0.32 0.32
TweetTopicSingleClassification 0.71 0.66 0.54 0.44
TwitterHjerneRetrieval 0.98 0.29 0.37 0.25
TwitterURLCorpus 0.87 0.86 0.75 0.83
VoyageMMarcoReranking 0.67 0.63 0.37 0.38
WebLINXCandidatesReranking 0.11 0.10 0.08 0.06
WikiCitiesClustering 0.92 0.75 0.66 0.63
WikiClusteringP2P.v2 0.28 0.25 0.23 0.20
WikipediaRerankingMultilingual 0.92 0.87 0.79 0.81
WikipediaRetrievalMultilingual 0.94 0.88 0.63 0.66
WinoGrande 0.61 0.37 0.42 0.48
XNLI 0.85 0.70 0.62 0.65
indonli 0.61 0.51 0.49 0.51
Average 0.68 0.56 0.47 0.47
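
For readers who want to build a similar overview, here is a minimal sketch of how a table like the one above could be assembled. It assumes the main score per task has already been loaded into a plain dict keyed by model name; the `scores` dict and the `build_table` helper are hypothetical names for illustration, not part of MTEB, and the three tasks are just a sample of rows copied from the table.

```python
from statistics import mean

# Hypothetical input: main score per (model, task).
# The values are copied from three rows of the table above.
scores = {
    "gemini-embedding-001": {"ArguAna": 0.86, "STS12": 0.82, "XNLI": 0.85},
    "multilingual-e5-small": {"ArguAna": 0.39, "STS12": 0.78, "XNLI": 0.70},
    "potion-multilingual-128M": {"ArguAna": 0.40, "STS12": 0.64, "XNLI": 0.62},
    "static-similarity-mrl-multilingual-v1": {"ArguAna": 0.44, "STS12": 0.72, "XNLI": 0.65},
}


def build_table(scores):
    """Return rows (lists of strings): header, one row per shared task, average."""
    models = sorted(scores)
    # Only compare tasks that every model has a score for.
    tasks = sorted(set.intersection(*(set(per_task) for per_task in scores.values())))
    rows = [["task_name", *models]]
    for task in tasks:
        rows.append([task, *(f"{scores[m][task]:.2f}" for m in models)])
    rows.append(["Average", *(f"{mean(scores[m][t] for t in tasks):.2f}" for m in models)])
    return rows


table = build_table(scores)
for row in table:
    print(" ".join(row))
```

Formatting every value with two decimals keeps the columns uniform, and restricting the rows to tasks shared by all models keeps the averages comparable.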

Pringled commented

Thanks @KennethEnevoldsen! 😄

@KennethEnevoldsen KennethEnevoldsen merged commit d3f6a29 into embeddings-benchmark:main May 23, 2025
2 checks passed
KennethEnevoldsen commented

No worries, we are starting to do this more to prevent suspicious submissions. I also just like to see the numbers :)
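
Part of such a sanity check could be automated. The sketch below is one hypothetical approach and not the maintainers' actual procedure: it lists tasks where a small model beats a much larger reference model by more than a chosen margin, so a reviewer knows where to look first. The `flag_outliers` helper and the 0.05 margin are assumptions made for illustration; the scores are copied from the table in this thread.

```python
def flag_outliers(rows, reference, candidate, margin=0.05):
    """Return tasks where `candidate` beats `reference` by more than `margin`."""
    return [
        task
        for task, per_model in rows.items()
        if per_model[candidate] - per_model[reference] > margin
    ]


# Scores copied from three rows of the table in this thread.
rows = {
    "ArguAna": {"gemini-embedding-001": 0.86, "potion-multilingual-128M": 0.40},
    "PunjabiNewsClassification": {"gemini-embedding-001": 0.83, "potion-multilingual-128M": 0.84},
    "SpartQA": {"gemini-embedding-001": 0.10, "potion-multilingual-128M": 0.20},
}

flagged = flag_outliers(rows, "gemini-embedding-001", "potion-multilingual-128M")
print(flagged)
```

A flagged task is only a prompt for a closer look. A small model can legitimately do better on an individual task, so the output is not evidence of a problem by itself.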
