
Conversation

@manveertamber
Contributor

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g. Huggingface
  • I solemnly swear that for all results submitted I have not trained on the dataset, including the training set. If I have, I have disclosed it clearly.

@KennethEnevoldsen
Contributor

Reference to the model implementation PR:
embeddings-benchmark/mteb#2727

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented May 26, 2025

Congratulations on the release, and a great paper; I will definitely give it a closer read!

Here is an overview of the results compared to existing models:

| task_name | intfloat/e5-base-v2 | intfloat/e5-large-v2 | intfloat/multilingual-e5-large-instruct | manveertamber/cadet-embed-base-v1 |
|---|---|---|---|---|
| AILACasedocs | 0.27 | 0.31 | 0.33 | 0.30 |
| AILAStatutes | 0.20 | 0.18 | 0.30 | 0.24 |
| ARCChallenge | 0.10 | 0.11 | 0.15 | 0.12 |
| AlphaNLI | 0.22 | 0.15 | 0.25 | 0.32 |
| AppsRetrieval | 0.12 | 0.14 | 0.35 | 0.09 |
| ArguAna | 0.45 | 0.46 | 0.58 | 0.56 |
| BrightLongRetrieval | nan | nan | nan | 0.34 |
| BrightRetrieval | nan | nan | nan | 0.15 |
| BuiltBenchRetrieval | 0.61 | 0.65 | 0.65 | 0.65 |
| CQADupstackAndroidRetrieval | 0.50 | 0.50 | 0.55 | 0.48 |
| CQADupstackEnglishRetrieval | 0.42 | 0.44 | 0.49 | 0.49 |
| CQADupstackGamingRetrieval | 0.56 | 0.58 | 0.64 | 0.58 |
| CQADupstackGisRetrieval | 0.35 | 0.36 | 0.41 | 0.39 |
| CQADupstackMathematicaRetrieval | 0.28 | 0.29 | 0.31 | 0.30 |
| CQADupstackPhysicsRetrieval | 0.41 | 0.39 | 0.49 | 0.44 |
| CQADupstackProgrammersRetrieval | 0.37 | 0.38 | 0.47 | 0.41 |
| CQADupstackRetrieval | 0.39 | 0.38 | 0.44 | 0.41 |
| CQADupstackStatsRetrieval | 0.33 | 0.33 | 0.39 | 0.35 |
| CQADupstackTexRetrieval | 0.27 | 0.27 | 0.32 | 0.30 |
| CQADupstackUnixRetrieval | 0.37 | 0.39 | 0.45 | 0.40 |
| CQADupstackWebmastersRetrieval | 0.38 | 0.38 | 0.45 | 0.40 |
| CQADupstackWordpressRetrieval | 0.31 | 0.32 | 0.35 | 0.34 |
| ChemHotpotQARetrieval | 0.83 | 0.84 | nan | 0.87 |
| ChemNQRetrieval | 0.62 | 0.69 | nan | 0.67 |
| ClimateFEVER | 0.27 | 0.22 | 0.30 | 0.37 |
| ClimateFEVER.v2 | nan | nan | nan | 0.30 |
| ClimateFEVERHardNegatives | 0.27 | 0.23 | 0.24 | 0.37 |
| CodeFeedbackMT | 0.42 | 0.48 | 0.40 | 0.45 |
| CodeFeedbackST | 0.75 | 0.76 | 0.76 | 0.75 |
| CosQA | 0.33 | 0.32 | 0.38 | 0.32 |
| DBPedia | 0.42 | 0.44 | 0.38 | 0.45 |
| DBPediaHardNegatives | nan | nan | 0.38 | 0.48 |
| FEVER | 0.85 | 0.83 | 0.78 | 0.89 |
| FEVERHardNegatives | 0.85 | 0.83 | 0.76 | 0.89 |
| FaithDial | nan | nan | 0.24 | 0.23 |
| FeedbackQARetrieval | nan | nan | 0.55 | 0.56 |
| FiQA2018 | 0.40 | 0.41 | 0.48 | 0.41 |
| HagridRetrieval | 0.99 | 0.99 | 0.99 | 0.99 |
| HellaSwag | 0.25 | 0.28 | 0.32 | 0.30 |
| HotpotQA | 0.69 | 0.73 | 0.69 | 0.74 |
| HotpotQAHardNegatives | 0.69 | 0.73 | 0.65 | 0.74 |
| LEMBNarrativeQARetrieval | 0.25 | 0.26 | 0.27 | 0.25 |
| LEMBNeedleRetrieval | 0.29 | 0.32 | 0.29 | 0.28 |
| LEMBPasskeyRetrieval | 0.38 | 0.39 | 0.38 | 0.38 |
| LEMBQMSumRetrieval | 0.24 | 0.25 | 0.26 | 0.26 |
| LEMBSummScreenFDRetrieval | 0.75 | 0.77 | 0.73 | 0.77 |
| LEMBWikimQARetrieval | 0.56 | 0.58 | 0.58 | 0.58 |
| LegalBenchConsumerContractsQA | 0.72 | 0.77 | 0.77 | 0.77 |
| LegalBenchCorporateLobbying | 0.92 | 0.91 | 0.94 | 0.93 |
| LegalSummarization | 0.59 | 0.60 | 0.68 | 0.64 |
| LitSearchRetrieval | nan | nan | nan | 0.48 |
| MLQuestions | nan | nan | 0.60 | 0.63 |
| MSMARCO | 0.42 | 0.43 | 0.40 | 0.43 |
| MSMARCOHardNegatives | nan | nan | 0.67 | 0.73 |
| MedicalQARetrieval | 0.69 | 0.70 | 0.71 | 0.71 |
| NFCorpus | 0.35 | 0.37 | 0.36 | 0.38 |
| NQ | 0.58 | 0.63 | 0.58 | 0.59 |
| QuoraRetrieval | 0.87 | 0.87 | 0.89 | 0.88 |
| SCIDOCS | 0.19 | 0.20 | 0.19 | 0.19 |
| SciFact | 0.72 | 0.72 | 0.72 | 0.75 |
| StackOverflowQA | 0.88 | 0.90 | 0.86 | 0.86 |
| SyntheticText2SQL | 0.52 | 0.50 | 0.59 | 0.56 |
| TRECCOVID | 0.70 | 0.67 | 0.83 | 0.81 |
| Touche2020 | 0.26 | 0.21 | 0.27 | 0.30 |
| Average | 0.48 | 0.49 | 0.50 | 0.50 |

Overall I don't see too many issues.
The FEVER scores look slightly inflated given the model's size, which is likely due to training on FEVER-derived data.

@manveertamber it seems there are still quite a few scores missing from the main English leaderboard (MTEB(eng, v2)) introduced in MMTEB. Do you want to add these as well? (Just a recommendation; we can also merge as is.) A rough sketch for running the missing tasks is included after the table below.

| task_name | intfloat/e5-base-v2 | intfloat/e5-large-v2 | intfloat/multilingual-e5-large-instruct | manveertamber/cadet-embed-base-v1 |
|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.76 | 0.78 | 0.70 | nan |
| ArXivHierarchicalClusteringP2P | 0.58 | 0.58 | 0.63 | nan |
| ArXivHierarchicalClusteringS2S | 0.55 | 0.55 | 0.61 | nan |
| ArguAna | 0.45 | 0.46 | 0.58 | 0.56 |
| AskUbuntuDupQuestions | 0.59 | 0.60 | 0.64 | nan |
| BIOSSES | 0.81 | 0.84 | 0.87 | nan |
| Banking77Classification | 0.84 | 0.85 | 0.78 | nan |
| BiorxivClusteringP2P.v2 | 0.39 | 0.40 | 0.43 | nan |
| CQADupstackGamingRetrieval | 0.56 | 0.58 | 0.64 | 0.58 |
| CQADupstackUnixRetrieval | 0.37 | 0.39 | 0.45 | 0.40 |
| ClimateFEVERHardNegatives | 0.27 | 0.23 | 0.24 | 0.37 |
| FEVERHardNegatives | 0.85 | 0.83 | 0.76 | 0.89 |
| FiQA2018 | 0.40 | 0.41 | 0.48 | 0.41 |
| HotpotQAHardNegatives | 0.69 | 0.73 | 0.65 | 0.74 |
| ImdbClassification | 0.86 | 0.92 | 0.95 | nan |
| MTOPDomainClassification | 0.92 | 0.93 | 0.91 | nan |
| MassiveIntentClassification | 0.67 | 0.68 | 0.71 | nan |
| MassiveScenarioClassification | 0.73 | 0.71 | 0.74 | nan |
| MedrxivClusteringP2P.v2 | 0.36 | 0.35 | 0.38 | nan |
| MedrxivClusteringS2S.v2 | 0.33 | 0.34 | 0.38 | nan |
| MindSmallReranking | 0.31 | 0.32 | 0.33 | nan |
| SCIDOCS | 0.19 | 0.20 | 0.19 | 0.19 |
| SICK-R | 0.78 | 0.79 | 0.82 | nan |
| STS12 | 0.73 | 0.74 | 0.83 | nan |
| STS13 | 0.83 | 0.81 | 0.88 | nan |
| STS14 | 0.80 | 0.79 | 0.85 | nan |
| STS15 | 0.88 | 0.88 | 0.91 | nan |
| STS17 | 0.89 | 0.90 | 0.90 | nan |
| STS22.v2 | 0.67 | 0.67 | 0.68 | nan |
| STSBenchmark | 0.85 | 0.85 | 0.88 | nan |
| SprintDuplicateQuestions | 0.94 | 0.95 | 0.92 | nan |
| StackExchangeClustering.v2 | 0.53 | 0.52 | 0.60 | nan |
| StackExchangeClusteringP2P.v2 | 0.40 | 0.40 | 0.46 | nan |
| SummEvalSummarization.v2 | 0.34 | 0.32 | 0.30 | nan |
| TRECCOVID | 0.70 | 0.67 | 0.83 | 0.81 |
| Touche2020Retrieval.v3 | 0.50 | 0.42 | 0.53 | nan |
| ToxicConversationsClassification | 0.66 | 0.63 | 0.67 | nan |
| TweetSentimentExtractionClassification | 0.60 | 0.61 | 0.59 | nan |
| TwentyNewsgroupsClustering.v2 | 0.48 | 0.48 | 0.51 | nan |
| TwitterSemEval2015 | 0.76 | 0.77 | 0.80 | nan |
| TwitterURLCorpus | 0.87 | 0.86 | 0.87 | nan |
| Average | 0.63 | 0.63 | 0.66 | 0.55 |
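
Running the missing MTEB(eng, v2) tasks could look roughly like the sketch below. This is a minimal sketch, not the reference implementation: it assumes the model is registered in mteb's model registry under the name used here (via the reference implementation PR) and that the installed mteb version exposes the `get_model` and `get_benchmark` helpers.

```python
import mteb

# Sketch only: assumes the model name is registered in mteb/models/
# (via the reference implementation in embeddings-benchmark/mteb#2727).
model = mteb.get_model("manveertamber/cadet-embed-base-v1")

# The main English benchmark introduced with MMTEB.
benchmark = mteb.get_benchmark("MTEB(eng, v2)")

# Run the benchmark tasks; pass overwrite_results=True to evaluation.run
# if scores already present in the output folder should be recomputed.
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results")
```

The JSON files written to the output folder could then be submitted to this results repository in the same way as the scores already included in this PR.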

@manveertamber
Contributor Author

Hi @KennethEnevoldsen, thanks for the kind words!

Can we merge for now? At the moment I'm not too concerned with these additional experiments and the model isn't fine-tuned for clustering/classification in particular anyway.

KennethEnevoldsen merged commit c0c7ead into embeddings-benchmark:main on May 29, 2025
2 checks passed
@manveertamber
Contributor Author

Hi @KennethEnevoldsen, were these results files deleted? I can't seem to find them anymore and I'm not sure what happened.

manveertamber mentioned this pull request on Jun 10, 2025