
@annamodels
Contributor

Submit results

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/ (this can be an API-based implementation). Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is publicly available, either as a publicly accessible API or e.g. on Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation datasets, including their training splits. If I have, I have disclosed it clearly.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Seems like these weren't run with the submitted implementation? (at least the metadata does not match)

Results for annamodels/LGAI-Embedding-Preview

| task_name | annamodels/LGAI-Embedding-Preview | google/gemini-embedding-001 | intfloat/multilingual-e5-large |
|---|---|---|---|
| AmazonCounterfactualClassification | 0.93 | 0.88 | 0.7 |
| ArXivHierarchicalClusteringP2P | 0.66 | 0.65 | 0.56 |
| ArXivHierarchicalClusteringS2S | 0.64 | 0.64 | 0.54 |
| ArguAna | 0.87 | 0.86 | 0.54 |
| AskUbuntuDupQuestions | 0.66 | 0.64 | 0.59 |
| BIOSSES | 0.86 | 0.89 | 0.85 |
| Banking77Classification | 0.91 | 0.94 | 0.75 |
| BiorxivClusteringP2P.v2 | 0.54 | 0.54 | 0.37 |
| CQADupstackGamingRetrieval | 0.70 | 0.71 | 0.59 |
| CQADupstackUnixRetrieval | 0.56 | 0.54 | 0.4 |
| ClimateFEVERHardNegatives | 0.42 | 0.31 | 0.26 |
| FEVERHardNegatives | 0.93 | 0.89 | 0.84 |
| FiQA2018 | 0.61 | 0.62 | 0.44 |
| HotpotQAHardNegatives | 0.76 | 0.87 | 0.71 |
| ImdbClassification | 0.97 | 0.95 | 0.89 |
| MTOPDomainClassification | 0.98 | 0.98 | 0.9 |
| MassiveIntentClassification | 0.82 | 0.82 | 0.6 |
| MassiveScenarioClassification | 0.85 | 0.87 | 0.7 |
| MedrxivClusteringP2P.v2 | 0.47 | 0.47 | 0.34 |
| MedrxivClusteringS2S.v2 | 0.48 | 0.45 | 0.32 |
| MindSmallReranking | 0.33 | 0.33 | 0.3 |
| SCIDOCS | 0.27 | 0.25 | 0.17 |
| SICK-R | 0.85 | 0.83 | 0.8 |
| STS12 | 0.82 | 0.82 | 0.8 |
| STS13 | 0.90 | 0.90 | 0.82 |
| STS14 | 0.88 | 0.85 | 0.78 |
| STS15 | 0.92 | 0.90 | 0.89 |
| STS17 | 0.90 | 0.89 | 0.82 |
| STS22.v2 | 0.75 | 0.72 | 0.64 |
| STSBenchmark | 0.91 | 0.89 | 0.87 |
| SprintDuplicateQuestions | 0.97 | 0.97 | 0.93 |
| StackExchangeClustering.v2 | 0.79 | 0.92 | 0.46 |
| StackExchangeClusteringP2P.v2 | 0.49 | 0.51 | 0.39 |
| SummEvalSummarization.v2 | 0.39 | 0.38 | 0.31 |
| TRECCOVID | 0.90 | 0.86 | 0.71 |
| Touche2020Retrieval.v3 | 0.59 | 0.52 | 0.5 |
| ToxicConversationsClassification | 0.93 | 0.89 | 0.66 |
| TweetSentimentExtractionClassification | 0.80 | 0.70 | 0.63 |
| TwentyNewsgroupsClustering.v2 | 0.68 | 0.57 | 0.39 |
| TwitterSemEval2015 | 0.80 | 0.79 | 0.75 |
| TwitterURLCorpus | 0.88 | 0.87 | 0.86 |
| Average | 0.74 | 0.73 | 0.62 |

| task_name | ByteDance-Seed/Seed1.5-Embedding | annamodels/LGAI-Embedding-Preview |
|---|---|---|
| AmazonCounterfactualClassification | 0.91 | 0.93 |
| ArXivHierarchicalClusteringP2P | 0.65 | 0.66 |
| ArXivHierarchicalClusteringS2S | 0.64 | 0.64 |
| ArguAna | 0.75 | 0.87 |
| AskUbuntuDupQuestions | 0.69 | 0.66 |
| BIOSSES | 0.83 | 0.86 |
| Banking77Classification | 0.92 | 0.91 |
| BiorxivClusteringP2P.v2 | 0.56 | 0.54 |
| CQADupstackGamingRetrieval | 0.72 | 0.70 |
| CQADupstackUnixRetrieval | 0.57 | 0.56 |
| ClimateFEVERHardNegatives | 0.48 | 0.42 |
| FEVERHardNegatives | 0.95 | 0.93 |
| FiQA2018 | 0.65 | 0.61 |
| HotpotQAHardNegatives | 0.86 | 0.76 |
| ImdbClassification | 0.97 | 0.97 |
| MTOPDomainClassification | 0.99 | 0.98 |
| MassiveIntentClassification | 0.87 | 0.82 |
| MassiveScenarioClassification | 0.93 | 0.85 |
| MedrxivClusteringP2P.v2 | 0.52 | 0.47 |
| MedrxivClusteringS2S.v2 | 0.51 | 0.48 |
| MindSmallReranking | 0.33 | 0.33 |
| SCIDOCS | 0.26 | 0.27 |
| SICK-R | 0.84 | 0.85 |
| STS12 | 0.85 | 0.82 |
| STS13 | 0.93 | 0.90 |
| STS14 | 0.90 | 0.88 |
| STS15 | 0.92 | 0.92 |
| STS17 | 0.93 | 0.90 |
| STS22.v2 | 0.73 | 0.75 |
| STSBenchmark | 0.92 | 0.91 |
| SprintDuplicateQuestions | 0.97 | 0.97 |
| StackExchangeClustering.v2 | 0.81 | 0.79 |
| StackExchangeClusteringP2P.v2 | 0.53 | 0.49 |
| SummEvalSummarization.v2 | 0.36 | 0.39 |
| TRECCOVID | 0.88 | 0.90 |
| Touche2020Retrieval.v3 | 0.64 | 0.59 |
| ToxicConversationsClassification | 0.87 | 0.93 |
| TweetSentimentExtractionClassification | 0.72 | 0.80 |
| TwentyNewsgroupsClustering.v2 | 0.65 | 0.68 |
| TwitterSemEval2015 | 0.78 | 0.80 |
| TwitterURLCorpus | 0.87 | 0.88 |
| Average | 0.75 | 0.74 |

All of the suspiciously high scores come from tasks that are not zero-shot for this model.

@annamodels
Contributor Author

annamodels commented Jun 12, 2025

@KennethEnevoldsen Thanks for your review.
When implementing our model, we applied the following methodologies, which are described in detail in our technical report (https://arxiv.org/pdf/2506.07438). (Note that the original model name was LG-ANNA-Embedding, but it was recently changed to LGAI-Embedding-Preview, so the report title will also be updated accordingly.)

  • Knowledge distillation through soft labeling (Section 4.1)
  • Instruction tuning using in-task examples (query + positive) during training (Section 4.2)
  • Few-shot examples added during inference as well (as noted on the HuggingFace model page)
  • Sophisticated hard-negative mining techniques (Section 4.3)
  • Converting NLI datasets into STS-style format (Section 3 - Data Conversion)

If there's anything we need to adjust to have our model listed on the leaderboard, could you kindly provide a clear guide? We’d be happy to revise accordingly based on your instructions.

@KennethEnevoldsen
Contributor

Sorry, this wasn't clear. It seems like these weren't obtained using the submitted implementation. At least the metadata file does not match. Simply rerunning it with the submitted implementation should solve this.

@annamodels
Contributor Author

annamodels commented Jun 12, 2025

@KennethEnevoldsen Thanks for your response.
First, I’d like to clarify that the scores we submitted were obtained using inference with our model.
Could you please explain in more detail what you mean by “the submitted implementation”? It would be helpful to better understand what exactly is expected.
Also, could you elaborate on the part where you mentioned that "at least the metadata file does not match"? A bit more context would be appreciated.
If there are any specific references or parts we should follow, we’d really appreciate it if you could guide us.

@KennethEnevoldsen
Contributor

First, I’d like to clarify that the scores we submitted were obtained using inference with our model.

Thanks for the confirmation.

The problem is that the submitted file model_meta.json does not align with the ModelMeta (e.g. reference is None).

It might be that you hadn't added all the metadata beforehand. If that is the case, you can just recreate the model_meta.json by deleting it and running a task.
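
For reference, a minimal sketch of this check; the model name is taken from this PR, and the path to the submitted model_meta.json is illustrative and depends on the layout of the results folder:

import json
import mteb

# Metadata as defined by the reference implementation in mteb/models/
meta = mteb.get_model_meta("annamodels/LGAI-Embedding-Preview")
print(meta.reference)  # should not be None once the model is registered

# Metadata as submitted in this PR (illustrative path)
with open("results/annamodels__LGAI-Embedding-Preview/<revision>/model_meta.json") as f:
    submitted = json.load(f)
print(submitted.get("reference"))  # should match the value above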

@annamodels
Contributor Author

@KennethEnevoldsen Thanks so much for your kind response — I really appreciate it.

During inference, we simply evaluated the model locally without specifying any metadata, which is likely why model_meta.json ended up with missing fields like reference.

Even when we load our model from Hugging Face for evaluation, the reference field in model_meta.json still appears as null.

Would it be possible for you to kindly guide us on the correct way to input or configure the model metadata? We’d be very grateful for any instructions or examples you could share.

@KennethEnevoldsen
Contributor

You should be able to do it using:

import mteb

meta = mteb.get_model_meta(name)  # name of the model as registered in mteb/models/
model = meta.load_model()  # load the model using the specified implementation

# evaluate with mteb
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, ...)
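
As a concrete example, a run that regenerates model_meta.json from the registered implementation might look like the following (a sketch assuming the model is registered under the name used in this PR; the task choice and output folder are illustrative):

import mteb

meta = mteb.get_model_meta("annamodels/LGAI-Embedding-Preview")
model = meta.load_model()  # load the model via the reference implementation in mteb/models/

tasks = mteb.get_tasks(tasks=["STSBenchmark"])  # any single task is enough to regenerate model_meta.json
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")  # writes task results and model_meta.json under the output folder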

@annamodels
Contributor Author

annamodels commented Jun 13, 2025

Following your instructions, I’ve found that the model_meta.json file now correctly aligns with the ModelMeta information.

Just to clarify, the reason the initial model_meta.json we submitted had None values for some fields is that we hadn't yet added the model to the mteb repository on GitHub. At that time, we were running inference using our own evaluation script on a local directory where the model was stored, which is why some metadata fields were left as None.

Now that the metadata is correctly aligned and updated, is there anything else we should double-check to ensure our model can be listed on the leaderboard?

@KennethEnevoldsen
Contributor

Perfect, thanks for taking the time on this! I have enabled auto-merge on this PR (so it should be on the leaderboard by tomorrow's update).

@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) June 15, 2025 16:49
@KennethEnevoldsen KennethEnevoldsen merged commit 137ba81 into embeddings-benchmark:main Jun 15, 2025
2 checks passed