
update results for youtu embedding model #284

Merged
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main from spring-quan:youtu_llm_embedding
Oct 4, 2025

Conversation

@spring-quan
Contributor

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted were obtained using the reference implementation (a minimal run sketch follows this list)
  • My model is available, either as a publicly accessible API or publicly on, e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.
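For reference, a minimal sketch of how results like these are typically produced with the mteb package (the model name and task names are taken from this PR; treat the exact API as version-dependent):

```python
import mteb

# Load the reference implementation registered under mteb/models/
# (assumes the model name used in this PR is registered in your mteb version).
model = mteb.get_model("Youtu-RAG/Youtu-Embedding-V1")

# A small subset of the Chinese tasks evaluated in this PR.
tasks = mteb.get_tasks(tasks=["T2Retrieval", "STSB", "TNews"])
evaluation = mteb.MTEB(tasks=tasks)

# Writes one JSON result file per task; these files are what get
# submitted to the results repository.
evaluation.run(model, output_folder="results")
```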

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Youtu-RAG/Youtu-Embedding-V1
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for Youtu-RAG/Youtu-Embedding-V1

| task_name | Youtu-RAG/Youtu-Embedding-V1 | Youtu-RAG/Youtu-Embedding-V1 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
| --- | --- | --- | --- | --- | --- |
| Revisions | 1 | 32e04afc24817c187a8422e7bdbb493b19796d47 | | | |
| AFQMC | 0.6711 | 0.7219 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5967 | 0.6170 | nan | 0.3980 | 0.6464 |
| BQ | 0.7295 | 0.7227 | nan | 0.4644 | 0.8125 |
| CLSClusteringP2P | 0.7580 | 0.8153 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.7131 | 0.7627 | nan | nan | 0.7408 |
| CMedQAv1-reranking | 0.9162 | 0.9109 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.9211 | 0.9256 | nan | 0.6678 | 0.9353 |
| CmedqaRetrieval | 0.5742 | 0.5272 | nan | 0.2866 | 0.5658 |
| Cmnli | 0.9015 | 0.8773 | nan | nan | 0.9579 |
| CovidRetrieval | 0.9291 | 0.9194 | 0.7913 | 0.7561 | 0.9606 |
| DuRetrieval | 0.9107 | 0.9198 | nan | 0.8530 | 0.9423 |
| EcomRetrieval | 0.7328 | 0.7447 | nan | 0.5467 | 0.7881 |
| IFlyTek | 0.5273 | 0.5973 | nan | 0.4186 | 0.5799 |
| JDReview | 0.9054 | 0.8923 | nan | 0.8054 | 0.9214 |
| LCQMC | 0.7997 | 0.7748 | nan | 0.7595 | 0.8354 |
| MMarcoReranking | 0.3890 | 0.4358 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8957 | 0.8845 | nan | 0.7920 | 0.9033 |
| MedicalRetrieval | 0.7324 | 0.7379 | nan | 0.5144 | 0.7562 |
| MultilingualSentiment | 0.8089 | 0.7985 | nan | 0.7090 | 0.8536 |
| Ocnli | 0.8923 | 0.8452 | nan | nan | 0.9518 |
| OnlineShopping | 0.9479 | 0.9413 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.6782 | 0.5932 | nan | 0.1463 | 0.7331 |
| QBQTC | 0.5958 | 0.5560 | nan | nan | 0.7145 |
| STSB | 0.8484 | 0.8318 | 0.8465 | 0.8108 | 0.9140 |
| T2Reranking | 0.7277 | 0.7315 | 0.6795 | 0.6632 | 0.7283 |
| T2Retrieval | 0.8902 | 0.8750 | nan | 0.7607 | 0.8926 |
| TNews | 0.6010 | 0.6005 | nan | 0.4880 | 0.6090 |
| ThuNewsClusteringP2P | 0.8698 | 0.8973 | nan | nan | 0.8976 |
| ThuNewsClusteringS2S | 0.8459 | 0.8955 | nan | nan | 0.8790 |
| VideoRetrieval | 0.8105 | 0.8085 | nan | 0.5828 | 0.8384 |
| Waimai | 0.8980 | 0.8933 | nan | 0.8630 | 0.9231 |
| Average | 0.7748 | 0.7760 | 0.7725 | 0.6037 | 0.8132 |
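Note that the per-model averages are apparently computed over the available (non-nan) scores only: google/gemini-embedding-001 has just three scored tasks yet still receives an average. A nan-aware mean reproduces this, as a rough sanity check (a sketch using numpy):

```python
import numpy as np

# google/gemini-embedding-001 is scored only on CovidRetrieval,
# STSB, and T2Reranking; all other tasks are nan in the table.
scores = np.array([np.nan, 0.7913, np.nan, 0.8465, 0.6795])
print(np.nanmean(scores))  # ~0.7724 (the table's 0.7725 presumably
                           # comes from unrounded per-task scores)
```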

The model has high performance on these tasks: CLSClusteringS2S, IFlyTek, T2Reranking, ThuNewsClusteringS2S


@KennethEnevoldsen
Contributor

How come there is such a big difference - isn't it the same model?

@spring-quan
Contributor Author

> How come there is such a big difference - isn't it the same model?

They are actually two completely different models. The old model outputs vectors with a dimensionality of 2304. We retrained the new model from scratch, starting with base-model pre-training, followed by weakly supervised training and supervised fine-tuning, achieving better performance. The new model outputs vectors with a dimensionality of 2048.
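A quick way to check which model is being served, assuming the open checkpoint loads with sentence-transformers (the Hugging Face repo name is taken from the model meta later in this thread):

```python
from sentence_transformers import SentenceTransformer

# The retrained open-source model should report 2048 dimensions;
# the old API-based model returned 2304-dimensional vectors.
model = SentenceTransformer("tencent/Youtu-Embedding")
print(model.get_sentence_embedding_dimension())  # expected: 2048
```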

@KennethEnevoldsen
Contributor

Hmm, this would probably be seen as misleading by some. It might be better to version it as v2? (I didn't catch this when you changed the model's implementation, but probably should have.)

@spring-quan
Contributor Author

Thanks for the feedback! We'd like to clarify that the newly submitted results under the same name represent a significant shift in our approach: we are now releasing a fully open-source model checkpoint for community use and research.

The previous entry was based on a demo API, which was intended as a temporary measure and will be unmaintained for some time to come. We believe the community will benefit much more from a permanent, open-source model.

Therefore, our intention is for this new open-source version to supersede the old API-based one under the same name, ensuring continuity and providing the best available resource under that identifier. We appreciate your consideration in making this switch!

@KennethEnevoldsen
Contributor

Happy to hear about the changed approach. I also see that the revision was updated in the model meta, so I will merge this in.

I would love to have this model on the other benchmarks as well, if you have the time to run the evaluation.

@KennethEnevoldsen merged commit a1c8dac into embeddings-benchmark:main Oct 4, 2025
3 checks passed
@spring-quan
Contributor Author

> Happy to hear about the changed approach. I also see that the revision was updated in the model meta, so I will merge this in.
>
> I would love to have this model on the other benchmarks as well, if you have the time to run the evaluation.

Thank you for confirming the approval, @KennethEnevoldsen. May I ask when the model results will be updated on the leaderboard?

@Samoed
Member

Samoed commented Oct 7, 2025

They should be updated automatically, and the new results should now be on the leaderboard.

@spring-quan
Contributor Author

Thanks for letting me know. However, I've noticed that the leaderboard results haven't been updated yet. @KennethEnevoldsen, could you please look into the reason for this? We appreciate your help.

@Samoed
Member

Samoed commented Oct 7, 2025

Yes, I can't find your model on the leaderboard. I'll try to find out what's wrong.

@Samoed
Member

Samoed commented Oct 7, 2025

@spring-quan In embeddings-benchmark/mteb#3227 you added the model under the name tencent/Youtu-Embedding, but here the results use the name Youtu-RAG/Youtu-Embedding-V1.

@@ -0,0 +1 @@
```json
{
  "name": "Youtu-RAG/Youtu-Embedding-V1",
  "revision": "32e04afc24817c187a8422e7bdbb493b19796d47",
  "release_date": "2025-09-28",
  "languages": ["zho_Hans"],
  "n_parameters": 2672957440,
  "memory_usage_mb": null,
  "max_tokens": 8192,
  "embed_dim": 2048,
  "license": "apache-2.0",
  "open_weights": true,
  "public_training_code": null,
  "public_training_data": null,
  "framework": ["PyTorch"],
  "reference": "https://huggingface.co/tencent/Youtu-Embedding",
  "similarity_fn_name": "cosine",
  "use_instructions": true,
  "training_datasets": {
    "T2Retrieval": ["train"],
    "DuRetrieval": ["train"],
    "T2Reranking": ["train"],
    "MMarcoReranking": ["train"],
    "CmedqaRetrieval": ["train"],
    "CMedQAv1-reranking": ["train"],
    "CMedQAv2-reranking": ["train"],
    "BQ": ["train"],
    "LCQMC": ["train"],
    "PAWSX": ["train"],
    "STS-B": ["train"],
    "AFQMC": ["train"],
    "Cmnli": ["train"],
    "Ocnli": ["train"]
  },
  "adapted_from": null,
  "superseded_by": null,
  "is_cross_encoder": null,
  "modalities": ["text"],
  "loader": null
}
```
@Samoed
Member

I think you've renamed the model in your fork but didn't submit it.

@spring-quan
Contributor Author

Thank you for pointing that out. We've corrected the model name and have now resubmitted the results. We kindly request your review.

embeddings-benchmark/mteb#292

