
update results for youtu embedding model #284

Merged
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main from spring-quan:youtu_llm_embedding
Oct 4, 2025

Conversation

@spring-quan
Contributor

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted were obtained using the reference implementation (a minimal run sketch follows this list)
  • My model is available, either as a publicly accessible API or publicly on, e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.
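For reference, a minimal sketch of how results like these are typically produced with the mteb package (the model name and task names are taken from this PR; treat the exact API as version-dependent):

```python
import mteb

# Load the reference implementation registered under mteb/models/
# (assumes the model name used in this PR is registered in your mteb version).
model = mteb.get_model("Youtu-RAG/Youtu-Embedding-V1")

# A small subset of the Chinese tasks evaluated in this PR.
tasks = mteb.get_tasks(tasks=["T2Retrieval", "STSB", "TNews"])
evaluation = mteb.MTEB(tasks=tasks)

# Writes one JSON result file per task; these files are what get
# submitted to the results repository.
evaluation.run(model, output_folder="results")
```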

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Youtu-RAG/Youtu-Embedding-V1
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for Youtu-RAG/Youtu-Embedding-V1

| task_name | Youtu-RAG/Youtu-Embedding-V1 | Youtu-RAG/Youtu-Embedding-V1 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
| --- | --- | --- | --- | --- | --- |
| Revisions | 1 | 32e04afc24817c187a8422e7bdbb493b19796d47 | | | |
| AFQMC | 0.6711 | 0.7219 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5967 | 0.6170 | nan | 0.3980 | 0.6464 |
| BQ | 0.7295 | 0.7227 | nan | 0.4644 | 0.8125 |
| CLSClusteringP2P | 0.7580 | 0.8153 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.7131 | 0.7627 | nan | nan | 0.7408 |
| CMedQAv1-reranking | 0.9162 | 0.9109 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.9211 | 0.9256 | nan | 0.6678 | 0.9353 |
| CmedqaRetrieval | 0.5742 | 0.5272 | nan | 0.2866 | 0.5658 |
| Cmnli | 0.9015 | 0.8773 | nan | nan | 0.9579 |
| CovidRetrieval | 0.9291 | 0.9194 | 0.7913 | 0.7561 | 0.9606 |
| DuRetrieval | 0.9107 | 0.9198 | nan | 0.8530 | 0.9423 |
| EcomRetrieval | 0.7328 | 0.7447 | nan | 0.5467 | 0.7881 |
| IFlyTek | 0.5273 | 0.5973 | nan | 0.4186 | 0.5799 |
| JDReview | 0.9054 | 0.8923 | nan | 0.8054 | 0.9214 |
| LCQMC | 0.7997 | 0.7748 | nan | 0.7595 | 0.8354 |
| MMarcoReranking | 0.3890 | 0.4358 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8957 | 0.8845 | nan | 0.7920 | 0.9033 |
| MedicalRetrieval | 0.7324 | 0.7379 | nan | 0.5144 | 0.7562 |
| MultilingualSentiment | 0.8089 | 0.7985 | nan | 0.7090 | 0.8536 |
| Ocnli | 0.8923 | 0.8452 | nan | nan | 0.9518 |
| OnlineShopping | 0.9479 | 0.9413 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.6782 | 0.5932 | nan | 0.1463 | 0.7331 |
| QBQTC | 0.5958 | 0.5560 | nan | nan | 0.7145 |
| STSB | 0.8484 | 0.8318 | 0.8465 | 0.8108 | 0.9140 |
| T2Reranking | 0.7277 | 0.7315 | 0.6795 | 0.6632 | 0.7283 |
| T2Retrieval | 0.8902 | 0.8750 | nan | 0.7607 | 0.8926 |
| TNews | 0.6010 | 0.6005 | nan | 0.4880 | 0.6090 |
| ThuNewsClusteringP2P | 0.8698 | 0.8973 | nan | nan | 0.8976 |
| ThuNewsClusteringS2S | 0.8459 | 0.8955 | nan | nan | 0.8790 |
| VideoRetrieval | 0.8105 | 0.8085 | nan | 0.5828 | 0.8384 |
| Waimai | 0.8980 | 0.8933 | nan | 0.8630 | 0.9231 |
| Average | 0.7748 | 0.7760 | 0.7725 | 0.6037 | 0.8132 |
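Note that the per-model averages are apparently computed over the available (non-nan) scores only: google/gemini-embedding-001 has just three scored tasks yet still receives an average. A nan-aware mean reproduces this, as a rough sanity check (a sketch using numpy):

```python
import numpy as np

# google/gemini-embedding-001 is scored only on CovidRetrieval,
# STSB, and T2Reranking; all other tasks are nan in the table.
scores = np.array([np.nan, 0.7913, np.nan, 0.8465, 0.6795])
print(np.nanmean(scores))  # ~0.7724 (the table's 0.7725 presumably
                           # comes from unrounded per-task scores)
```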

The model has high performance on these tasks: CLSClusteringS2S, IFlyTek, T2Reranking, ThuNewsClusteringS2S


@KennethEnevoldsen
Contributor

How come there is such a big difference - isn't it the same model?

@spring-quan
Contributor Author

> How come there is such a big difference - isn't it the same model?

They are actually two completely different models. The old model outputs vectors with a dimensionality of 2304. We retrained the new model from scratch, starting with base-model pre-training, followed by weakly supervised training and supervised fine-tuning, achieving better performance. The new model outputs vectors with a dimensionality of 2048.
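A quick way to check which model is being served, assuming the open checkpoint loads with sentence-transformers (the Hugging Face repo name is taken from the model meta later in this thread):

```python
from sentence_transformers import SentenceTransformer

# The retrained open-source model should report 2048 dimensions;
# the old API-based model returned 2304-dimensional vectors.
model = SentenceTransformer("tencent/Youtu-Embedding")
print(model.get_sentence_embedding_dimension())  # expected: 2048
```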

@KennethEnevoldsen
Contributor

Hmm, this would probably be seen as misleading by some. It might be better to version it as v2? (I didn't catch this when you changed the model's implementation, but probably should have.)

@spring-quan
Contributor Author

Thanks for the feedback! We'd like to clarify that the newly submitted results under the same name represent a significant shift in our approach: we are now releasing a fully open-source model checkpoint for community use and research.

The previous entry was based on a demo API, which was intended as a temporary measure and will be unmaintained for some time to come. We believe the community will benefit much more from a permanent, open-source model.

Therefore, our intention is for this new open-source version to supersede the old API-based one under the same name, ensuring continuity and providing the best available resource under that identifier. We appreciate your consideration in making this switch!

@KennethEnevoldsen
Contributor

Happy to hear about the changed approach. I also see that the revision was updated in the model meta, so I will merge this in.

I would love to have this model on the other benchmarks as well, if you have the time to run the evaluation.

@KennethEnevoldsen merged commit a1c8dac into embeddings-benchmark:main Oct 4, 2025
3 checks passed
@spring-quan
Contributor Author

> Happy to hear about the changed approach. I also see that the revision was updated in the model meta, so I will merge this in.
>
> I would love to have this model on the other benchmarks as well, if you have the time to run the evaluation.

Thank you for confirming the approval, @KennethEnevoldsen. May I ask when the model results will be updated on the leaderboard?

@Samoed
Member

Samoed commented Oct 7, 2025

They should be updated automatically, and the new results should now be on the leaderboard.

@spring-quan
Contributor Author

Thanks for letting me know. However, I've noticed that the leaderboard results haven't been updated yet. @KennethEnevoldsen, could you please look into the reason for this? We appreciate your help.

@Samoed
Member

Samoed commented Oct 7, 2025

Yes, I can't find your model on the leaderboard. I'll try to find out what's wrong.

@Samoed
Member

Samoed commented Oct 7, 2025

@spring-quan In embeddings-benchmark/mteb#3227 you added the model under the name tencent/Youtu-Embedding, but here the results use the name Youtu-RAG/Youtu-Embedding-V1.

@@ -0,0 +1 @@
```json
{
  "name": "Youtu-RAG/Youtu-Embedding-V1",
  "revision": "32e04afc24817c187a8422e7bdbb493b19796d47",
  "release_date": "2025-09-28",
  "languages": ["zho_Hans"],
  "n_parameters": 2672957440,
  "memory_usage_mb": null,
  "max_tokens": 8192,
  "embed_dim": 2048,
  "license": "apache-2.0",
  "open_weights": true,
  "public_training_code": null,
  "public_training_data": null,
  "framework": ["PyTorch"],
  "reference": "https://huggingface.co/tencent/Youtu-Embedding",
  "similarity_fn_name": "cosine",
  "use_instructions": true,
  "training_datasets": {
    "T2Retrieval": ["train"],
    "DuRetrieval": ["train"],
    "T2Reranking": ["train"],
    "MMarcoReranking": ["train"],
    "CmedqaRetrieval": ["train"],
    "CMedQAv1-reranking": ["train"],
    "CMedQAv2-reranking": ["train"],
    "BQ": ["train"],
    "LCQMC": ["train"],
    "PAWSX": ["train"],
    "STS-B": ["train"],
    "AFQMC": ["train"],
    "Cmnli": ["train"],
    "Ocnli": ["train"]
  },
  "adapted_from": null,
  "superseded_by": null,
  "is_cross_encoder": null,
  "modalities": ["text"],
  "loader": null
}
```
@Samoed
Member

I think you've renamed the model in your fork but didn't submit it.

@spring-quan
Contributor Author

Thank you for pointing that out. We've corrected the model name and have now resubmitted the results. We kindly request your review.

embeddings-benchmark/mteb#292

