
update youtu embedding model metadata #292

Merged
Samoed merged 5 commits into embeddings-benchmark:main from
spring-quan:youtu_llm_embedding
Oct 7, 2025

Conversation

@spring-quan
Contributor

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions bot commented Oct 7, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: tencent/Youtu-Embedding, tencent/Youtu-Embedding
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for tencent/Youtu-Embedding

task_name google/gemini-embedding-001 tencent/Youtu-Embedding tencent/Youtu-Embedding intfloat/multilingual-e5-large Max result
Revisions 1 32e04afc24817c187a8422e7bdbb493b19796d47
AFQMC nan 0.6711 0.7219 0.3301 0.7225
ATEC nan 0.5989 0.6170 0.3981 0.6464
BQ nan 0.7401 0.7227 0.4850 0.8125
CLSClusteringP2P nan 0.7580 0.8153 nan 0.8225
CLSClusteringS2S nan 0.7131 0.7627 nan 0.7408
CMedQAv1-reranking nan 0.9162 0.9109 0.6765 0.9434
CMedQAv2-reranking nan 0.9211 0.9256 0.6678 0.9353
CmedqaRetrieval nan 0.5742 0.5272 0.2866 0.5658
Cmnli nan 0.9015 0.8773 nan 0.9579
CovidRetrieval 0.7913 0.9291 0.9194 0.7561 0.9606
DuRetrieval nan 0.9107 0.9198 0.8530 0.9423
EcomRetrieval nan 0.7328 0.7447 0.5467 0.7881
IFlyTek nan 0.5273 0.5973 0.4186 0.5799
JDReview nan 0.9054 0.8923 0.8054 0.9214
LCQMC nan 0.7997 0.7748 0.7595 0.8354
MMarcoReranking nan 0.3890 0.4358 0.2912 0.4689
MMarcoRetrieval nan 0.8957 0.8845 0.7920 0.9033
MedicalRetrieval nan 0.7324 0.7379 0.5144 0.7562
MultilingualSentiment nan 0.8089 0.7985 0.7090 0.8536
Ocnli nan 0.8923 0.8452 nan 0.9518
OnlineShopping nan 0.9479 0.9413 0.9045 0.9716
PAWSX nan 0.6782 0.5932 0.1463 0.7331
QBQTC nan 0.5958 0.5560 nan 0.7145
STSB 0.8550 0.8576 0.8318 0.8236 0.9199
T2Reranking 0.6795 0.7277 0.7315 0.6632 0.7283
T2Retrieval nan 0.8902 0.8750 0.7607 0.8926
TNews nan 0.6010 0.6005 0.4880 0.6090
ThuNewsClusteringP2P nan 0.8698 0.8973 nan 0.8976
ThuNewsClusteringS2S nan 0.8459 0.8955 nan 0.8790
VideoRetrieval nan 0.8105 0.8085 0.5828 0.8384
Waimai nan 0.8980 0.8933 0.8630 0.9231
Average 0.7753 0.7755 0.7760 0.6051 0.8134

Model has high performance on these tasks: CmedqaRetrieval


Results for tencent/Youtu-Embedding

task_name google/gemini-embedding-001 tencent/Youtu-Embedding tencent/Youtu-Embedding intfloat/multilingual-e5-large Max result
Revisions 1 32e04afc24817c187a8422e7bdbb493b19796d47
AFQMC nan 0.6711 0.7219 0.3301 0.7225
ATEC nan 0.5967 0.6170 0.3980 0.6464
BQ nan 0.7295 0.7227 0.4644 0.8125
CLSClusteringP2P nan 0.7580 0.8153 nan 0.8225
CLSClusteringS2S nan 0.7131 0.7627 nan 0.7408
CMedQAv1-reranking nan 0.9162 0.9109 0.6765 0.9434
CMedQAv2-reranking nan 0.9211 0.9256 0.6678 0.9353
CmedqaRetrieval nan 0.5742 0.5272 0.2866 0.5658
Cmnli nan 0.9015 0.8773 nan 0.9579
CovidRetrieval 0.7913 0.9291 0.9194 0.7561 0.9606
DuRetrieval nan 0.9107 0.9198 0.8530 0.9423
EcomRetrieval nan 0.7328 0.7447 0.5467 0.7881
IFlyTek nan 0.5273 0.5973 0.4186 0.5799
JDReview nan 0.9054 0.8923 0.8054 0.9214
LCQMC nan 0.7997 0.7748 0.7595 0.8354
MMarcoReranking nan 0.3890 0.4358 0.2912 0.4689
MMarcoRetrieval nan 0.8957 0.8845 0.7920 0.9033
MedicalRetrieval nan 0.7324 0.7379 0.5144 0.7562
MultilingualSentiment nan 0.8089 0.7985 0.7090 0.8536
Ocnli nan 0.8923 0.8452 nan 0.9518
OnlineShopping nan 0.9479 0.9413 0.9045 0.9716
PAWSX nan 0.6782 0.5932 0.1463 0.7331
QBQTC nan 0.5958 0.5560 nan 0.7145
STSB 0.8465 0.8484 0.8318 0.8108 0.9140
T2Reranking 0.6795 0.7277 0.7315 0.6632 0.7283
T2Retrieval nan 0.8902 0.8750 0.7607 0.8926
TNews nan 0.6010 0.6005 0.4880 0.6090
ThuNewsClusteringP2P nan 0.8698 0.8973 nan 0.8976
ThuNewsClusteringS2S nan 0.8459 0.8955 nan 0.8790
VideoRetrieval nan 0.8105 0.8085 0.5828 0.8384
Waimai nan 0.8980 0.8933 0.8630 0.9231
Average 0.7725 0.7748 0.7760 0.6037 0.8132

Model has high performance on these tasks: CLSClusteringS2S, IFlyTek, T2Reranking, ThuNewsClusteringS2S
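
The bot does not show how it picks the "high performance" tasks. One reading that fits the numbers in both tables is that a task is listed when the evaluated revision scores above the "Max result" column. The sketch below checks that reading on a handful of rows copied from the tables. The helper name `tasks_above_max`, the tuple layout, and the rule itself are assumptions made for illustration, not the bot's actual implementation.

```python
# Rows copied from the comparison tables above:
# task -> (first Youtu-Embedding column, second Youtu-Embedding column, Max result).
# Only a subset of tasks is included to keep the example short.
SCORES = {
    "AFQMC": (0.6711, 0.7219, 0.7225),
    "CLSClusteringS2S": (0.7131, 0.7627, 0.7408),
    "CmedqaRetrieval": (0.5742, 0.5272, 0.5658),
    "IFlyTek": (0.5273, 0.5973, 0.5799),
    "T2Reranking": (0.7277, 0.7315, 0.7283),
    "TNews": (0.6010, 0.6005, 0.6090),
    "ThuNewsClusteringS2S": (0.8459, 0.8955, 0.8790),
}

MAX_COLUMN = 2  # index of the "Max result" value in each tuple


def tasks_above_max(scores, column):
    """Return the sorted task names whose score in `column` exceeds Max result.

    Assumed rule: strictly greater than the previous maximum counts as
    "high performance". This is inferred from the tables, not confirmed.
    """
    return sorted(
        task for task, row in scores.items() if row[column] > row[MAX_COLUMN]
    )


if __name__ == "__main__":
    print("first column:", tasks_above_max(SCORES, 0))
    print("second column:", tasks_above_max(SCORES, 1))
```

Under this assumption the first Youtu-Embedding column yields only CmedqaRetrieval and the second yields CLSClusteringS2S, IFlyTek, T2Reranking and ThuNewsClusteringS2S, which matches the two lists the bot reported.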


@Samoed
Member

Samoed commented Oct 7, 2025

@spring-quan You need to rename the folder too

@Samoed Samoed enabled auto-merge (squash) October 7, 2025 16:28
@Samoed Samoed merged commit c2427a0 into embeddings-benchmark:main Oct 7, 2025
3 checks passed