add results for Youtu-Embedding-V1#265

Merged
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main from spring-quan:youtu_embedding
Sep 6, 2025

Conversation

@spring-quan
Contributor

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions bot commented Sep 2, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Youtu-RAG/Youtu-Embedding-V1
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for Youtu-RAG/Youtu-Embedding-V1

| task_name | Youtu-RAG/Youtu-Embedding-V1 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.6711 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5989 | nan | 0.3981 | 0.6464 |
| BQ | 0.7401 | nan | 0.485 | 0.8125 |
| CLSClusteringP2P | 0.7580 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.7131 | nan | nan | 0.7387 |
| CMedQAv1-reranking | 0.9162 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.9211 | nan | 0.6672 | 0.9353 |
| CmedqaRetrieval | 0.5742 | nan | 0.2866 | 0.5658 |
| Cmnli | 0.9015 | nan | nan | 0.9501 |
| CovidRetrieval | 0.9291 | 0.7913 | 0.7561 | 0.9606 |
| DuRetrieval | 0.9107 | nan | 0.853 | 0.9423 |
| EcomRetrieval | 0.7328 | nan | 0.5467 | 0.7764 |
| IFlyTek | 0.5273 | nan | 0.4186 | 0.5799 |
| JDReview | 0.9054 | nan | 0.8054 | 0.9169 |
| LCQMC | 0.7997 | nan | 0.7595 | 0.8240 |
| MMarcoReranking | 0.3890 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8957 | nan | 0.792 | 0.9033 |
| MedicalRetrieval | 0.7324 | nan | 0.5144 | 0.7562 |
| MultilingualSentiment | 0.8089 | nan | 0.709 | 0.8536 |
| Ocnli | 0.8923 | nan | nan | 0.9513 |
| OnlineShopping | 0.9479 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.6782 | nan | 0.1463 | 0.7009 |
| QBQTC | 0.5958 | nan | nan | 0.7145 |
| STSB | 0.8576 | 0.855 | 0.8236 | 0.9199 |
| T2Reranking | 0.7277 | 0.6795 | 0.6632 | 0.7283 |
| T2Retrieval | 0.8902 | nan | 0.7607 | 0.8926 |
| TNews | 0.6010 | nan | 0.488 | 0.6090 |
| ThuNewsClusteringP2P | 0.8698 | nan | nan | 0.8879 |
| ThuNewsClusteringS2S | 0.8459 | nan | nan | 0.8790 |
| VideoRetrieval | 0.8105 | nan | 0.5828 | 0.8384 |
| Waimai | 0.8980 | nan | 0.863 | 0.9231 |
| Average | 0.7755 | 0.7753 | 0.6051 | 0.8108 |
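
A note on reading the Average row: the values are consistent with a mean taken only over the tasks where a model has a reported score, with `nan` entries skipped. For google/gemini-embedding-001 that means the 0.7753 average covers just three tasks (CovidRetrieval, STSB, T2Reranking), so it is not directly comparable to the 31-task average of Youtu-Embedding-V1. The sketch below reproduces that figure. The helper name `nan_mean` is hypothetical and this is not the bot's actual code, only an illustration of the assumed arithmetic.

```python
import math


def nan_mean(scores):
    """Mean over the entries that are not NaN; NaN if nothing is reported."""
    reported = [s for s in scores if not math.isnan(s)]
    return sum(reported) / len(reported) if reported else math.nan


nan = math.nan

# google/gemini-embedding-001 column from the table above.
# Only the three reported scores matter; the nan entries stand in for
# the tasks without a result (abbreviated here, not all 28 are listed).
gemini_scores = [
    nan,     # AFQMC
    0.7913,  # CovidRetrieval
    0.855,   # STSB
    0.6795,  # T2Reranking
    nan,     # Waimai
]

print(round(nan_mean(gemini_scores), 4))  # 0.7753, matching the Average row
```

Applying the same helper to the full Youtu-RAG/Youtu-Embedding-V1 column (31 reported scores) gives a value that rounds to the 0.7755 shown in the table.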

@KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Sep 2, 2025

@spring-quan
Contributor Author

The model implementation has been added to mteb/models/ . Would it be possible to proceed with the next steps?

@spring-quan
Contributor Author

@KennethEnevoldsen Could you please kindly provide an update on the current progress?

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Sep 6, 2025

Yes indeed! Thanks for the ping

I don't see anything too problematic in the scores given that the training data annotations

@KennethEnevoldsen removed the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Sep 6, 2025
@KennethEnevoldsen enabled auto-merge (squash) September 6, 2025 10:59
@KennethEnevoldsen merged commit 567774d into embeddings-benchmark:main Sep 6, 2025
3 checks passed