
add results for CoDi-Embedding-V1 #258

Closed

spring-quan wants to merge 1 commit into embeddings-benchmark:main from spring-quan:codi_embedding_v1

Conversation

@spring-quan
Contributor

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/; this can be as an API. Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g. Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Youtu-RAG/CoDi-Embedding-V1
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for Youtu-RAG/CoDi-Embedding-V1

task_name Youtu-RAG/CoDi-Embedding-V1 google/gemini-embedding-001 intfloat/multilingual-e5-large Max result
AFQMC 0.7153 nan 0.3301 0.7225
ATEC 0.6225 nan 0.398 0.6464
BQ 0.7263 nan 0.4644 0.8125
CLSClusteringP2P 0.7906 nan nan 0.8225
CLSClusteringS2S 0.7710 nan nan 0.7387
CMedQAv1-reranking 0.8822 nan 0.6765 0.9156
CMedQAv2-reranking 0.8713 nan 0.6672 0.9248
CmedqaRetrieval 0.5475 nan 0.2866 0.5658
Cmnli 0.8952 nan nan 0.9306
CovidRetrieval 0.9323 0.7913 0.7561 0.9606
DuRetrieval 0.8969 nan 0.853 0.9423
EcomRetrieval 0.7138 nan 0.5467 0.7764
IFlyTek 0.5303 nan 0.4186 0.5770
JDReview 0.9109 nan 0.8054 0.9169
LCQMC 0.7967 nan 0.7595 0.8070
MMarcoReranking 0.2874 nan 0.2912 0.4689
MMarcoRetrieval 0.8566 nan 0.792 0.9033
MedicalRetrieval 0.7032 nan 0.5144 0.7562
MultilingualSentiment 0.8094 nan 0.709 0.8263
Ocnli 0.8868 nan nan 0.9215
OnlineShopping 0.9487 nan 0.9045 0.9716
PAWSX 0.5986 nan 0.1463 0.6644
QBQTC 0.5905 nan nan 0.7145
STSB 0.8404 0.8465 0.8108 0.9140
T2Reranking 0.6759 0.6795 0.6632 0.7283
T2Retrieval 0.8768 nan 0.7607 0.8926
TNews 0.5852 nan 0.488 0.5922
ThuNewsClusteringP2P 0.8805 nan nan 0.8879
ThuNewsClusteringS2S 0.9024 nan nan 0.8790
VideoRetrieval 0.7796 nan 0.5828 0.8384
Waimai 0.9047 nan 0.863 0.9174
Average 0.7655 0.7725 0.6037 0.8044
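The Average row appears to be the unweighted mean of the per-task scores. A minimal sketch reproducing it for the Youtu-RAG/CoDi-Embedding-V1 column (scores copied from the table above; the plain-mean assumption is mine, not stated by the bot):

```python
# Per-task scores for Youtu-RAG/CoDi-Embedding-V1, copied from the table above.
# Assumption: the reported "Average" is a plain unweighted mean over all 31 tasks.
scores = [
    0.7153, 0.6225, 0.7263, 0.7906, 0.7710, 0.8822, 0.8713, 0.5475,
    0.8952, 0.9323, 0.8969, 0.7138, 0.5303, 0.9109, 0.7967, 0.2874,
    0.8566, 0.7032, 0.8094, 0.8868, 0.9487, 0.5986, 0.5905, 0.8404,
    0.6759, 0.8768, 0.5852, 0.8805, 0.9024, 0.7796, 0.9047,
]

average = sum(scores) / len(scores)
print(round(average, 4))  # prints 0.7655, matching the reported Average row
```

Note that the reference-model averages are computed over different task subsets (most gemini-embedding-001 entries are nan), so the per-column averages are not directly comparable.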

@KennethEnevoldsen
Contributor

@spring-quan a few of the values seem surprisingly high given the training data; any chance there might have been leakage (or that you have missed a few of these datasets in the training data annotations)? Can I please ask you to double-check these?

Also, it seems like the reference link in the model meta refers to the wrong model (feel free to make a PR to fix it).

@spring-quan
Contributor Author

I will check the training data next and close this PR for now.
