Add results for Kingsoft-LLM/QZhou-Embedding-Zh #282

Merged
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main from PennyYu123:add-new-results on Sep 30, 2025

Conversation

@PennyYu123
Contributor

PennyYu123 commented Sep 29, 2025

Checklist

  • ✅ My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
  • ✅ The results submitted were obtained using the reference implementation
  • ✅ My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • ✅ I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding-Zh
Tasks: AFQMC, ATEC, BQ, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, IFlyTek, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MedicalRetrieval, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, STSB, T2Reranking, T2Retrieval, TNews, ThuNewsClusteringP2P, ThuNewsClusteringS2S, VideoRetrieval, Waimai

Results for Kingsoft-LLM/QZhou-Embedding-Zh

| task_name | Kingsoft-LLM/QZhou-Embedding-Zh | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.6679 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5772 | nan | 0.3981 | 0.6464 |
| BQ | 0.8092 | nan | 0.485 | 0.8125 |
| CLSClusteringP2P | 0.7763 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.7408 | nan | nan | 0.7387 |
| CMedQAv1-reranking | 0.9254 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.9256 | nan | 0.6678 | 0.9353 |
| CmedqaRetrieval | 0.5233 | nan | 0.2866 | 0.5742 |
| Cmnli | 0.9579 | nan | nan | 0.9501 |
| CovidRetrieval | 0.9346 | 0.7913 | 0.7561 | 0.9606 |
| DuRetrieval | 0.9282 | nan | 0.853 | 0.9423 |
| EcomRetrieval | 0.7881 | nan | 0.5467 | 0.7764 |
| IFlyTek | 0.5745 | nan | 0.4186 | 0.5799 |
| JDReview | 0.9214 | nan | 0.8054 | 0.9169 |
| LCQMC | 0.8354 | nan | 0.7595 | 0.8240 |
| MMarcoReranking | 0.3613 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8625 | nan | 0.792 | 0.9033 |
| MedicalRetrieval | 0.7280 | nan | 0.5144 | 0.7562 |
| MultilingualSentiment | 0.7563 | nan | 0.709 | 0.8536 |
| Ocnli | 0.9518 | nan | nan | 0.9513 |
| OnlineShopping | 0.9500 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.7331 | nan | 0.1463 | 0.7009 |
| QBQTC | 0.6156 | nan | nan | 0.7145 |
| STSB | 0.8998 | 0.855 | 0.8236 | 0.9199 |
| T2Reranking | 0.6827 | 0.6795 | 0.6632 | 0.7283 |
| T2Retrieval | 0.8798 | nan | 0.7607 | 0.8926 |
| TNews | 0.5923 | nan | 0.488 | 0.6090 |
| ThuNewsClusteringP2P | 0.8976 | nan | nan | 0.8879 |
| ThuNewsClusteringS2S | 0.8559 | nan | nan | 0.8790 |
| VideoRetrieval | 0.7751 | nan | 0.5828 | 0.8384 |
| Waimai | 0.9206 | nan | 0.863 | 0.9231 |
| Average | 0.7854 | 0.7753 | 0.6051 | 0.8111 |

The model scores above the listed Max result on these tasks: CLSClusteringS2S, Cmnli, EcomRetrieval, JDReview, LCQMC, Ocnli, PAWSX, ThuNewsClusteringP2P
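
The flagged tasks appear to be exactly those where the new model's score exceeds the Max result column, and the Average row appears to skip nan entries (the google/gemini-embedding-001 average of 0.7753 equals the mean of its three reported scores). A minimal Python sketch under those assumptions, using a handful of rows copied from the table; the helper names `nan_mean` and `tasks_above_max` are hypothetical, and this is not the comparison bot's actual code:

```python
import math

NAN = math.nan

# Subset of rows from the comparison table:
# task -> (new model, gemini-embedding-001, multilingual-e5-large, max result)
rows = {
    "CovidRetrieval": (0.9346, 0.7913, 0.7561, 0.9606),
    "STSB": (0.8998, 0.855, 0.8236, 0.9199),
    "T2Reranking": (0.6827, 0.6795, 0.6632, 0.7283),
    "PAWSX": (0.7331, NAN, 0.1463, 0.7009),
    "Cmnli": (0.9579, NAN, NAN, 0.9501),
}


def nan_mean(values):
    """Mean over the non-nan entries; nan if nothing was reported."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def tasks_above_max(table):
    """Tasks where the new model's score is strictly above the max result."""
    return [task for task, (new, _, _, best) in table.items() if new > best]


# Assumption: averages ignore tasks with no reported score for that model.
gemini_avg = nan_mean(scores[1] for scores in rows.values())

print(round(gemini_avg, 4))
print(tasks_above_max(rows))
```

On these rows the sketch gives 0.7753 for google/gemini-embedding-001 and flags PAWSX and Cmnli, which agrees with the table and the task list.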


@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Sep 29, 2025

@PennyYu123, I couldn't find the reference implementation in MTEB - Am I missing something?

@Samoed
Member

Samoed commented Sep 29, 2025

@PennyYu123
Contributor Author

> @PennyYu123, I couldn't find the reference implementation in MTEB - Am I missing something?

As your colleague mentioned.
embeddings-benchmark/mteb#3211

@KennethEnevoldsen
Contributor

Ahh sorry, it was simply because the checklist suggested that it was already there. I have fixed it.

KennethEnevoldsen added the "waiting for review of implementation" label on Sep 29, 2025 (label description: "This PR is waiting for an implementation review before merging the results.")
@KennethEnevoldsen
Contributor

Nothing here seems especially problematic to me.

@PennyYu123 can I just get a confirmation that you did not train on any of:

CLSClusteringS2S, Cmnli, EcomRetrieval, JDReview, Ocnli, ThuNewsClusteringP2P

@PennyYu123
Contributor Author

We have verified that the mentioned datasets were not incorporated.

KennethEnevoldsen merged commit 897537b into embeddings-benchmark:main on Sep 30, 2025
4 checks passed
@PennyYu123
Contributor Author

Wow, thank you for your thorough review @KennethEnevoldsen @Samoed, and it's been another pleasant collaboration! We will continue to bring better contributions to the community! 🤝

PennyYu123 deleted the add-new-results branch on September 30, 2025 09:26
@KennethEnevoldsen
Contributor

Thanks @PennyYu123! Looking forward to some great PRs from you guys :)

Labels

waiting for review of implementation: This PR is waiting for an implementation review before merging the results.

3 participants