update results for youtu embedding model #284
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main
Conversation
Model Results Comparison
Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large. Results for Youtu-RAG/Youtu-Embedding-V1 (existing and newly submitted revision) are compared below.
| task_name | Youtu-RAG/Youtu-Embedding-V1 | Youtu-RAG/Youtu-Embedding-V1 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|---|
| Revisions | 1 | 32e04afc24817c187a8422e7bdbb493b19796d47 | | | |
| AFQMC | 0.6711 | 0.7219 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5967 | 0.6170 | nan | 0.3980 | 0.6464 |
| BQ | 0.7295 | 0.7227 | nan | 0.4644 | 0.8125 |
| CLSClusteringP2P | 0.7580 | 0.8153 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.7131 | 0.7627 | nan | nan | 0.7408 |
| CMedQAv1-reranking | 0.9162 | 0.9109 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.9211 | 0.9256 | nan | 0.6678 | 0.9353 |
| CmedqaRetrieval | 0.5742 | 0.5272 | nan | 0.2866 | 0.5658 |
| Cmnli | 0.9015 | 0.8773 | nan | nan | 0.9579 |
| CovidRetrieval | 0.9291 | 0.9194 | 0.7913 | 0.7561 | 0.9606 |
| DuRetrieval | 0.9107 | 0.9198 | nan | 0.8530 | 0.9423 |
| EcomRetrieval | 0.7328 | 0.7447 | nan | 0.5467 | 0.7881 |
| IFlyTek | 0.5273 | 0.5973 | nan | 0.4186 | 0.5799 |
| JDReview | 0.9054 | 0.8923 | nan | 0.8054 | 0.9214 |
| LCQMC | 0.7997 | 0.7748 | nan | 0.7595 | 0.8354 |
| MMarcoReranking | 0.3890 | 0.4358 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8957 | 0.8845 | nan | 0.7920 | 0.9033 |
| MedicalRetrieval | 0.7324 | 0.7379 | nan | 0.5144 | 0.7562 |
| MultilingualSentiment | 0.8089 | 0.7985 | nan | 0.7090 | 0.8536 |
| Ocnli | 0.8923 | 0.8452 | nan | nan | 0.9518 |
| OnlineShopping | 0.9479 | 0.9413 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.6782 | 0.5932 | nan | 0.1463 | 0.7331 |
| QBQTC | 0.5958 | 0.5560 | nan | nan | 0.7145 |
| STSB | 0.8484 | 0.8318 | 0.8465 | 0.8108 | 0.9140 |
| T2Reranking | 0.7277 | 0.7315 | 0.6795 | 0.6632 | 0.7283 |
| T2Retrieval | 0.8902 | 0.8750 | nan | 0.7607 | 0.8926 |
| TNews | 0.6010 | 0.6005 | nan | 0.4880 | 0.6090 |
| ThuNewsClusteringP2P | 0.8698 | 0.8973 | nan | nan | 0.8976 |
| ThuNewsClusteringS2S | 0.8459 | 0.8955 | nan | nan | 0.8790 |
| VideoRetrieval | 0.8105 | 0.8085 | nan | 0.5828 | 0.8384 |
| Waimai | 0.8980 | 0.8933 | nan | 0.8630 | 0.9231 |
| Average | 0.7748 | 0.7760 | 0.7725 | 0.6037 | 0.8132 |
The model has high performance on these tasks: CLSClusteringS2S, IFlyTek, T2Reranking, ThuNewsClusteringS2S.
How come there is such a big difference - isn't it the same model?
Yes, these are two completely different models. The old model outputs vectors with a dimensionality of 2304. We have retrained the new model from scratch, starting with base model pre-training, followed by weakly-supervised training and supervised fine-tuning, achieving better performance. The new model outputs vectors with a dimensionality of 2048.
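A quick way to tell the two checkpoints apart is to inspect the output dimensionality. Below is a minimal sketch, assuming the open-source checkpoint loads with sentence-transformers; the repo id is taken from the reference link in the model meta and may differ from the actual hosting location.

```python
# Minimal sketch: distinguish the old (2304-dim) and new (2048-dim) checkpoints
# by checking the embedding dimensionality. Assumes the open-source model loads
# with sentence-transformers; the repo id follows the model meta's reference link.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tencent/Youtu-Embedding")
embeddings = model.encode(["an example sentence"])
print(embeddings.shape)  # expected (1, 2048) for the retrained model; the old API returned 2304-dim vectors
```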
Hmm, this would probably be seen as misleading to some. It might be better to version it as v2? (I didn't catch this when you changed the model's implementation, but probably should have.)
Thanks for the feedback! We'd like to clarify that the newly submitted results under the same name represent a significant shift in our approach: we are now releasing a fully open-source model checkpoint for community use and research. The previous entry was based on a demo API, which was intended as a temporary solution and will be unmaintained for some time to come. We believe the community will benefit much more from a permanent, open-source model. Therefore, our intention is for this new open-source version to supersede the old API-based one under the same name, ensuring continuity and providing the best available resource under that identifier. We appreciate your consideration in making this switch!
Happy to hear about the change in approach. I also see that the revision was updated in the model meta, so I will merge this in. I would love to have this model on the other benchmarks as well, if you have the time to run the evaluation.
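For reference, here is a minimal sketch of how such an evaluation could be run with the mteb package; the benchmark name and output folder are illustrative placeholders, not part of this PR.

```python
# Minimal sketch: run the registered model on an additional MTEB benchmark.
# The benchmark name and output folder below are illustrative placeholders.
import mteb

model = mteb.get_model("Youtu-RAG/Youtu-Embedding-V1")  # resolves the registered implementation
tasks = mteb.get_benchmark("MTEB(Multilingual)")        # swap in any benchmark of interest
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```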
Thank you for confirming the approval, @KennethEnevoldsen. May I ask when the model results will be updated on the leaderboard?
They should be updated automatically; the new results should now be on the leaderboard.
Thanks for letting me know, @KennethEnevoldsen. However, I've noticed that the leaderboard results haven't been updated yet. Could you please look into the reason for this? We appreciate your help.
Yes, I can't find your model on the leaderboard. I'll try to find out what's wrong.
@spring-quan In embeddings-benchmark/mteb#3227 you added the model with the name
@@ -0,0 +1 @@
{"name": "Youtu-RAG/Youtu-Embedding-V1", "revision": "32e04afc24817c187a8422e7bdbb493b19796d47", "release_date": "2025-09-28", "languages": ["zho_Hans"], "n_parameters": 2672957440, "memory_usage_mb": null, "max_tokens": 8192, "embed_dim": 2048, "license": "apache-2.0", "open_weights": true, "public_training_code": null, "public_training_data": null, "framework": ["PyTorch"], "reference": "https://huggingface.co/tencent/Youtu-Embedding", "similarity_fn_name": "cosine", "use_instructions": true, "training_datasets": {"T2Retrieval": ["train"], "DuRetrieval": ["train"], "T2Reranking": ["train"], "MMarcoReranking": ["train"], "CmedqaRetrieval": ["train"], "CMedQAv1-reranking": ["train"], "CMedQAv2-reranking": ["train"], "BQ": ["train"], "LCQMC": ["train"], "PAWSX": ["train"], "STS-B": ["train"], "AFQMC": ["train"], "Cmnli": ["train"], "Ocnli": ["train"]}, "adapted_from": null, "superseded_by": null, "is_cross_encoder": null, "modalities": ["text"], "loader": null}
I think you've renamed the model in your fork, but didn't submit it.
Thank you for pointing that out. We've corrected the model name and have now resubmitted the results. We kindly request your review.
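As a purely illustrative sanity check for future submissions, one could verify that the name field in a submitted model_meta.json matches the directory it sits under. The path layout and the "__" separator below are assumptions for illustration, not the repository's documented convention.

```python
# Minimal sketch: check that the "name" in a submitted model_meta.json matches
# the results directory it lives in. The path layout and the "__" separator
# are assumptions for illustration only.
import json
from pathlib import Path

meta_path = Path("results/Youtu-RAG__Youtu-Embedding-V1/model_meta.json")  # hypothetical path
meta = json.loads(meta_path.read_text())

expected_name = meta_path.parent.name.replace("__", "/")
if meta["name"] != expected_name:
    raise ValueError(f'meta name {meta["name"]!r} does not match directory {expected_name!r}')
print("model_meta.json name matches its directory")
```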
Checklist
- The model implementation has been added to mteb/models/ (this can be as an API). Instructions on how to add a model can be found here.