kalm-emb-v2.5 results #303
Model Results Comparison

Reference models:

Results for
| Task | KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.4878 | nan | 0.3301 | 0.7225 |
| ATEC | 0.5273 | nan | 0.3981 | 0.6464 |
| AmazonCounterfactualClassification | 0.9548 | 0.9289 | 0.7974 | 0.9696 |
| AmazonPolarityClassification | 0.9703 | nan | 0.9349 | 0.9774 |
| AmazonReviewsClassification | 0.6415 | nan | 0.492 | 0.6880 |
| ArguAna | 0.6015 | 0.8644 | 0.5438 | 0.8979 |
| ArxivClusteringP2P | 0.5211 | nan | 0.4431 | 0.6092 |
| ArxivClusteringS2S | 0.451 | nan | 0.3843 | 0.5520 |
| AskUbuntuDupQuestions | 0.6239 | 0.6424 | 0.6028 | 0.7020 |
| BIOSSES | 0.8402 | 0.8897 | 0.8457 | 0.9692 |
| BQ | 0.7114 | nan | 0.485 | 0.8125 |
| Banking77Classification | 0.9031 | 0.9427 | 0.8473 | 0.9427 |
| BiorxivClusteringP2P | 0.4851 | nan | 0.355 | 0.5522 |
| BiorxivClusteringS2S | 0.4275 | nan | 0.335 | 0.5093 |
| CLSClusteringP2P | 0.6625 | nan | nan | 0.8225 |
| CLSClusteringS2S | 0.6273 | nan | nan | 0.7627 |
| CMedQAv1-reranking | 0.8458 | nan | 0.6765 | 0.9434 |
| CMedQAv2-reranking | 0.8578 | nan | 0.6678 | 0.9353 |
| CQADupstackAndroidRetrieval | 0.5714 | nan | 0.4904 | 0.7426 |
| CQADupstackEnglishRetrieval | 0.5213 | nan | 0.4581 | 0.6998 |
| CQADupstackGamingRetrieval | 0.6552 | 0.7068 | 0.587 | 0.7861 |
| CQADupstackGisRetrieval | 0.453 | nan | 0.3695 | 0.6340 |
| CQADupstackMathematicaRetrieval | 0.3606 | nan | 0.2818 | 0.6948 |
| CQADupstackPhysicsRetrieval | 0.5168 | nan | 0.4366 | 0.7371 |
| CQADupstackProgrammersRetrieval | 0.4925 | nan | 0.416 | 0.6587 |
| CQADupstackStatsRetrieval | 0.405 | nan | 0.3238 | 0.6242 |
| CQADupstackTexRetrieval | 0.3523 | nan | 0.2836 | 0.6295 |
| CQADupstackUnixRetrieval | 0.4887 | 0.5369 | 0.3988 | 0.7198 |
| CQADupstackWebmastersRetrieval | 0.4711 | nan | 0.3988 | 0.6835 |
| CQADupstackWordpressRetrieval | 0.3757 | nan | 0.3164 | 0.5862 |
| ClimateFEVER | 0.345 | nan | 0.2573 | 0.5693 |
| CmedqaRetrieval | 0.4587 | nan | 0.2866 | 0.5658 |
| Cmnli | 0.861 | nan | nan | 0.9579 |
| CovidRetrieval | 0.8357 | 0.7913 | 0.7561 | 0.9606 |
| DBPedia | 0.4262 | nan | 0.413 | 0.5350 |
| DuRetrieval | 0.8614 | nan | 0.853 | 0.9423 |
| EcomRetrieval | 0.6668 | nan | 0.5467 | 0.7881 |
| EmotionClassification | 0.838 | nan | 0.4758 | 0.9387 |
| FEVER | 0.8789 | nan | 0.8281 | 0.9628 |
| FiQA2018 | 0.471 | 0.6178 | 0.4381 | 0.8206 |
| HotpotQA | 0.7176 | nan | 0.7123 | 0.8758 |
| IFlyTek | 0.5659 | nan | 0.4186 | 0.5973 |
| ImdbClassification | 0.9591 | 0.9498 | 0.9023 | 0.9737 |
| JDReview | 0.8882 | nan | 0.8054 | 0.9214 |
| LCQMC | 0.775 | nan | 0.7595 | 0.8354 |
| MMarcoReranking | 0.2964 | nan | 0.2912 | 0.4689 |
| MMarcoRetrieval | 0.8223 | nan | 0.792 | 0.9033 |
| MSMARCO | 0.4062 | nan | 0.437 | 0.4812 |
| MTOPDomainClassification | 0.9869 | 0.9927 | 0.9367 | 0.9995 |
| MTOPIntentClassification | 0.911 | nan | 0.779 | 0.9551 |
| MassiveIntentClassification | 0.8324 | 0.8846 | 0.7376 | 0.9194 |
| MassiveScenarioClassification | 0.8935 | 0.9208 | 0.7751 | 0.9930 |
| MedicalRetrieval | 0.6046 | nan | 0.5144 | 0.7562 |
| MedrxivClusteringP2P | 0.4309 | nan | 0.317 | 0.5153 |
| MedrxivClusteringS2S | 0.4043 | nan | 0.2976 | 0.4969 |
| MindSmallReranking | 0.3245 | 0.3295 | 0.3142 | 0.3437 |
| MultilingualSentiment | 0.8057 | nan | 0.709 | 0.8536 |
| NFCorpus | 0.3711 | nan | 0.3399 | 0.5575 |
| NQ | 0.5861 | nan | 0.6406 | 0.8248 |
| Ocnli | 0.8212 | nan | nan | 0.9518 |
| OnlineShopping | 0.9502 | nan | 0.9045 | 0.9716 |
| PAWSX | 0.479 | nan | 0.1463 | 0.7331 |
| QBQTC | 0.3983 | nan | nan | 0.7145 |
| QuoraRetrieval | 0.8957 | nan | 0.8926 | 0.9235 |
| RedditClustering | 0.7689 | nan | 0.4691 | 0.7716 |
| RedditClusteringP2P | 0.7284 | nan | 0.6322 | 0.7527 |
| SCIDOCS | 0.2162 | 0.2515 | 0.1747 | 0.3453 |
| SICK-R | 0.832 | 0.8275 | 0.8023 | 0.9465 |
| STS12 | 0.819 | 0.8155 | 0.8002 | 0.9546 |
| STS13 | 0.8952 | 0.8989 | 0.8155 | 0.9776 |
| STS14 | 0.8599 | 0.8541 | 0.7772 | 0.9753 |
| STS15 | 0.9033 | 0.9044 | 0.8931 | 0.9811 |
| STS16 | 0.8774 | nan | 0.8579 | 0.9763 |
| STS17 | 0.8135 | 0.8887 | 0.8209 | 0.9323 |
| STS22 | 0.7136 | nan | 0.6485 | 0.7743 |
| STSB | 0.829 | 0.8550 | 0.8236 | 0.9199 |
| STSBenchmark | 0.8888 | 0.8908 | 0.8729 | 0.9504 |
| SciDocsRR | 0.8468 | nan | 0.8422 | 0.9114 |
| SciFact | 0.7438 | nan | 0.7042 | 0.8660 |
| SprintDuplicateQuestions | 0.9609 | 0.9690 | 0.9318 | 0.9838 |
| StackExchangeClustering | 0.8022 | nan | 0.5837 | 0.8395 |
| StackExchangeClusteringP2P | 0.4726 | nan | 0.329 | 0.5157 |
| StackOverflowDupQuestions | 0.5182 | nan | 0.5014 | 0.6292 |
| SummEval | 0.3118 | nan | 0.2969 | 0.4052 |
| T2Reranking | 0.676 | 0.6795 | 0.6632 | 0.7315 |
| T2Retrieval | 0.8597 | nan | 0.7607 | 0.8926 |
| TNews | 0.5327 | nan | 0.488 | 0.6090 |
| TRECCOVID | 0.8298 | 0.8631 | 0.7133 | 0.9499 |
| ThuNewsClusteringP2P | 0.8464 | nan | nan | 0.8976 |
| ThuNewsClusteringS2S | 0.7875 | nan | nan | 0.8955 |
| Touche2020 | 0.2893 | nan | 0.2339 | 0.3939 |
| ToxicConversationsClassification | 0.917 | 0.8875 | 0.7132 | 0.9759 |
| TweetSentimentExtractionClassification | 0.8008 | 0.6988 | 0.628 | 0.8823 |
| TwentyNewsgroupsClustering | 0.7326 | nan | 0.394 | 0.8349 |
| TwitterSemEval2015 | 0.7715 | 0.7917 | 0.7548 | 0.8946 |
| TwitterURLCorpus | 0.8666 | 0.8705 | 0.8589 | 0.9571 |
| VideoRetrieval | 0.7644 | nan | 0.5828 | 0.8384 |
| Waimai | 0.8991 | nan | 0.863 | 0.9231 |
| Average (over tasks with scores only) | 0.6729 | 0.7982 | 0.5869 | 0.7847 |
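The Average row appears to skip `nan` entries. google/gemini-embedding-001 has scores for only a subset of tasks, and that is the only way its average (0.7982) can exceed the average of the per-task Max result column (0.7847). So the column averages are not directly comparable. Below is a minimal Python sketch, assuming nan-skipping, that contrasts a per-column average with one restricted to tasks where both models have a score. It uses five rows copied from the table. `nan_mean` and `shared_mean` are hypothetical helpers for illustration, not part of mteb.

```python
import math

# Five rows copied from the table: (task, KaLM mini-instruct v2.5, gemini-embedding-001).
# math.nan marks a missing score ("nan" in the table).
rows = [
    ("AFQMC", 0.4878, math.nan),
    ("AmazonCounterfactualClassification", 0.9548, 0.9289),
    ("ArguAna", 0.6015, 0.8644),
    ("Banking77Classification", 0.9031, 0.9427),
    ("CMedQAv1-reranking", 0.8458, math.nan),
]


def nan_mean(values):
    """Mean over non-missing values (assumed behaviour of the Average row)."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def shared_mean(rows, col_a, col_b):
    """Means of two columns restricted to tasks where both have a score."""
    shared = [r for r in rows if not (math.isnan(r[col_a]) or math.isnan(r[col_b]))]
    mean_a = nan_mean([r[col_a] for r in shared])
    mean_b = nan_mean([r[col_b] for r in shared])
    return mean_a, mean_b, len(shared)


# Naive per-column averages cover different task subsets (5 vs 3 tasks here).
kalm_avg = nan_mean([r[1] for r in rows])
gemini_avg = nan_mean([r[2] for r in rows])
print(f"per-column averages: KaLM {kalm_avg:.4f}, gemini {gemini_avg:.4f}")

# A like-for-like comparison uses only the tasks both models were evaluated on.
kalm_shared, gemini_shared, n_shared = shared_mean(rows, 1, 2)
print(f"on {n_shared} shared tasks: KaLM {kalm_shared:.4f}, gemini {gemini_shared:.4f}")
```

Restricting to shared tasks keeps each model's average over the same denominator. The sketch applies that restriction to five rows only, so its numbers say nothing about the full-table averages.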
...mbedding-multilingual-mini-instruct-v2.5/6a4cfc1084cb459ebd4729b53a8656a61448c720/AFQMC.json
Hi @Samoed

It looks good to me. I requested a review from Kenneth.

Looks good to me too!
Checklist
The model has an implementation in `mteb/models/`; this can be an API. Instructions on how to add a model can be found here.