Model Results Comparison

Reference models: Results for

| task_name | HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v2 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.44 | nan | 0.33 | 0.72 |
| ATEC | 0.5 | nan | 0.4 | 0.65 |
| AmazonCounterfactualClassification | 0.96 | 0.88 | 0.7 | 0.97 |
| AmazonPolarityClassification | 0.97 | nan | 0.93 | 0.98 |
| AmazonReviewsClassification | 0.58 | nan | 0.43 | 0.65 |
| ArguAna | 0.57 | 0.86 | 0.54 | 0.90 |
| ArxivClusteringP2P | 0.51 | nan | 0.44 | 0.61 |
| ArxivClusteringS2S | 0.44 | nan | 0.38 | 0.55 |
| AskUbuntuDupQuestions | 0.62 | 0.64 | 0.59 | 0.70 |
| BIOSSES | 0.84 | 0.89 | 0.85 | 0.97 |
| BQ | 0.63 | nan | 0.48 | 0.81 |
| Banking77Classification | 0.89 | 0.94 | 0.75 | 0.94 |
| BiorxivClusteringP2P | 0.48 | nan | 0.36 | 0.55 |
| BiorxivClusteringS2S | 0.42 | nan | 0.33 | 0.51 |
| CLSClusteringP2P | 0.63 | nan | nan | 0.82 |
| CLSClusteringS2S | 0.59 | nan | nan | 0.74 |
| CMedQAv1-reranking | 0.84 | nan | 0.68 | 0.92 |
| CMedQAv2-reranking | 0.84 | nan | 0.67 | 0.92 |
| CQADupstackAndroidRetrieval | 0.54 | nan | 0.49 | 0.74 |
| CQADupstackEnglishRetrieval | 0.49 | nan | 0.46 | 0.70 |
| CQADupstackGamingRetrieval | 0.62 | 0.71 | 0.59 | 0.79 |
| CQADupstackGisRetrieval | 0.43 | nan | 0.37 | 0.63 |
| CQADupstackMathematicaRetrieval | 0.33 | nan | 0.28 | 0.69 |
| CQADupstackPhysicsRetrieval | 0.48 | nan | 0.44 | 0.74 |
| CQADupstackProgrammersRetrieval | 0.45 | nan | 0.42 | 0.66 |
| CQADupstackStatsRetrieval | 0.37 | nan | 0.32 | 0.62 |
| CQADupstackTexRetrieval | 0.33 | nan | 0.28 | 0.63 |
| CQADupstackUnixRetrieval | 0.46 | 0.54 | 0.4 | 0.72 |
| CQADupstackWebmastersRetrieval | 0.43 | nan | 0.4 | 0.68 |
| CQADupstackWordpressRetrieval | 0.36 | nan | 0.32 | 0.59 |
| ClimateFEVER | 0.25 | nan | 0.26 | 0.57 |
| CmedqaRetrieval | 0.45 | nan | 0.29 | 0.57 |
| Cmnli | 0.78 | nan | nan | 0.93 |
| CovidRetrieval | 0.83 | 0.79 | 0.76 | 0.96 |
| DBPedia | 0.4 | nan | 0.41 | 0.53 |
| DuRetrieval | 0.83 | nan | 0.85 | 0.94 |
| EcomRetrieval | 0.65 | nan | 0.55 | 0.78 |
| EmotionClassification | 0.92 | nan | 0.48 | 0.94 |
| FEVER | 0.83 | nan | 0.83 | 0.96 |
| FiQA2018 | 0.45 | 0.62 | 0.44 | 0.80 |
| HotpotQA | 0.7 | nan | 0.71 | 0.88 |
| IFlyTek | 0.51 | nan | 0.42 | 0.58 |
| ImdbClassification | 0.95 | 0.95 | 0.89 | 0.97 |
| JDReview | 0.87 | nan | 0.81 | 0.92 |
| LCQMC | 0.74 | nan | 0.76 | 0.81 |
| MMarcoReranking | 0.26 | nan | 0.29 | 0.47 |
| MMarcoRetrieval | 0.81 | nan | 0.79 | 0.90 |
| MSMARCO | 0.36 | nan | 0.44 | 0.48 |
| MTOPDomainClassification | 0.99 | 0.98 | 0.9 | 1.00 |
| MTOPIntentClassification | 0.89 | nan | 0.67 | 0.95 |
| MassiveIntentClassification | 0.78 | 0.82 | 0.6 | 0.92 |
| MassiveScenarioClassification | 0.86 | 0.87 | 0.7 | 0.99 |
| MedicalRetrieval | 0.6 | nan | 0.51 | 0.76 |
| MedrxivClusteringP2P | 0.44 | nan | 0.32 | 0.52 |
| MedrxivClusteringS2S | 0.41 | nan | 0.3 | 0.50 |
| MindSmallReranking | 0.32 | 0.33 | 0.3 | 0.34 |
| MultilingualSentiment | 0.78 | nan | 0.71 | 0.83 |
| NFCorpus | 0.35 | nan | 0.34 | 0.56 |
| NQ | 0.48 | nan | 0.64 | 0.82 |
| Ocnli | 0.78 | nan | nan | 0.92 |
| OnlineShopping | 0.94 | nan | 0.9 | 0.97 |
| PAWSX | 0.43 | nan | 0.15 | 0.66 |
| QBQTC | 0.38 | nan | nan | 0.71 |
| QuoraRetrieval | 0.9 | nan | 0.89 | 0.92 |
| RedditClustering | 0.77 | nan | 0.47 | 0.77 |
| RedditClusteringP2P | 0.73 | nan | 0.63 | 0.75 |
| SCIDOCS | 0.21 | 0.25 | 0.17 | 0.35 |
| SICK-R | 0.8 | 0.83 | 0.8 | 0.95 |
| STS12 | 0.82 | 0.82 | 0.8 | 0.95 |
| STS13 | 0.86 | 0.90 | 0.82 | 0.98 |
| STS14 | 0.84 | 0.85 | 0.78 | 0.98 |
| STS15 | 0.86 | 0.90 | 0.89 | 0.98 |
| STS16 | 0.86 | nan | 0.86 | 0.98 |
| STS17 | 0.67 | 0.89 | 0.82 | 0.93 |
| STS22 | 0.64 | nan | 0.59 | 0.84 |
| STSB | 0.81 | 0.85 | 0.82 | 0.92 |
| STSBenchmark | 0.85 | 0.89 | 0.87 | 0.94 |
| SciDocsRR | 0.82 | nan | 0.84 | 0.91 |
| SciFact | 0.72 | nan | 0.7 | 0.87 |
| SprintDuplicateQuestions | 0.96 | 0.97 | 0.93 | 0.98 |
| StackExchangeClustering | 0.78 | nan | 0.58 | 0.84 |
| StackExchangeClusteringP2P | 0.45 | nan | 0.33 | 0.52 |
| StackOverflowDupQuestions | 0.51 | nan | 0.5 | 0.63 |
| SummEval | 0.29 | nan | 0.3 | 0.41 |
| T2Reranking | 0.67 | 0.68 | 0.66 | 0.73 |
| T2Retrieval | 0.85 | nan | 0.76 | 0.89 |
| TNews | 0.51 | nan | 0.49 | 0.59 |
| TRECCOVID | 0.79 | 0.86 | 0.71 | 0.95 |
| ThuNewsClusteringP2P | 0.81 | nan | nan | 0.89 |
| ThuNewsClusteringS2S | 0.76 | nan | nan | 0.88 |
| Touche2020 | 0.28 | nan | 0.23 | 0.39 |
| ToxicConversationsClassification | 0.89 | 0.89 | 0.66 | 0.98 |
| TweetSentimentExtractionClassification | 0.79 | 0.70 | 0.63 | 0.88 |
| TwentyNewsgroupsClustering | 0.74 | nan | 0.39 | 0.83 |
| TwitterSemEval2015 | 0.77 | 0.79 | 0.75 | 0.89 |
| TwitterURLCorpus | 0.86 | 0.87 | 0.86 | 0.96 |
| VideoRetrieval | 0.76 | nan | 0.58 | 0.84 |
| Waimai | 0.89 | nan | 0.86 | 0.92 |
| Average | 0.65 | 0.79 | 0.58 | 0.78 |
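
The Average row is easier to read with the nan cells in mind. The sketch below assumes each column's average skips nan entries (the table does not say how it was computed), which would be consistent with google/gemini-embedding-001 averaging 0.79 while the per-task Max result column averages 0.78. The helper names are hypothetical, and only four rows from the table are used.

```python
import math

# Four rows copied from the table above:
# task -> (KaLM-embedding-multilingual-mini-instruct-v2, gemini-embedding-001)
rows = {
    "AFQMC": (0.44, math.nan),
    "ArguAna": (0.57, 0.86),
    "BIOSSES": (0.84, 0.89),
    "CovidRetrieval": (0.83, 0.79),
}


def nan_mean(values):
    """Mean over the non-nan entries; nan if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def shared_task_means(rows):
    """Per-model means restricted to tasks where every model has a score."""
    shared = [pair for pair in rows.values()
              if not any(math.isnan(v) for v in pair)]
    return tuple(nan_mean(column) for column in zip(*shared))


# Column-wise averages use a different task set for each model ...
print(nan_mean([pair[0] for pair in rows.values()]))
print(nan_mean([pair[1] for pair in rows.values()]))
# ... while the shared-task averages compare like with like.
print(shared_task_means(rows))
```

Under that assumption, the per-column averages are not directly comparable between models with different nan patterns; restricting to shared tasks gives a fairer comparison.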

The folder name should be the model's revision instead of

The folder has been modified to

@Samoed
...ct-v2/d2a21c232dc712ae8230af56d1027cf21b7864bf/.ipynb_checkpoints/model_meta-checkpoint.json
results/HIT-TMG__KaLM-embedding-multilingual-mini-instruct-v2/external/model_meta.json
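
The two review threads above concern where the result files live. As a minimal sketch, assuming the layout is results/<org>__<model>/<revision>/ (inferred from the paths in this thread, not taken from the mteb documentation), the expected folder can be derived from the model name and revision. `results_dir` is a hypothetical helper, not part of mteb.

```python
from pathlib import PurePosixPath


def results_dir(model_name: str, revision: str, root: str = "results") -> PurePosixPath:
    """Build <root>/<org>__<model>/<revision> (assumed layout)."""
    # The "/" in the Hub model name becomes "__", as in the path shown in this thread.
    return PurePosixPath(root) / model_name.replace("/", "__") / revision


path = results_dir(
    "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v2",
    "d2a21c232dc712ae8230af56d1027cf21b7864bf",
)
print(path / "model_meta.json")
```

Stray notebook artifacts such as an .ipynb_checkpoints directory would not fit this layout, which may be why that file was flagged in review.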

Hi @Samoed

Looks good. Will merge after implementation.

Hi @Samoed

Checklist
The model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here.