add RTEB & MTEB results of Octen-Embedding-8B #374
Samoed merged 2 commits into embeddings-benchmark:main
Model Results Comparison
Reference models: Results for
| task_name | bflhc/Octen-Embedding-8B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result |
|---|---|---|---|---|---|
| AILACasedocs | 0.6109 | 0.4833 | 0.2643 | 0.4833 | google/gemini-embedding-001 |
| AILAStatutes | 0.9085 | 0.4877 | 0.2084 | 0.9003 | Mira190/Euler-Legal-Embedding-V1 |
| AfriSentiClassification | 0.4599 | 0.5356 | 0.455 | 0.5688 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| AlloProfClusteringS2S.v2 | 0.5860 | 0.5636 | 0.3515 | 0.5965 | Qwen/Qwen3-Embedding-8B |
| AlloprofReranking | 0.8540 | 0.8177 | 0.6944 | 0.8513 | Qwen/Qwen3-Embedding-4B |
| AmazonCounterfactualClassification | 0.9249 | 0.8820 | 0.7935 | 0.9696 | GeoGPT-Research-Project/GeoEmbedding |
| AppsRetrieval | 0.9206 | 0.9375 | 0.3255 | 0.9463 | voyageai/voyage-3-large |
| ArXivHierarchicalClusteringP2P | 0.6472 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 |
| ArXivHierarchicalClusteringS2S | 0.6455 | 0.6384 | 0.5621 | 0.6548 | Qwen/Qwen3-Embedding-8B |
| ArguAna | 0.7831 | 0.8644 | 0.5438 | 0.8979 | voyageai/voyage-3-m-exp |
| ArmenianParaphrasePC | 0.9680 | 0.9689 | 0.9493 | 0.9703 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| BUCC.v2 | 0.9893 | 0.9899 | 0.9878 | 0.9902 | GritLM/GritLM-7B |
| BelebeleRetrieval | 0.8899 | 0.9073 | 0.7791 | 0.9380 | clips/e5-base-trm-nl |
| BibleNLPBitextMining | 0.2633 | 0.2072 | 0.1665 | 0.9899 | deepvk/USER-bge-m3 |
| BigPatentClustering.v2 | 0.3146 | 0.3806 | 0.3466 | 0.4453 | Salesforce/SFR-Embedding-2_R |
| BiorxivClusteringP2P.v2 | 0.5088 | 0.5386 | 0.3778 | 0.8417 | codefuse-ai/F2LLM-4B |
| BornholmBitextMining | 0.7603 | 0.5169 | 0.4416 | 0.7633 | Qwen/Qwen3-Embedding-8B |
| BrazilianToxicTweetsClassification | 0.2100 | 0.2802 | 0.2123 | 0.3157 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| BulgarianStoreReviewSentimentClassfication | 0.6901 | 0.7813 | 0.7093 | 0.8044 | Linq-AI-Research/Linq-Embed-Mistral |
| CEDRClassification | 0.5256 | 0.5742 | 0.4484 | 0.7301 | sergeyzh/BERTA |
| CLSClusteringP2P.v2 | 0.7489 | 0.4268 | 0.4037 | 0.7572 | Qwen/Qwen3-Embedding-8B |
| CSFDSKMovieReviewSentimentClassification | 0.4966 | 0.4938 | 0.3664 | 0.6456 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| CTKFactsNLI | 0.8667 | 0.8759 | 0.8096 | 0.8993 | omarelshehy/arabic-english-sts-matryoshka |
| CUREv1 | 0.5949 | 0.5957 | 0.5162 | 0.6289 | nvidia/NV-Embed-v2 |
| CataloniaTweetClassification | 0.5265 | 0.5451 | 0.504 | 0.7790 | Bytedance/Seed1.6-embedding-1215 |
| ChatDoctorRetrieval | 0.7339 | 0.7352 | 0.5687 | 0.7390 | voyageai/voyage-3-large |
| Core17InstructionRetrieval | 0.1180 | 0.0769 | -0.0162 | 0.1648 | jhu-clsp/FollowIR-7B |
| CovidRetrieval | 0.8623 | 0.7913 | 0.7561 | 0.9606 | TencentBAC/Conan-embedding-v2 |
| CyrillicTurkicLangClassification | 0.6366 | 0.9530 | 0.4085 | 0.9905 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| CzechProductReviewSentimentClassification | 0.5786 | 0.6816 | 0.5742 | 0.7667 | Bytedance/Seed1.6-embedding-1215 |
| DBpediaClassification | 0.9664 | 0.9476 | 0.8828 | 0.9926 | Qwen/Qwen3-Embedding-8B |
| DS1000Retrieval | 0.6988 | 0.6870 | nan | 0.6897 | voyageai/voyage-3-large |
| DalajClassification | 0.5105 | 0.5047 | 0.5001 | 0.6213 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| DiaBlaBitextMining | 0.8686 | 0.8723 | 0.8483 | 0.8865 | nvidia/llama-embed-nemotron-8b |
| EstonianValenceClassification | 0.4342 | 0.5352 | 0.4358 | 0.6456 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| FaroeseSTS | 0.8671 | 0.8612 | 0.7239 | 0.9739 | Gameselo/STS-multilingual-mpnet-base-v2 |
| FilipinoShopeeReviewsClassification | 0.3990 | 0.4845 | 0.3527 | 0.5159 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| FinParaSTS | 0.3456 | 0.2860 | 0.2666 | 0.3399 | Qwen/Qwen3-Embedding-4B |
| FinQARetrieval | 0.7842 | 0.6464 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) |
| FinanceBenchRetrieval | 0.9306 | 0.9157 | nan | 0.9298 | voyageai/voyage-3-large |
| FinancialPhrasebankClassification | 0.8493 | 0.8864 | 0.8404 | 0.9515 | Qwen/Qwen3-Embedding-8B |
| FloresBitextMining | 0.7639 | 0.8371 | 0.8108 | 0.8596 | intfloat/multilingual-e5-large-instruct |
| FreshStackRetrieval | 0.5126 | 0.3979 | 0.2519 | 0.4438 | voyageai/voyage-3-large |
| GermanSTSBenchmark | 0.9003 | 0.8809 | 0.8527 | 0.9541 | Gameselo/STS-multilingual-mpnet-base-v2 |
| GreekLegalCodeClassification | 0.5433 | 0.4376 | 0.3713 | 0.8052 | Bytedance/Seed1.6-embedding-1215 |
| GujaratiNewsClassification | 0.9079 | 0.9205 | 0.7674 | 0.9343 | Bytedance/Seed1.6-embedding-1215 |
| HALClusteringS2S.v2 | 0.3089 | 0.3200 | 0.2261 | 0.3228 | Qwen/Qwen3-Embedding-8B |
| HC3FinanceRetrieval | 0.7395 | 0.7758 | nan | 0.8242 | nvidia/NV-Embed-v2 |
| HagridRetrieval | 0.9875 | 0.9931 | 0.9891 | 0.9931 | google/gemini-embedding-001 |
| HumanEvalRetrieval | 0.9977 | 0.9910 | nan | 0.9945 | voyageai/voyage-3-large |
| IN22GenBitextMining | 0.8159 | 0.9375 | 0.7675 | 0.9375 | google/gemini-embedding-001 |
| IndicCrosslingualSTS | 0.6188 | 0.6287 | 0.4387 | 0.8477 | Gameselo/STS-multilingual-mpnet-base-v2 |
| IndicGenBenchFloresBitextMining | 0.9413 | 0.9677 | 0.8875 | 0.9881 | Sailesh97/Hinvec |
| IndicLangClassification | 0.3076 | 0.8769 | 0.2025 | 0.9930 | Bytedance/Seed1.6-embedding-1215 |
| IndonesianIdClickbaitClassification | 0.6003 | 0.6700 | 0.6122 | 0.7560 | nvidia/llama-embed-nemotron-8b |
| IsiZuluNewsClassification | 0.2771 | 0.4053 | 0.3241 | 0.4053 | google/gemini-embedding-001 |
| ItaCaseholdClassification | 0.7163 | 0.7330 | 0.6679 | 0.9439 | bigscience/sgpt-bloom-7b1-msmarco |
| JSICK | 0.8963 | 0.8499 | 0.7983 | 0.8938 | Qwen/Qwen3-Embedding-8B |
| KorHateSpeechMLClassification | 0.1162 | 0.1769 | 0.1049 | 0.7625 | Bytedance/Seed1.6-embedding-1215 |
| KorSarcasmClassification | 0.5968 | 0.6051 | 0.5679 | 0.6479 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| KurdishSentimentClassification | 0.7860 | 0.8639 | 0.7708 | 0.9403 | Bytedance/Seed1.6-embedding-1215 |
| LEMBPasskeyRetrieval | 0.8900 | 0.3850 | 0.3825 | 1.0000 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| LegalBenchCorporateLobbying | 0.9549 | 0.9598 | 0.8972 | 0.9696 | voyageai/voyage-3-large |
| LegalQuAD | 0.7174 | 0.6553 | 0.4317 | 0.7675 | bm25s |
| LegalSummarization | 0.7653 | 0.7122 | 0.621 | 0.7921 | voyageai/voyage-3.5 |
| MBPPRetrieval | 0.9243 | 0.9416 | nan | 0.9416 | google/gemini-embedding-001 |
| MIRACLRetrievalHardNegatives | 0.6702 | 0.7042 | 0.6675 | 0.7305 | nvidia/llama-embed-nemotron-8b |
| MLQARetrieval | 0.8127 | 0.8416 | 0.7566 | 0.8416 | google/gemini-embedding-001 |
| MacedonianTweetSentimentClassification | 0.6850 | 0.7183 | 0.6192 | 0.7547 | Qwen/Qwen3-Embedding-4B |
| MalteseNewsClassification | 0.3646 | 0.3738 | 0.2533 | 0.6938 | Bytedance/Seed1.6-embedding-1215 |
| MasakhaNEWSClassification | 0.8316 | 0.8355 | 0.7754 | 0.9009 | Bytedance/Seed1.6-embedding-1215 |
| MasakhaNEWSClusteringS2S | 0.5670 | 0.5745 | 0.3804 | 0.7365 | Bytedance/Seed1.6-embedding-1215 |
| MassiveIntentClassification | 0.7889 | 0.8192 | 0.674 | 0.9194 | voyageai/voyage-3-m-exp |
| MedrxivClusteringP2P.v2 | 0.4564 | 0.4716 | 0.3515 | 0.7199 | codefuse-ai/F2LLM-4B |
| MultiEURLEXMultilabelClassification | 0.0449 | 0.0528 | 0.0516 | 0.0968 | Bytedance/Seed1.6-embedding-1215 |
| MultiHateClassification | 0.7533 | 0.7247 | 0.6357 | 0.8374 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| NTREXBitextMining | 0.8889 | 0.9364 | 0.914 | 0.9456 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| NepaliNewsClassification | 0.9727 | 0.9814 | 0.8847 | 0.9817 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| News21InstructionRetrieval | 0.0566 | 0.1026 | -0.0006 | 0.1145 | google/embeddinggemma-300m |
| NollySentiBitextMining | 0.6380 | 0.6871 | 0.675 | 0.8083 | nvidia/llama-embed-nemotron-8b |
| NordicLangClassification | 0.9192 | 0.8597 | 0.8015 | 0.9384 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| NorwegianCourtsBitextMining | 0.9357 | 0.9342 | 0.9404 | 0.9447 | OrdalieTech/Solon-embeddings-large-0.1 |
| NusaParagraphEmotionClassification | 0.5584 | 0.5638 | 0.4166 | 0.8374 | Bytedance/Seed1.6-embedding-1215 |
| NusaTranslationBitextMining | 0.9133 | 0.7752 | 0.672 | 0.9222 | Qwen/Qwen3-Embedding-8B |
| NusaX-senti | 0.7068 | 0.8031 | 0.7055 | 0.8482 | Bytedance/Seed1.6-embedding-1215 |
| NusaXBitextMining | 0.8780 | 0.8252 | 0.7267 | 0.9056 | Bytedance/Seed1.6-embedding-1215 |
| OdiaNewsClassification | 0.9179 | 0.9184 | 0.8001 | 0.9715 | Bytedance/Seed1.6-embedding-1215 |
| OpusparcusPC | 0.9617 | 0.9662 | 0.948 | 1.0000 | BAAI/bge-multilingual-gemma2 |
| PAC | 0.6391 | 0.7168 | 0.7033 | 0.8811 | Bytedance/Seed1.6-embedding-1215 |
| PawsXPairClassification | 0.7281 | 0.5999 | 0.5514 | 0.7557 | Bytedance/Seed1.6-embedding-1215 |
| PlscClusteringP2P.v2 | 0.7416 | 0.7431 | 0.7161 | 0.7542 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| PoemSentimentClassification | 0.5492 | 0.5966 | 0.5067 | 0.8642 | Bytedance/Seed1.6-embedding-1215 |
| PolEmo2.0-OUT | 0.5510 | 0.7753 | 0.5348 | 0.8006 | nvidia/llama-embed-nemotron-8b |
| PpcPC | 0.9463 | 0.9550 | 0.9218 | 0.9554 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| PunjabiNewsClassification | 0.8261 | 0.8261 | 0.807 | 0.8879 | Bytedance/Seed1.6-embedding-1215 |
| RTE3 | 0.9053 | 0.8955 | 0.8752 | 0.9173 | Bytedance/Seed1.6-embedding-1215 |
| Robust04InstructionRetrieval | 0.0924 | -0.0241 | -0.0748 | 0.1372 | jhu-clsp/FollowIR-7B |
| RomaniBibleClustering | 0.4238 | 0.4322 | 0.4092 | 0.4589 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| RuBQReranking | 0.7688 | 0.7384 | 0.756 | 0.8051 | ai-sage/Giga-Embeddings-instruct |
| SCIDOCS | 0.3181 | 0.2515 | 0.1747 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en |
| SIB200ClusteringS2S | 0.4641 | 0.4174 | 0.4115 | 0.5126 | sbintuitions/sarashina-embedding-v2-1b |
| SICK-R | 0.8816 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STS12 | 0.8624 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STS13 | 0.9400 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STS14 | 0.9052 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STS15 | 0.9382 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STS17 | 0.9179 | 0.8858 | 0.8215 | 0.9342 | infgrad/Jasper-Token-Compression-600M |
| STS22.v2 | 0.7395 | 0.7169 | 0.643 | 0.7718 | Kingsoft-LLM/QZhou-Embedding |
| STSB | 0.8630 | 0.8550 | 0.8236 | 0.9199 | Gameselo/STS-multilingual-mpnet-base-v2 |
| STSBenchmark | 0.9361 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding |
| STSES | 0.7481 | 0.8175 | 0.8021 | 0.8231 | google/embeddinggemma-300m |
| ScalaClassification | 0.5645 | 0.5185 | 0.5157 | 0.8626 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| SemRel24STS | 0.6437 | 0.7314 | 0.6266 | 0.8112 | VPLabs/SearchMap_Preview |
| SentimentAnalysisHindi | 0.6230 | 0.7606 | 0.642 | 0.8001 | Qwen/Qwen3-Embedding-8B |
| SinhalaNewsClassification | 0.7599 | 0.8229 | 0.6682 | 0.8547 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| SiswatiNewsClassification | 0.5913 | 0.6238 | 0.535 | 0.7837 | Lajavaness/bilingual-embedding-small |
| SlovakMovieReviewSentimentClassification | 0.8994 | 0.9035 | 0.7441 | 0.9539 | Bytedance/Seed1.6-embedding-1215 |
| SpartQA | 0.1590 | 0.1030 | 0.0565 | 0.8483 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| SprintDuplicateQuestions | 0.9562 | 0.9690 | 0.9318 | 0.9838 | Kingsoft-LLM/QZhou-Embedding |
| StackExchangeClustering.v2 | 0.7855 | 0.9207 | 0.4643 | 0.9207 | google/gemini-embedding-001 |
| StackOverflowQA | 0.9450 | 0.9671 | 0.8889 | 0.9720 | Bytedance/Seed1.6-embedding-1215 |
| StatcanDialogueDatasetRetrieval | 0.4856 | 0.5111 | 0.1063 | 0.5807 | jinaai/jina-embeddings-v4 |
| SwahiliNewsClassification | 0.6562 | 0.6605 | 0.5969 | 0.6753 | Qwen/Qwen3-Embedding-8B |
| SwednClusteringP2P | 0.5779 | 0.4584 | 0.3691 | 0.6213 | Qwen/Qwen3-Embedding-4B |
| SwissJudgementClassification | 0.5952 | 0.5786 | 0.5362 | 0.7791 | Bytedance/Seed1.6-embedding-1215 |
| T2Reranking | 0.6714 | 0.6795 | 0.6632 | 0.7315 | tencent/Youtu-Embedding |
| TERRa | 0.6641 | 0.6392 | 0.5842 | 0.7957 | ai-sage/Giga-Embeddings-instruct |
| TRECCOVID | 0.9125 | 0.8631 | 0.7133 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en |
| Tatoeba | 0.7888 | 0.8197 | 0.7574 | 0.9394 | OrlikB/KartonBERT-USE-base-v1 |
| TempReasonL1 | 0.0186 | 0.0296 | 0.0114 | 0.0805 | nvidia/llama-embed-nemotron-8b |
| ToxicConversationsClassification | 0.9148 | 0.8875 | 0.7132 | 0.9759 | voyageai/voyage-3-m-exp |
| TswanaNewsClassification | 0.3979 | 0.5337 | 0.47 | 0.6417 | Bytedance/Seed1.6-embedding-1215 |
| TweetTopicSingleClassification | 0.7643 | 0.7111 | 0.6532 | 0.8561 | Bytedance/Seed1.6-embedding-1215 |
| TwitterHjerneRetrieval | 0.8103 | 0.9802 | 0.3522 | 0.9802 | google/gemini-embedding-001 |
| TwitterURLCorpus | 0.8655 | 0.8705 | 0.8589 | 0.9571 | TencentBAC/Conan-embedding-v2 |
| VoyageMMarcoReranking | 0.6800 | 0.6673 | 0.6821 | 0.7351 | jinaai/jina-reranker-v3 |
| WebLINXCandidatesReranking | 0.1741 | 0.1097 | 0.0778 | 0.1792 | Bytedance/Seed1.6-embedding-1215 |
| WikiCitiesClustering | 0.8110 | 0.9163 | 0.755 | 0.9357 | Qwen/Qwen3-Embedding-4B |
| WikiClusteringP2P.v2 | 0.3223 | 0.2823 | 0.256 | 0.3295 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| WikiSQLRetrieval | 0.9885 | 0.8814 | nan | 0.9608 | jinaai/jina-embeddings-v4 |
| WikipediaRerankingMultilingual | 0.9100 | 0.9224 | 0.8981 | 0.9308 | jinaai/jina-reranker-v3 |
| WikipediaRetrievalMultilingual | 0.9225 | 0.9420 | 0.9111 | 0.9420 | google/gemini-embedding-001 |
| WinoGrande | 0.5709 | 0.6052 | 0.5498 | 0.8989 | tencent/KaLM-Embedding-Gemma3-12B-2511 |
| XNLI | 0.8595 | 0.8526 | 0.7477 | 0.9291 | Bytedance/Seed1.6-embedding-1215 |
| indonli | 0.6424 | 0.6069 | 0.5174 | 0.6722 | Bytedance/Seed1.6-embedding-1215 |
| Average | 0.6883 | 0.6891 | 0.5834 | 0.7911 | nan |
The model has high performance on these tasks: HumanEvalRetrieval, WikiSQLRetrieval, FinanceBenchRetrieval, AILAStatutes, JSICK, AlloprofReranking, DS1000Retrieval, AILACasedocs, FreshStackRetrieval, FinParaSTS
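The high-performance list appears to match a simple rule: keep the tasks where the bflhc/Octen-Embedding-8B score is above the "Max result" column, then order them by that score, highest first. The sketch below checks this reading on a subset of rows copied from the table. The function name and the rule itself are assumptions made for illustration; they are not taken from the code that generated the comment.

```python
# Rows copied from the comparison table:
# (task_name, bflhc/Octen-Embedding-8B score, "Max result" column)
ROWS = [
    ("AILACasedocs", 0.6109, 0.4833),
    ("AILAStatutes", 0.9085, 0.9003),
    ("AlloprofReranking", 0.8540, 0.8513),
    ("AppsRetrieval", 0.9206, 0.9463),
    ("DS1000Retrieval", 0.6988, 0.6897),
    ("FinParaSTS", 0.3456, 0.3399),
    ("FinanceBenchRetrieval", 0.9306, 0.9298),
    ("FreshStackRetrieval", 0.5126, 0.4438),
    ("HumanEvalRetrieval", 0.9977, 0.9945),
    ("JSICK", 0.8963, 0.8938),
    ("SCIDOCS", 0.3181, 0.5986),
    ("WikiSQLRetrieval", 0.9885, 0.9608),
]


def tasks_above_previous_max(rows):
    """Return task names whose new score beats the listed max, best score first.

    Hypothetical helper: the selection rule is an assumption inferred from the table.
    """
    winners = [(task, score) for task, score, prev_max in rows if score > prev_max]
    winners.sort(key=lambda pair: pair[1], reverse=True)
    return [task for task, _ in winners]


high = tasks_above_previous_max(ROWS)
print(",".join(high))
```

On these rows the output lists the same ten tasks in the same order as the comment, while AppsRetrieval and SCIDOCS are filtered out because their scores fall below the listed maximum.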
Octen-Embedding-8B is optimized for retrieval tasks. It is trained on Qwen3-Embedding-8B using a large amount of real-world industry search data combined with high-quality synthetic data. The results show significant improvements over the base model on both RTEB and MTEB reranking/retrieval tasks. We believe this model could bring substantial value to the open-source community. We would also appreciate it if you could help run the RTEB private evaluation so that we can assess the model's performance more comprehensively. Regarding configuration, a
Samoed left a comment
I ran the AILA tasks and can reproduce the results.
Checklist
mteb/models/model_implementations/, this can be an API. Instructions on how to add a model can be found here