add result for bflhc/Octen-Embedding-4B#384
Samoed merged 2 commits into embeddings-benchmark:main
Conversation
Model Results Comparison — reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large
| task_name | bflhc/Octen-Embedding-4B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.6102 | 0.4833 | 0.2643 | 0.6304 | bflhc/MoD-Embedding | False |
| AILAStatutes | 0.8557 | 0.4877 | 0.2084 | 0.9085 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9081 | 0.9375 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.6590 | 0.5957 | 0.5162 | 0.6289 | nvidia/NV-Embed-v2 | False |
| ChatDoctorRetrieval | 0.7097 | 0.7352 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| DS1000Retrieval | 0.6936 | 0.6870 | nan | 0.6988 | bflhc/Octen-Embedding-8B | False |
| FinQARetrieval | 0.8004 | 0.6464 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.8978 | 0.9157 | nan | 0.9306 | bflhc/Octen-Embedding-8B | False |
| FreshStackRetrieval | 0.5275 | 0.3979 | 0.2519 | 0.5126 | bflhc/Octen-Embedding-8B | False |
| HC3FinanceRetrieval | 0.7015 | 0.7758 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9953 | 0.9910 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| LegalQuAD | 0.7325 | 0.6553 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7581 | 0.7122 | 0.6210 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9294 | 0.9416 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.6372 | 0.7042 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.9784 | 0.8814 | nan | 0.9885 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7747 | 0.7218 | 0.4200 | 0.8058 | - | - |
The model sets a new best score on these tasks: CUREv1, FreshStackRetrieval
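The "new best" claim can be re-derived from the table itself: a task qualifies when the candidate's score exceeds the "Max result" column. A minimal self-contained sketch (only a few illustrative rows are copied from the table above):

```python
# Sketch: re-derive which tasks the candidate model tops.
# Each entry maps task -> (candidate score, previous "Max result").
# Subset of rows copied from the comparison table above.
scores = {
    "CUREv1": (0.6590, 0.6289),
    "FreshStackRetrieval": (0.5275, 0.5126),
    "MIRACLRetrievalHardNegatives": (0.6372, 0.7305),
    "WikiSQLRetrieval": (0.9784, 0.9885),
}

# Tasks where the candidate beats the best previously listed score.
new_bests = [task for task, (cand, prev) in scores.items() if cand > prev]
print(new_bests)  # -> ['CUREv1', 'FreshStackRetrieval']
```

Only CUREv1 and FreshStackRetrieval exceed their "Max result" entry; on every other row the candidate trails the listed maximum.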
I’ll update the results for the remaining two RTEB datasets in about 3 hours. If possible, I’d really appreciate it if you could merge the PR embeddings-benchmark/mteb#3816 and start the private evaluation afterward. Thanks a lot! @Samoed
Yes, I'll start the run.
Is the private eval running smoothly? How’s the progress so far? @Samoed |
I'd missed that my script failed, and I've restarted the evaluation.
Checklist
mteb/models/model_implementations/ (this can be as an API). Instructions on how to add a model can be found here