
add result for bflhc/Octen-Embedding-4B #384

Merged
Samoed merged 2 commits into embeddings-benchmark:main from bflhc:feature/add_octen_4b_results
Dec 30, 2025

Conversation

bflhc (Contributor) commented Dec 30, 2025

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on, e.g., Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

github-actions bot commented Dec 30, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: bflhc/Octen-Embedding-4B
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, DS1000Retrieval, FinQARetrieval, FinanceBenchRetrieval, FreshStackRetrieval, HC3FinanceRetrieval, HumanEvalRetrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for bflhc/Octen-Embedding-4B

| task_name | bflhc/Octen-Embedding-4B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.6102 | 0.4833 | 0.2643 | 0.6304 | bflhc/MoD-Embedding | False |
| AILAStatutes | 0.8557 | 0.4877 | 0.2084 | 0.9085 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9081 | 0.9375 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.6590 | 0.5957 | 0.5162 | 0.6289 | nvidia/NV-Embed-v2 | False |
| ChatDoctorRetrieval | 0.7097 | 0.7352 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| DS1000Retrieval | 0.6936 | 0.6870 | nan | 0.6988 | bflhc/Octen-Embedding-8B | False |
| FinQARetrieval | 0.8004 | 0.6464 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.8978 | 0.9157 | nan | 0.9306 | bflhc/Octen-Embedding-8B | False |
| FreshStackRetrieval | 0.5275 | 0.3979 | 0.2519 | 0.5126 | bflhc/Octen-Embedding-8B | False |
| HC3FinanceRetrieval | 0.7015 | 0.7758 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9953 | 0.9910 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| LegalQuAD | 0.7325 | 0.6553 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7581 | 0.7122 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9294 | 0.9416 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.6372 | 0.7042 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.9784 | 0.8814 | nan | 0.9885 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7747 | 0.7218 | 0.42 | 0.8058 | nan | - |

The model has high performance on these tasks: CUREv1, FreshStackRetrieval
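The per-model averages in the bottom row are unweighted means over the task scores in that column. A minimal sketch reproducing the bflhc/Octen-Embedding-4B average from the values in the table (the exact handling of nan entries for the reference models is not specified here, so only the fully populated column is checked):

```python
# Per-task scores for bflhc/Octen-Embedding-4B, copied from the table above.
scores = [
    0.6102, 0.8557, 0.9081, 0.6590, 0.7097, 0.6936, 0.8004, 0.8978,
    0.5275, 0.7015, 0.9953, 0.7325, 0.7581, 0.9294, 0.6372, 0.9784,
]

# Unweighted mean across all 16 tasks.
average = sum(scores) / len(scores)  # approx. 0.7747, matching the table
print(f"{average:.4f}")
```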


bflhc (Contributor, Author) commented Dec 30, 2025

I’ll update the results for the remaining two RTEB datasets in about 3 hours. If possible, I’d really appreciate it if you could merge the PR embeddings-benchmark/mteb#3816 and start running the private evaluation afterward. Thanks a lot! @Samoed

Samoed (Member) commented Dec 30, 2025

Yes, I'll start the run

Samoed merged commit 7c98c91 into embeddings-benchmark:main on Dec 30, 2025
3 checks passed
bflhc (Contributor, Author) commented Dec 31, 2025

> Yes, I'll start the run

Is the private eval running smoothly? How’s the progress so far? @Samoed

Samoed (Member) commented Dec 31, 2025

I missed that my script had failed, and I've restarted the evaluation
