Adding voyage-4 results #401

Merged
KennethEnevoldsen merged 1 commit into embeddings-benchmark:main from fzoll:voyage-4-results
Jan 14, 2026
Conversation

fzoll (Contributor) commented Jan 13, 2026

Adding voyage-4 results (the voyage-4 tokenizer is now available).

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on, e.g., Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

github-actions bot commented Jan 13, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: voyageai/voyage-4
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for voyageai/voyage-4

| task_name | google/gemini-embedding-001 | voyageai/voyage-4 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.4335 | 0.2643 | 0.6541 | bflhc/Octen-Embedding-8B | False |
| AILAStatutes | 0.4877 | 0.4991 | 0.2084 | 0.9313 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9375 | 0.9245 | 0.3255 | 0.9722 | voyageai/voyage-4-large | False |
| CUREv1 | 0.5957 | 0.625 | 0.5162 | 0.6694 | voyageai/voyage-4-large | False |
| ChatDoctorRetrieval | 0.7352 | 0.7408 | 0.5687 | 0.7674 | voyageai/voyage-4-large | False |
| Code1Retrieval | 0.9474 | 0.9431 | nan | 0.9474 | google/gemini-embedding-001 | False |
| DS1000Retrieval | 0.6870 | 0.6885 | nan | 0.7117 | voyageai/voyage-4-large | False |
| EnglishFinance1Retrieval | 0.7332 | 0.8001 | nan | 0.8218 | voyageai/voyage-4-large | False |
| EnglishFinance2Retrieval | 0.6740 | 0.8891 | nan | 0.9099 | voyageai/voyage-4-large | False |
| EnglishFinance3Retrieval | 0.8330 | 0.8136 | nan | 0.8509 | nvidia/NV-Embed-v2 | False |
| EnglishFinance4Retrieval | 0.5757 | 0.598 | nan | 0.6198 | voyageai/voyage-4-large | False |
| EnglishHealthcare1Retrieval | 0.6338 | 0.6421 | nan | 0.6875 | bm25s | False |
| FinQARetrieval | 0.6464 | 0.8564 | nan | 0.8865 | voyageai/voyage-4-large | False |
| FinanceBenchRetrieval | 0.9157 | 0.9217 | nan | 0.9459 | bflhc/Octen-Embedding-8B | False |
| French1Retrieval | 0.8781 | 0.856 | nan | 0.8884 | Cohere/Cohere-embed-v4.0 | False |
| FrenchLegal1Retrieval | 0.8696 | 0.9265 | nan | 0.9490 | bm25s | False |
| FreshStackRetrieval | 0.3979 | 0.4662 | 0.2519 | 0.5776 | bflhc/Octen-Embedding-8B | False |
| German1Retrieval | 0.9761 | 0.9756 | nan | 0.9771 | voyageai/voyage-3-large | False |
| GermanHealthcare1Retrieval | 0.8742 | 0.8838 | nan | 0.9140 | voyageai/voyage-4-large | False |
| GermanLegal1Retrieval | 0.7149 | 0.7478 | nan | 0.7554 | voyageai/voyage-4-large | False |
| HC3FinanceRetrieval | 0.7758 | 0.7334 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9910 | 0.993 | nan | 0.9977 | bflhc/MoD-Embedding | False |
| JapaneseCode1Retrieval | 0.8650 | 0.8443 | nan | 0.8650 | google/gemini-embedding-001 | False |
| JapaneseLegal1Retrieval | 0.9228 | 0.8427 | nan | 0.9228 | google/gemini-embedding-001 | False |
| LegalQuAD | 0.6553 | 0.708 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7122 | 0.7461 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9416 | 0.9395 | nan | 0.9588 | voyageai/voyage-4-large | False |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.6056 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.8814 | 0.9702 | nan | 0.9892 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7602 | 0.7798 | 0.42 | 0.8374 | nan | - |
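
A note on reading the Average row: the 0.42 reported for intfloat/multilingual-e5-large is consistent with a mean taken only over the nine tasks where that model has a score, with nan entries skipped. Its average therefore covers a different task set than the averages of the other two models and is not directly comparable to them. The sketch below reproduces that figure from the scores in the table. The helper name `nan_mean` and the dictionary layout are illustrative assumptions, not the code the comparison bot actually runs.

```python
import math

# multilingual-e5-large scores copied from the results table.
# nan marks a task with no result for this model; only two of the
# nan rows are listed here for brevity (assumption: all nan rows
# are treated the same way).
e5_scores = {
    "AILACasedocs": 0.2643,
    "AILAStatutes": 0.2084,
    "AppsRetrieval": 0.3255,
    "CUREv1": 0.5162,
    "ChatDoctorRetrieval": 0.5687,
    "Code1Retrieval": float("nan"),
    "DS1000Retrieval": float("nan"),
    "FreshStackRetrieval": 0.2519,
    "LegalQuAD": 0.4317,
    "LegalSummarization": 0.621,
    "MIRACLRetrievalHardNegatives": 0.5923,
}


def nan_mean(values):
    """Mean of the non-nan values (hypothetical helper); nan if none are present."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else float("nan")


print(f"{nan_mean(e5_scores.values()):.4f}")  # prints 0.4200
```

For a like-for-like comparison, one option is to average every model over only the tasks that all compared models have results for.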

KennethEnevoldsen (Contributor) commented

Results spot-checked (see model PR) and they match.

KennethEnevoldsen merged commit 1b673b7 into embeddings-benchmark:main on Jan 14, 2026
10 checks passed