Adding results for the voyage-4-large and voyage-4-lite models #391

Merged
KennethEnevoldsen merged 3 commits into embeddings-benchmark:main from fzoll:voyage-4
Jan 13, 2026

Conversation

@fzoll
Contributor

@fzoll fzoll commented Jan 7, 2026

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR #3885
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions bot commented Jan 7, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: voyageai/voyage-4-large, voyageai/voyage-4-lite
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for voyageai/voyage-4-large

| task_name | google/gemini-embedding-001 | voyageai/voyage-4-large | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.4575 | 0.2643 | 0.6541 | bflhc/Octen-Embedding-8B | False |
| AILAStatutes | 0.4877 | 0.5002 | 0.2084 | 0.9313 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9375 | 0.9722 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.5957 | 0.6694 | 0.5162 | 0.6590 | bflhc/Octen-Embedding-4B | False |
| ChatDoctorRetrieval | 0.7352 | 0.7674 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| Code1Retrieval | 0.9474 | 0.9433 | nan | 0.9474 | google/gemini-embedding-001 | False |
| DS1000Retrieval | 0.6870 | 0.7117 | nan | 0.7103 | bflhc/Octen-Embedding-8B | False |
| EnglishFinance1Retrieval | 0.7332 | 0.8218 | nan | 0.8188 | voyageai/voyage-3.5 | False |
| EnglishFinance2Retrieval | 0.6740 | 0.9099 | nan | 0.8851 | voyageai/voyage-3.5 | False |
| EnglishFinance3Retrieval | 0.8330 | 0.8278 | nan | 0.8509 | nvidia/NV-Embed-v2 | False |
| EnglishFinance4Retrieval | 0.5757 | 0.6198 | nan | 0.5997 | voyageai/voyage-3-large | False |
| EnglishHealthcare1Retrieval | 0.6338 | 0.6747 | nan | 0.6875 | bm25s | False |
| FinQARetrieval | 0.6464 | 0.8865 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.9157 | 0.9288 | nan | 0.9459 | bflhc/Octen-Embedding-8B | False |
| French1Retrieval | 0.8781 | 0.8587 | nan | 0.8884 | Cohere/Cohere-embed-v4.0 | False |
| FrenchLegal1Retrieval | 0.8696 | 0.9362 | nan | 0.9490 | bm25s | False |
| FreshStackRetrieval | 0.3979 | 0.4933 | 0.2519 | 0.5776 | bflhc/Octen-Embedding-8B | False |
| German1Retrieval | 0.9761 | 0.9762 | nan | 0.9771 | voyageai/voyage-3-large | False |
| GermanHealthcare1Retrieval | 0.8742 | 0.9140 | nan | 0.8850 | bflhc/Octen-Embedding-8B | False |
| GermanLegal1Retrieval | 0.7149 | 0.7554 | nan | 0.7405 | voyageai/voyage-3-large | False |
| HC3FinanceRetrieval | 0.7758 | 0.7671 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9910 | 0.9957 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| JapaneseCode1Retrieval | 0.8650 | 0.8615 | nan | 0.8650 | google/gemini-embedding-001 | False |
| JapaneseLegal1Retrieval | 0.9228 | 0.8567 | nan | 0.9228 | google/gemini-embedding-001 | False |
| LegalQuAD | 0.6553 | 0.7211 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7122 | 0.7836 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9416 | 0.9588 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.6213 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.8814 | 0.9621 | nan | 0.9892 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7602 | 0.7984 | 0.42 | 0.8303 | nan | - |

The model has high performance on these tasks: AppsRetrieval, MBPPRetrieval, EnglishFinance2Retrieval, GermanHealthcare1Retrieval, FinQARetrieval, EnglishFinance1Retrieval, GermanLegal1Retrieval, ChatDoctorRetrieval, DS1000Retrieval, CUREv1, EnglishFinance4Retrieval
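
The bot does not state how it selects these tasks. One reading that is consistent with the voyage-4-large table is that a task is flagged when the new model's score exceeds the previous maximum ("Max result"). A minimal Python sketch under that assumption, with scores copied from the table (the names `scores` and `flagged` are illustrative and not part of the bot):

```python
# (voyage-4-large score, previous "Max result") per task, copied from the table.
# Assumption: a task is flagged when the new score is strictly above the old max.
scores = {
    "AILACasedocs": (0.4575, 0.6541),
    "AILAStatutes": (0.5002, 0.9313),
    "AppsRetrieval": (0.9722, 0.9463),
    "CUREv1": (0.6694, 0.6590),
    "ChatDoctorRetrieval": (0.7674, 0.7390),
    "Code1Retrieval": (0.9433, 0.9474),
    "DS1000Retrieval": (0.7117, 0.7103),
    "EnglishFinance1Retrieval": (0.8218, 0.8188),
    "EnglishFinance2Retrieval": (0.9099, 0.8851),
    "EnglishFinance3Retrieval": (0.8278, 0.8509),
    "EnglishFinance4Retrieval": (0.6198, 0.5997),
    "EnglishHealthcare1Retrieval": (0.6747, 0.6875),
    "FinQARetrieval": (0.8865, 0.8552),
    "FinanceBenchRetrieval": (0.9288, 0.9459),
    "French1Retrieval": (0.8587, 0.8884),
    "FrenchLegal1Retrieval": (0.9362, 0.9490),
    "FreshStackRetrieval": (0.4933, 0.5776),
    "German1Retrieval": (0.9762, 0.9771),
    "GermanHealthcare1Retrieval": (0.9140, 0.8850),
    "GermanLegal1Retrieval": (0.7554, 0.7405),
    "HC3FinanceRetrieval": (0.7671, 0.8242),
    "HumanEvalRetrieval": (0.9957, 0.9977),
    "JapaneseCode1Retrieval": (0.8615, 0.8650),
    "JapaneseLegal1Retrieval": (0.8567, 0.9228),
    "LegalQuAD": (0.7211, 0.7675),
    "LegalSummarization": (0.7836, 0.7921),
    "MBPPRetrieval": (0.9588, 0.9416),
    "MIRACLRetrievalHardNegatives": (0.6213, 0.7305),
    "WikiSQLRetrieval": (0.9621, 0.9892),
}

# Keep every task where the new model beats the previous best score.
flagged = [task for task, (new, prev_max) in scores.items() if new > prev_max]

for task in flagged:
    new, prev_max = scores[task]
    print(f"{task}: {new:.4f} vs previous max {prev_max:.4f}")
```

Under this reading the sketch selects the same 11 tasks the bot lists, although the bot's actual rule may differ (for example, it could apply a margin).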


Results for voyageai/voyage-4-lite

| task_name | google/gemini-embedding-001 | voyageai/voyage-4-lite | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.4188 | 0.2643 | 0.6541 | bflhc/Octen-Embedding-8B | False |
| AILAStatutes | 0.4877 | 0.4849 | 0.2084 | 0.9313 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9375 | 0.8573 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.5957 | 0.5324 | 0.5162 | 0.6590 | bflhc/Octen-Embedding-4B | False |
| ChatDoctorRetrieval | 0.7352 | 0.7038 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| Code1Retrieval | 0.9474 | 0.9376 | nan | 0.9474 | google/gemini-embedding-001 | False |
| DS1000Retrieval | 0.6870 | 0.6646 | nan | 0.7103 | bflhc/Octen-Embedding-8B | False |
| EnglishFinance1Retrieval | 0.7332 | 0.7722 | nan | 0.8188 | voyageai/voyage-3.5 | False |
| EnglishFinance2Retrieval | 0.6740 | 0.8779 | nan | 0.8851 | voyageai/voyage-3.5 | False |
| EnglishFinance3Retrieval | 0.8330 | 0.7643 | nan | 0.8509 | nvidia/NV-Embed-v2 | False |
| EnglishFinance4Retrieval | 0.5757 | 0.5646 | nan | 0.5997 | voyageai/voyage-3-large | False |
| EnglishHealthcare1Retrieval | 0.6338 | 0.609 | nan | 0.6875 | bm25s | False |
| FinQARetrieval | 0.6464 | 0.8383 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.9157 | 0.9148 | nan | 0.9459 | bflhc/Octen-Embedding-8B | False |
| French1Retrieval | 0.8781 | 0.845 | nan | 0.8884 | Cohere/Cohere-embed-v4.0 | False |
| FrenchLegal1Retrieval | 0.8696 | 0.9213 | nan | 0.9490 | bm25s | False |
| FreshStackRetrieval | 0.3979 | 0.4365 | 0.2519 | 0.5776 | bflhc/Octen-Embedding-8B | False |
| German1Retrieval | 0.9761 | 0.9658 | nan | 0.9771 | voyageai/voyage-3-large | False |
| GermanHealthcare1Retrieval | 0.8742 | 0.8319 | nan | 0.8850 | bflhc/Octen-Embedding-8B | False |
| GermanLegal1Retrieval | 0.7149 | 0.7255 | nan | 0.7405 | voyageai/voyage-3-large | False |
| HC3FinanceRetrieval | 0.7758 | 0.6793 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9910 | 0.9907 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| JapaneseCode1Retrieval | 0.8650 | 0.8051 | nan | 0.8650 | google/gemini-embedding-001 | False |
| JapaneseLegal1Retrieval | 0.9228 | 0.7775 | nan | 0.9228 | google/gemini-embedding-001 | False |
| LegalQuAD | 0.6553 | 0.6772 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7122 | 0.7264 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9416 | 0.9182 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.5712 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.8814 | 0.9666 | nan | 0.9892 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7602 | 0.751 | 0.42 | 0.8303 | nan | - |
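
The Average row appears to skip nan entries rather than count them as zero: the nine tasks that have a score for intfloat/multilingual-e5-large average to 0.42. A short sketch that checks this, assuming a plain unweighted mean (the helper `nanmean` is hypothetical and is not the bot's code):

```python
import math

# intfloat/multilingual-e5-large scores copied from the table above.
e5_scores = [0.2643, 0.2084, 0.3255, 0.5162, 0.5687, 0.2519, 0.4317, 0.621, 0.5923]
# The other 20 tasks have no result for this model (nan in the table).
e5_scores += [float("nan")] * 20


def nanmean(values):
    """Unweighted mean over the entries that are not nan."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)


average = round(nanmean(e5_scores), 4)
print(average)
```

Because most tasks are missing for that reference model, its average is not directly comparable with the averages of models that cover all 29 tasks.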

@Samoed
Member

Samoed commented Jan 7, 2026

Can you add implementations of these models to mteb?

@fzoll
Contributor Author

fzoll commented Jan 7, 2026

@Samoed I added the models: embeddings-benchmark/mteb#3885

@KennethEnevoldsen
Contributor

@fzoll we see quite large improvements on a few of the datasets. Can you confirm that you haven't trained on these, whether intentionally or by accident?

@fzoll
Contributor Author

fzoll commented Jan 8, 2026

@KennethEnevoldsen I can confirm. I discussed this with the team, and the improvements come from the use of MoE (mixture of experts).

@KennethEnevoldsen
Contributor

This is currently waiting on: embeddings-benchmark/mteb#3901

@fzoll changed the title from "Adding results for the voyage-4-large and voyage-4-lite models" to "Adding results for the voyage-4-large, voyage-4 and voyage-4-lite models" on Jan 10, 2026
@fzoll changed the title from "Adding results for the voyage-4-large, voyage-4 and voyage-4-lite models" to "Adding results for the voyage-4-large and voyage-4-lite models" on Jan 11, 2026
@fzoll
Contributor Author

fzoll commented Jan 11, 2026

@KennethEnevoldsen I removed the voyage-4 model results as the tokenizer is not available for that model yet. (voyage-4-large and voyage-4-lite should work now.)

@fzoll
Contributor Author

fzoll commented Jan 13, 2026

@KennethEnevoldsen, can you please merge the results as well, as the model implementation is merged?

@KennethEnevoldsen
Contributor

Given the current suggestions on embeddings-benchmark/mteb#3902 and that I have reproduced some of the results using the current model implementations, I don't see an issue merging this in.

@KennethEnevoldsen KennethEnevoldsen merged commit 3c38619 into embeddings-benchmark:main Jan 13, 2026
4 checks passed
