Adding results for the voyage-4-large and voyage-4-lite models by fzoll · Pull Request #391 · embeddings-benchmark/results

fzoll · 2026-01-07T12:32:40Z

Checklist

My model has a model sheet, report, or similar
My model has a reference implementation in mteb/models/model_implementations/ (this can be an API). Instructions on how to add a model can be found here
- No, but there is an existing PR: embeddings-benchmark/mteb#3885
The results submitted are obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

github-actions · 2026-01-07T12:36:11Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: voyageai/voyage-4-large, voyageai/voyage-4-lite
Tasks: AILACasedocs, AILAStatutes, AppsRetrieval, CUREv1, ChatDoctorRetrieval, Code1Retrieval, DS1000Retrieval, EnglishFinance1Retrieval, EnglishFinance2Retrieval, EnglishFinance3Retrieval, EnglishFinance4Retrieval, EnglishHealthcare1Retrieval, FinQARetrieval, FinanceBenchRetrieval, French1Retrieval, FrenchLegal1Retrieval, FreshStackRetrieval, German1Retrieval, GermanHealthcare1Retrieval, GermanLegal1Retrieval, HC3FinanceRetrieval, HumanEvalRetrieval, JapaneseCode1Retrieval, JapaneseLegal1Retrieval, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, WikiSQLRetrieval

Results for `voyageai/voyage-4-large`

Task	google/gemini-embedding-001	voyageai/voyage-4-large	intfloat/multilingual-e5-large	Max result (previous submissions)	Model with max result	In Training Data
AILACasedocs	0.4833	0.4575	0.2643	0.6541	bflhc/Octen-Embedding-8B	False
AILAStatutes	0.4877	0.5002	0.2084	0.9313	bflhc/Octen-Embedding-8B	False
AppsRetrieval	0.9375	0.9722	0.3255	0.9463	voyageai/voyage-3-large	False
CUREv1	0.5957	0.6694	0.5162	0.6590	bflhc/Octen-Embedding-4B	False
ChatDoctorRetrieval	0.7352	0.7674	0.5687	0.7390	voyageai/voyage-3-large	False
Code1Retrieval	0.9474	0.9433	n/a	0.9474	google/gemini-embedding-001	False
DS1000Retrieval	0.6870	0.7117	n/a	0.7103	bflhc/Octen-Embedding-8B	False
EnglishFinance1Retrieval	0.7332	0.8218	n/a	0.8188	voyageai/voyage-3.5	False
EnglishFinance2Retrieval	0.6740	0.9099	n/a	0.8851	voyageai/voyage-3.5	False
EnglishFinance3Retrieval	0.8330	0.8278	n/a	0.8509	nvidia/NV-Embed-v2	False
EnglishFinance4Retrieval	0.5757	0.6198	n/a	0.5997	voyageai/voyage-3-large	False
EnglishHealthcare1Retrieval	0.6338	0.6747	n/a	0.6875	bm25s	False
FinQARetrieval	0.6464	0.8865	n/a	0.8552	voyageai/voyage-3.5 (output_dtype=int8)	False
FinanceBenchRetrieval	0.9157	0.9288	n/a	0.9459	bflhc/Octen-Embedding-8B	False
French1Retrieval	0.8781	0.8587	n/a	0.8884	Cohere/Cohere-embed-v4.0	False
FrenchLegal1Retrieval	0.8696	0.9362	n/a	0.9490	bm25s	False
FreshStackRetrieval	0.3979	0.4933	0.2519	0.5776	bflhc/Octen-Embedding-8B	False
German1Retrieval	0.9761	0.9762	n/a	0.9771	voyageai/voyage-3-large	False
GermanHealthcare1Retrieval	0.8742	0.9140	n/a	0.8850	bflhc/Octen-Embedding-8B	False
GermanLegal1Retrieval	0.7149	0.7554	n/a	0.7405	voyageai/voyage-3-large	False
HC3FinanceRetrieval	0.7758	0.7671	n/a	0.8242	nvidia/NV-Embed-v2	False
HumanEvalRetrieval	0.9910	0.9957	n/a	0.9977	bflhc/Octen-Embedding-8B	False
JapaneseCode1Retrieval	0.8650	0.8615	n/a	0.8650	google/gemini-embedding-001	False
JapaneseLegal1Retrieval	0.9228	0.8567	n/a	0.9228	google/gemini-embedding-001	False
LegalQuAD	0.6553	0.7211	0.4317	0.7675	bm25s	False
LegalSummarization	0.7122	0.7836	0.6210	0.7921	voyageai/voyage-3.5	False
MBPPRetrieval	0.9416	0.9588	n/a	0.9416	google/gemini-embedding-001	False
MIRACLRetrievalHardNegatives	0.7042	0.6213	0.5923	0.7305	nvidia/llama-embed-nemotron-8b	False
WikiSQLRetrieval	0.8814	0.9621	n/a	0.9892	bflhc/Octen-Embedding-8B	False
Average	0.7602	0.7984	0.4200	0.8303	n/a	-

n/a: the model has no result for that task. The multilingual-e5-large average covers only the 9 tasks it has results for, so it is not directly comparable with the 29-task averages.
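The Average row is consistent with a mean taken only over the tasks a model actually has a score for: the voyage-4-large column averages to 0.7984 over all 29 tasks, while the multilingual-e5-large figure of 0.42 only works out if its 20 missing cells are skipped (3.78 / 9). A minimal Python sketch of that computation, assuming this is how the bot averages (its code is not shown in this PR); the scores are copied from the table above and the helper name `average_of_available` is hypothetical.

```python
import math

NAN = math.nan

# voyage-4-large scores copied from the table above, in row order (29 tasks).
VOYAGE_4_LARGE = [
    0.4575, 0.5002, 0.9722, 0.6694, 0.7674, 0.9433, 0.7117, 0.8218, 0.9099, 0.8278,
    0.6198, 0.6747, 0.8865, 0.9288, 0.8587, 0.9362, 0.4933, 0.9762, 0.9140, 0.7554,
    0.7671, 0.9957, 0.8615, 0.8567, 0.7211, 0.7836, 0.9588, 0.6213, 0.9621,
]

# multilingual-e5-large column in the same row order; NAN marks tasks with no result.
E5_LARGE = (
    [0.2643, 0.2084, 0.3255, 0.5162, 0.5687]
    + [NAN] * 11
    + [0.2519]
    + [NAN] * 7
    + [0.4317, 0.6210, NAN, 0.5923, NAN]
)


def average_of_available(scores):
    """Hypothetical helper: mean over tasks that have a score, plus how many were used."""
    present = [s for s in scores if not math.isnan(s)]
    return sum(present) / len(present), len(present)


for name, column in [("voyage-4-large", VOYAGE_4_LARGE), ("multilingual-e5-large", E5_LARGE)]:
    mean, n = average_of_available(column)
    print(f"{name}: {mean:.4f} over {n} tasks")
```

Counting the missing cells as zero instead would drag the e5 average down to roughly 0.13, which is why skipping them (and noting the task count) matters when comparing averages.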

Model has high performance on these tasks (score above the previous maximum): AppsRetrieval, MBPPRetrieval, EnglishFinance2Retrieval, GermanHealthcare1Retrieval, FinQARetrieval, EnglishFinance1Retrieval, GermanLegal1Retrieval, ChatDoctorRetrieval, DS1000Retrieval, CUREv1, EnglishFinance4Retrieval
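The 11 tasks in this list are exactly the tasks where the voyage-4-large score exceeds the Max result column of the table above. A sketch that reproduces the list from the table values, assuming that comparison is the rule the bot applies (the bot's code is not part of this PR); it also sorts by margin to show where the gains are largest. `ROWS` and `tasks_above_prior_max` are hypothetical names.

```python
# (task, voyage-4-large score, max result over previous submissions), copied from the table.
ROWS = [
    ("AILACasedocs", 0.4575, 0.6541),
    ("AILAStatutes", 0.5002, 0.9313),
    ("AppsRetrieval", 0.9722, 0.9463),
    ("CUREv1", 0.6694, 0.6590),
    ("ChatDoctorRetrieval", 0.7674, 0.7390),
    ("Code1Retrieval", 0.9433, 0.9474),
    ("DS1000Retrieval", 0.7117, 0.7103),
    ("EnglishFinance1Retrieval", 0.8218, 0.8188),
    ("EnglishFinance2Retrieval", 0.9099, 0.8851),
    ("EnglishFinance3Retrieval", 0.8278, 0.8509),
    ("EnglishFinance4Retrieval", 0.6198, 0.5997),
    ("EnglishHealthcare1Retrieval", 0.6747, 0.6875),
    ("FinQARetrieval", 0.8865, 0.8552),
    ("FinanceBenchRetrieval", 0.9288, 0.9459),
    ("French1Retrieval", 0.8587, 0.8884),
    ("FrenchLegal1Retrieval", 0.9362, 0.9490),
    ("FreshStackRetrieval", 0.4933, 0.5776),
    ("German1Retrieval", 0.9762, 0.9771),
    ("GermanHealthcare1Retrieval", 0.9140, 0.8850),
    ("GermanLegal1Retrieval", 0.7554, 0.7405),
    ("HC3FinanceRetrieval", 0.7671, 0.8242),
    ("HumanEvalRetrieval", 0.9957, 0.9977),
    ("JapaneseCode1Retrieval", 0.8615, 0.8650),
    ("JapaneseLegal1Retrieval", 0.8567, 0.9228),
    ("LegalQuAD", 0.7211, 0.7675),
    ("LegalSummarization", 0.7836, 0.7921),
    ("MBPPRetrieval", 0.9588, 0.9416),
    ("MIRACLRetrievalHardNegatives", 0.6213, 0.7305),
    ("WikiSQLRetrieval", 0.9621, 0.9892),
]


def tasks_above_prior_max(rows):
    """Return (task, margin) pairs where the new score beats the prior max, largest margin first."""
    gains = [(task, score - prior) for task, score, prior in rows if score > prior]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


for task, margin in tasks_above_prior_max(ROWS):
    print(f"{task}: +{margin:.4f}")
```

Sorting by margin puts FinQARetrieval (+0.0313) first and DS1000Retrieval (+0.0014) last, so several entries in the list are only marginal improvements.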

Results for `voyageai/voyage-4-lite`

Task	google/gemini-embedding-001	voyageai/voyage-4-lite	intfloat/multilingual-e5-large	Max result (previous submissions)	Model with max result	In Training Data
AILACasedocs	0.4833	0.4188	0.2643	0.6541	bflhc/Octen-Embedding-8B	False
AILAStatutes	0.4877	0.4849	0.2084	0.9313	bflhc/Octen-Embedding-8B	False
AppsRetrieval	0.9375	0.8573	0.3255	0.9463	voyageai/voyage-3-large	False
CUREv1	0.5957	0.5324	0.5162	0.6590	bflhc/Octen-Embedding-4B	False
ChatDoctorRetrieval	0.7352	0.7038	0.5687	0.7390	voyageai/voyage-3-large	False
Code1Retrieval	0.9474	0.9376	n/a	0.9474	google/gemini-embedding-001	False
DS1000Retrieval	0.6870	0.6646	n/a	0.7103	bflhc/Octen-Embedding-8B	False
EnglishFinance1Retrieval	0.7332	0.7722	n/a	0.8188	voyageai/voyage-3.5	False
EnglishFinance2Retrieval	0.6740	0.8779	n/a	0.8851	voyageai/voyage-3.5	False
EnglishFinance3Retrieval	0.8330	0.7643	n/a	0.8509	nvidia/NV-Embed-v2	False
EnglishFinance4Retrieval	0.5757	0.5646	n/a	0.5997	voyageai/voyage-3-large	False
EnglishHealthcare1Retrieval	0.6338	0.6090	n/a	0.6875	bm25s	False
FinQARetrieval	0.6464	0.8383	n/a	0.8552	voyageai/voyage-3.5 (output_dtype=int8)	False
FinanceBenchRetrieval	0.9157	0.9148	n/a	0.9459	bflhc/Octen-Embedding-8B	False
French1Retrieval	0.8781	0.8450	n/a	0.8884	Cohere/Cohere-embed-v4.0	False
FrenchLegal1Retrieval	0.8696	0.9213	n/a	0.9490	bm25s	False
FreshStackRetrieval	0.3979	0.4365	0.2519	0.5776	bflhc/Octen-Embedding-8B	False
German1Retrieval	0.9761	0.9658	n/a	0.9771	voyageai/voyage-3-large	False
GermanHealthcare1Retrieval	0.8742	0.8319	n/a	0.8850	bflhc/Octen-Embedding-8B	False
GermanLegal1Retrieval	0.7149	0.7255	n/a	0.7405	voyageai/voyage-3-large	False
HC3FinanceRetrieval	0.7758	0.6793	n/a	0.8242	nvidia/NV-Embed-v2	False
HumanEvalRetrieval	0.9910	0.9907	n/a	0.9977	bflhc/Octen-Embedding-8B	False
JapaneseCode1Retrieval	0.8650	0.8051	n/a	0.8650	google/gemini-embedding-001	False
JapaneseLegal1Retrieval	0.9228	0.7775	n/a	0.9228	google/gemini-embedding-001	False
LegalQuAD	0.6553	0.6772	0.4317	0.7675	bm25s	False
LegalSummarization	0.7122	0.7264	0.6210	0.7921	voyageai/voyage-3.5	False
MBPPRetrieval	0.9416	0.9182	n/a	0.9416	google/gemini-embedding-001	False
MIRACLRetrievalHardNegatives	0.7042	0.5712	0.5923	0.7305	nvidia/llama-embed-nemotron-8b	False
WikiSQLRetrieval	0.8814	0.9666	n/a	0.9892	bflhc/Octen-Embedding-8B	False
Average	0.7602	0.7510	0.4200	0.8303	n/a	-

Samoed · 2026-01-07T12:44:47Z

Can you add implementations of these models to mteb?

fzoll · 2026-01-07T15:42:25Z

@Samoed I added the models: embeddings-benchmark/mteb#3885

KennethEnevoldsen · 2026-01-07T19:05:33Z

@fzoll we see quite large improvements on a few of the datasets. I just need confirmation that you haven't trained on these, intentionally or by accident?

fzoll · 2026-01-08T10:29:26Z

@KennethEnevoldsen I can confirm that we have not. I discussed it with the team, and the improvements come from the use of MoE (mixture of experts).

KennethEnevoldsen · 2026-01-09T10:40:42Z

This is currently waiting on: embeddings-benchmark/mteb#3901


fzoll · 2026-01-11T12:54:25Z

@KennethEnevoldsen I removed the voyage-4 model results, as the tokenizer for that model is not publicly available yet. (voyage-4-large and voyage-4-lite should work now.)

fzoll · 2026-01-13T12:10:38Z

@KennethEnevoldsen, can you please merge the results as well, now that the model implementation is merged?

KennethEnevoldsen · 2026-01-13T12:19:44Z

Given the current suggestions on embeddings-benchmark/mteb#3902, and since I have reproduced some of the results using the current model implementations, I don't see an issue with merging this in.

Commit 4109c7c: Adding results for the voyage-4-large and voyage-4-lite models

KennethEnevoldsen approved these changes Jan 9, 2026


bflhc mentioned this pull request Jan 9, 2026 in embeddings-benchmark/mteb#3902, "Temporarily remove the private column on RTEB" (closed)

Commit 3f0768b: Adding results for the voyageai/voyage-4 model

fzoll changed the title from "Adding results for the voyage-4-large and voyage-4-lite models" to "Adding results for the voyage-4-large, voyage-4 and voyage-4-lite models" Jan 10, 2026

Commit 2acff85: Reverting voyage-4 results (as the tokenizer is not yet available publicly)

fzoll changed the title from "Adding results for the voyage-4-large, voyage-4 and voyage-4-lite models" to "Adding results for the voyage-4-large and voyage-4-lite models" Jan 11, 2026

KennethEnevoldsen mentioned this pull request Jan 11, 2026 in embeddings-benchmark/mteb#3885, "Adding voyage-4-large, voyage-4 and voyage-4-lite" (merged)

KennethEnevoldsen merged commit 3c38619 into embeddings-benchmark:main Jan 13, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Adding results for the voyage-4-large and voyage-4-lite models#391

Adding results for the voyage-4-large and voyage-4-lite models#391
KennethEnevoldsen merged 3 commits into embeddings-benchmark:main from fzoll:voyage-4
