Adding results for the voyage-4-large and voyage-4-lite models #391
KennethEnevoldsen merged 3 commits into embeddings-benchmark:main
Conversation
Model Results Comparison
Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large
Results for voyageai/voyage-4-large
| task_name | google/gemini-embedding-001 | voyageai/voyage-4-large | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.4575 | 0.2643 | 0.6541 | bflhc/Octen-Embedding-8B | False |
| AILAStatutes | 0.4877 | 0.5002 | 0.2084 | 0.9313 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9375 | 0.9722 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.5957 | 0.6694 | 0.5162 | 0.6590 | bflhc/Octen-Embedding-4B | False |
| ChatDoctorRetrieval | 0.7352 | 0.7674 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| Code1Retrieval | 0.9474 | 0.9433 | nan | 0.9474 | google/gemini-embedding-001 | False |
| DS1000Retrieval | 0.6870 | 0.7117 | nan | 0.7103 | bflhc/Octen-Embedding-8B | False |
| EnglishFinance1Retrieval | 0.7332 | 0.8218 | nan | 0.8188 | voyageai/voyage-3.5 | False |
| EnglishFinance2Retrieval | 0.6740 | 0.9099 | nan | 0.8851 | voyageai/voyage-3.5 | False |
| EnglishFinance3Retrieval | 0.8330 | 0.8278 | nan | 0.8509 | nvidia/NV-Embed-v2 | False |
| EnglishFinance4Retrieval | 0.5757 | 0.6198 | nan | 0.5997 | voyageai/voyage-3-large | False |
| EnglishHealthcare1Retrieval | 0.6338 | 0.6747 | nan | 0.6875 | bm25s | False |
| FinQARetrieval | 0.6464 | 0.8865 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.9157 | 0.9288 | nan | 0.9459 | bflhc/Octen-Embedding-8B | False |
| French1Retrieval | 0.8781 | 0.8587 | nan | 0.8884 | Cohere/Cohere-embed-v4.0 | False |
| FrenchLegal1Retrieval | 0.8696 | 0.9362 | nan | 0.9490 | bm25s | False |
| FreshStackRetrieval | 0.3979 | 0.4933 | 0.2519 | 0.5776 | bflhc/Octen-Embedding-8B | False |
| German1Retrieval | 0.9761 | 0.9762 | nan | 0.9771 | voyageai/voyage-3-large | False |
| GermanHealthcare1Retrieval | 0.8742 | 0.9140 | nan | 0.8850 | bflhc/Octen-Embedding-8B | False |
| GermanLegal1Retrieval | 0.7149 | 0.7554 | nan | 0.7405 | voyageai/voyage-3-large | False |
| HC3FinanceRetrieval | 0.7758 | 0.7671 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9910 | 0.9957 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| JapaneseCode1Retrieval | 0.8650 | 0.8615 | nan | 0.8650 | google/gemini-embedding-001 | False |
| JapaneseLegal1Retrieval | 0.9228 | 0.8567 | nan | 0.9228 | google/gemini-embedding-001 | False |
| LegalQuAD | 0.6553 | 0.7211 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7122 | 0.7836 | 0.6210 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9416 | 0.9588 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.6213 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.8814 | 0.9621 | nan | 0.9892 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7602 | 0.7984 | 0.4200 | 0.8303 | nan | - |
The model shows high performance on these tasks: AppsRetrieval, MBPPRetrieval, EnglishFinance2Retrieval, GermanHealthcare1Retrieval, FinQARetrieval, EnglishFinance1Retrieval, GermanLegal1Retrieval, ChatDoctorRetrieval, DS1000Retrieval, CUREv1, EnglishFinance4Retrieval
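As a quick sanity check on that list (a hypothetical sketch, not part of the PR), the per-task improvement of voyage-4-large over the gemini-embedding-001 reference can be recomputed from a handful of scores copied from the table above:

```python
# Scores copied from the voyage-4-large table above.
# Format: task -> (google/gemini-embedding-001, voyageai/voyage-4-large)
scores = {
    "FinQARetrieval": (0.6464, 0.8865),
    "EnglishFinance2Retrieval": (0.6740, 0.9099),
    "WikiSQLRetrieval": (0.8814, 0.9621),
    "MIRACLRetrievalHardNegatives": (0.7042, 0.6213),
    "JapaneseLegal1Retrieval": (0.9228, 0.8567),
}

# Delta > 0 means voyage-4-large beats the reference on that task.
deltas = {task: round(v - g, 4) for task, (g, v) in scores.items()}
ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
for task, d in ranked:
    print(f"{task:30s} {d:+.4f}")
```

On this subset, the largest gains are on the finance tasks (FinQARetrieval, EnglishFinance2Retrieval), while MIRACLRetrievalHardNegatives and JapaneseLegal1Retrieval regress relative to the reference, matching the table.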
Results for voyageai/voyage-4-lite
| task_name | google/gemini-embedding-001 | voyageai/voyage-4-lite | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4833 | 0.4188 | 0.2643 | 0.6541 | bflhc/Octen-Embedding-8B | False |
| AILAStatutes | 0.4877 | 0.4849 | 0.2084 | 0.9313 | bflhc/Octen-Embedding-8B | False |
| AppsRetrieval | 0.9375 | 0.8573 | 0.3255 | 0.9463 | voyageai/voyage-3-large | False |
| CUREv1 | 0.5957 | 0.5324 | 0.5162 | 0.6590 | bflhc/Octen-Embedding-4B | False |
| ChatDoctorRetrieval | 0.7352 | 0.7038 | 0.5687 | 0.7390 | voyageai/voyage-3-large | False |
| Code1Retrieval | 0.9474 | 0.9376 | nan | 0.9474 | google/gemini-embedding-001 | False |
| DS1000Retrieval | 0.6870 | 0.6646 | nan | 0.7103 | bflhc/Octen-Embedding-8B | False |
| EnglishFinance1Retrieval | 0.7332 | 0.7722 | nan | 0.8188 | voyageai/voyage-3.5 | False |
| EnglishFinance2Retrieval | 0.6740 | 0.8779 | nan | 0.8851 | voyageai/voyage-3.5 | False |
| EnglishFinance3Retrieval | 0.8330 | 0.7643 | nan | 0.8509 | nvidia/NV-Embed-v2 | False |
| EnglishFinance4Retrieval | 0.5757 | 0.5646 | nan | 0.5997 | voyageai/voyage-3-large | False |
| EnglishHealthcare1Retrieval | 0.6338 | 0.6090 | nan | 0.6875 | bm25s | False |
| FinQARetrieval | 0.6464 | 0.8383 | nan | 0.8552 | voyageai/voyage-3.5 (output_dtype=int8) | False |
| FinanceBenchRetrieval | 0.9157 | 0.9148 | nan | 0.9459 | bflhc/Octen-Embedding-8B | False |
| French1Retrieval | 0.8781 | 0.8450 | nan | 0.8884 | Cohere/Cohere-embed-v4.0 | False |
| FrenchLegal1Retrieval | 0.8696 | 0.9213 | nan | 0.9490 | bm25s | False |
| FreshStackRetrieval | 0.3979 | 0.4365 | 0.2519 | 0.5776 | bflhc/Octen-Embedding-8B | False |
| German1Retrieval | 0.9761 | 0.9658 | nan | 0.9771 | voyageai/voyage-3-large | False |
| GermanHealthcare1Retrieval | 0.8742 | 0.8319 | nan | 0.8850 | bflhc/Octen-Embedding-8B | False |
| GermanLegal1Retrieval | 0.7149 | 0.7255 | nan | 0.7405 | voyageai/voyage-3-large | False |
| HC3FinanceRetrieval | 0.7758 | 0.6793 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HumanEvalRetrieval | 0.9910 | 0.9907 | nan | 0.9977 | bflhc/Octen-Embedding-8B | False |
| JapaneseCode1Retrieval | 0.8650 | 0.8051 | nan | 0.8650 | google/gemini-embedding-001 | False |
| JapaneseLegal1Retrieval | 0.9228 | 0.7775 | nan | 0.9228 | google/gemini-embedding-001 | False |
| LegalQuAD | 0.6553 | 0.6772 | 0.4317 | 0.7675 | bm25s | False |
| LegalSummarization | 0.7122 | 0.7264 | 0.6210 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9416 | 0.9182 | nan | 0.9416 | google/gemini-embedding-001 | False |
| MIRACLRetrievalHardNegatives | 0.7042 | 0.5712 | 0.5923 | 0.7305 | nvidia/llama-embed-nemotron-8b | False |
| WikiSQLRetrieval | 0.8814 | 0.9666 | nan | 0.9892 | bflhc/Octen-Embedding-8B | False |
| Average | 0.7602 | 0.7510 | 0.4200 | 0.8303 | nan | - |
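The "Average" row can be verified directly (a hypothetical sketch, not part of the PR): it is the unweighted mean of the 29 per-task voyage-4-lite scores copied from the table above.

```python
# voyage-4-lite per-task scores, in table order (AILACasedocs .. WikiSQLRetrieval).
lite_scores = [
    0.4188, 0.4849, 0.8573, 0.5324, 0.7038, 0.9376, 0.6646, 0.7722,
    0.8779, 0.7643, 0.5646, 0.6090, 0.8383, 0.9148, 0.8450, 0.9213,
    0.4365, 0.9658, 0.8319, 0.7255, 0.6793, 0.9907, 0.8051, 0.7775,
    0.6772, 0.7264, 0.9182, 0.5712, 0.9666,
]

# Unweighted mean over all tasks.
average = sum(lite_scores) / len(lite_scores)
print(round(average, 3))  # -> 0.751, matching the table's Average row
```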
Can you add implementations of these models to mteb?

@Samoed I added the models: embeddings-benchmark/mteb#3885

@fzoll we see quite large improvements on a few of the datasets - I just need confirmation that you haven't trained on these, whether intentionally or by accident?

@KennethEnevoldsen I can confirm. I discussed with the team and the improvements come from the use of MoE.

This is currently waiting on: embeddings-benchmark/mteb#3901

@KennethEnevoldsen I removed the

@KennethEnevoldsen, can you please merge the results as well, now that the model implementation is merged?

Given the current suggestions on embeddings-benchmark/mteb#3902 and that I have reproduced some of the results using the current model implementations, I don't see an issue with merging this.
Checklist
The model implementation has been added under mteb/models/model_implementations/; this can be implemented as an API. Instructions on how to add a model can be found here.