
refactor: split BRIGHT benchmark into individual subset tasks#3285

Merged
Samoed merged 24 commits into embeddings-benchmark:main from whybe-choi:bright-subset-tasks
Jan 19, 2026
Conversation

@whybe-choi
Contributor

@whybe-choi whybe-choi commented Oct 7, 2025

Closes #3268

This pull request adds new BRIGHT subset benchmarks and their corresponding descriptive statistics to the retrieval benchmark suite. These changes enable more granular, domain-specific evaluation for reasoning-intensive retrieval tasks, both for standard and long document formats.

Benchmark additions

  • Introduced two new benchmarks, BRIGHT_SUBSETS and BRIGHT_SUBSETS_LONG, in mteb/benchmarks/benchmarks/benchmarks.py, covering the individual domains of the BRIGHT benchmark for both standard and long-document retrieval tasks. [1] [2]
  • Registered the new benchmarks in the mteb/benchmarks/benchmarks/__init__.py file for import and usage. [1] [2]

Descriptive statistics

  • Added descriptive statistics JSON files for each new BRIGHT subset retrieval task, including both standard and long formats (e.g., BrightBiologyRetrieval.json, BrightBiologyLongRetrieval.json, etc.), detailing sample counts, text lengths, and relevant document statistics for each domain. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]

Minor improvement

  • Minor formatting fix in the BEIR_NL benchmark description for improved readability.

@whybe-choi whybe-choi changed the title refactor: split BRIGHT benchmark into individual subset tasks refactor: split BRIGHT benchmark into individual subset tasks Oct 7, 2025
@Samoed Samoed requested a review from Muennighoff October 7, 2025 13:40
@whybe-choi whybe-choi force-pushed the bright-subset-tasks branch from 4240bdb to 826990a Compare October 7, 2025 14:36
@KennethEnevoldsen KennethEnevoldsen left a comment
Contributor

This comment was marked as resolved.

Hmm, this change will invalidate all previous results on BRIGHT.

You know that you can also simply subselect from a task using:

task = mteb.get_task("BrightRetrieval", eval_splits=..., hf_subsets=...)

For the leaderboard display it is even possible to create custom summary tables (see e.g. #3272)
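
For illustration, a minimal sketch of that kind of subselection (the subset names here are assumptions based on the BRIGHT domains, not checked against the task metadata):

import mteb

# hypothetical subset names; adjust to the actual hf_subsets of BrightRetrieval
task = mteb.get_task("BrightRetrieval", hf_subsets=["biology", "economics"])

evaluation = mteb.MTEB(tasks=[task])
# evaluation.run(model) then only evaluates the selected subsets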

@Samoed
Member

Samoed commented Oct 7, 2025

You know that you can also simply subselect from a task using:

Yes, but BRIGHT requires different prompts for different subsets, and because of that we probably need to split it. We can add support for configuring prompts per subset, but I'm not sure if it's a good idea

@KennethEnevoldsen
Contributor

Yes, but BRIGHT requires different prompts for different subsets, and because of that we probably need to split it. We can add support for configuring prompts per subset, but I'm not sure if it's a good idea

Ohh... Yeah that is hard to fix.

I see that the original BRIGHT(long) only has four models and BRIGHT only has 12, so I guess it is possible to rerun them

@Muennighoff
Contributor

If the scores change, are the new scores more similar to or more different from the official scores? If closer, then I think it is fine & maybe we can rerun some models. For many models on our BRIGHT leaderboard I just converted the scores from https://brightbenchmark.github.io/ to MTEB format when we originally added them, so they may still be fine if these changes actually make our implementation closer to that one.

@Samoed Samoed added the new dataset Issues related to adding a new task or dataset label Oct 7, 2025
@whybe-choi
Contributor Author

Would it be enough to evaluate the performance of ReasonIR, or is there a list of other models that it would be good to test?

@Samoed
Member

Samoed commented Oct 8, 2025

To check the implementation, this will be enough; just don't update the old leaderboard

@whybe-choi whybe-choi force-pushed the bright-subset-tasks branch from 826990a to 3ed620f Compare October 8, 2025 11:33
@whybe-choi whybe-choi force-pushed the bright-subset-tasks branch from 3ed620f to 57c757f Compare October 8, 2025 11:53
@whybe-choi
Contributor Author

After splitting BrightRetrieval into multiple tasks, I ran ReasonIR on them with task-specific prompts using the following code:

import torch
import mteb

# https://github.com/facebookresearch/ReasonIR/tree/main/evaluation/bright/configs/reasonir
prompts_dict = {
    "BrightBiologyRetrieval": "Given a Biology post, retrieve relevant passages that help answer the post",
    "BrightEarthScienceRetrieval": "Given a Earth Science post, retrieve relevant passages that help answer the post",
    "BrightEconomicsRetrieval": "Given a Economics post, retrieve relevant passages that help answer the post",
    "BrightPsychologyRetrieval": "Given a Psychology post, retrieve relevant passages that help answer the post",
    "BrightRoboticsRetrieval": "Given a Robotics post, retrieve relevant passages that help answer the post",
    "BrightStackoverflowRetrieval": "Given a Stackoverflow post, retrieve relevant passages that help answer the post",
    "BrightSustainableLivingRetrieval": "Given a Sustainable Living post, retrieve relevant passages that help answer the post",
    "BrightPonyRetrieval": "Given a Pony question, retrieve relevant passages that help answer the question",
    "BrightLeetcodeRetrieval": "Given a coding problem, retrieve relevant examples that help answer the problem",
    "BrightAopsRetrieval": "Given a Math problem, retrieve relevant examples that help answer the problem",
    "BrightTheoremQATheoremsRetrieval": "Given a Math problem, retrieve relevant theorems that help answer the problem",
    "BrightTheoremQAQuestionsRetrieval": "Given a Math problem, retrieve relevant examples that help answer the problem",
}

tasks = mteb.get_tasks(tasks=list(prompts_dict.keys()), languages=["eng"])
evaluation = mteb.MTEB(tasks=tasks)

model = mteb.get_model(
    "ReasonIR/ReasonIR-8B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    prompts_dict=prompts_dict,
)

evaluation.run(
    model,
    save_predictions=True,
    output_folder="evaluation/results",
    encode_kwargs={"batch_size": 1},
)

The results are as follows:

|  | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| before split | 24.31 | 30.83 | 24.27 | 28.95 | 18.40 | 21.68 | 20.57 | 18.14 | 9.49 | 4.84 | 18.21 | 26.42 | 20.51 |
| after split | 26.18 | 30.71 | 23.96 | 29.76 | 18.62 | 21.15 | 19.89 | 19.65 | 9.22 | 5.12 | 18.34 | 27.12 | 20.81 |

In the paper: (screenshot of the reported scores omitted)

@Samoed
Member

Samoed commented Oct 9, 2025

Great results! But I'm a bit unsure whether the prompts are applied correctly when they're passed through get_model?

@whybe-choi
Contributor Author

if instruction:
    logger.info(f"Using instruction: '{instruction}' for task: '{task_name}'")
embeddings = self.model.encode(
    sentences,
    prompt=instruction,
    **kwargs,
)
if isinstance(embeddings, torch.Tensor):
    # sometimes in kwargs can be return_tensors=True
    embeddings = embeddings.cpu().detach().float().numpy()
return embeddings

After adding code to print the instruction inside this snippet, the following output was produced:

# Biology
Retrieval
    - BrightBiologyRetrieval, s2p


instruction: <|user|>
Given a Biology post, retrieve relevant passages that help answer the post<|embed|>

Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 103/103 [00:06<00:00, 15.80it/s]
instruction: <|embed|>

Batches:   0%|                                                                                            | 2/50000 [00:02<18:01:38,  1.30s/it
# Psychology
Retrieval
    - BrightPsychologyRetrieval, s2p


instruction: <|user|>
Given a Psychology post, retrieve relevant passages that help answer the post<|embed|>

Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 101/101 [00:07<00:00, 14.12it/s]
instruction: <|embed|>

Batches:   0%|                                                                                                       | 0/50000 [00:01<?, ?it/s]
# Aops
Retrieval
    - BrightAopsRetrieval, s2p


instruction: <|user|>
Given a Math problem, retrieve relevant examples that help answer the problem<|embed|>

Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:06<00:00, 16.13it/s]
instruction: <|embed|>

Batches:   0%|                                                                                            | 17/50000 [00:09<7:16:33,  1.91it/s]

@Samoed
Member

Samoed commented Oct 9, 2025

Interesting, thanks! I didn’t think that would work since it’s a bit unintended, but maybe we should update the code to handle this case.

I've checked the ReasonIR code and found some other places that may help reproduce the results:

  1. In some cases, the rewritten query is concatenated with the original query https://github.com/facebookresearch/ReasonIR/blob/0aac96269e455965949df16520fab72da68ffc22/evaluation/bright/run.py#L82-L87
  2. Sometimes reasoning traces are added to the query https://github.com/facebookresearch/ReasonIR/blob/0aac96269e455965949df16520fab72da68ffc22/evaluation/bright/run.py#L124
  3. Maybe IDs should be filtered (ref Excluded IDs missing from BRIGHT dataset #2696), but in the ReasonIR code they only check that no IDs intersect https://github.com/facebookresearch/ReasonIR/blob/0aac96269e455965949df16520fab72da68ffc22/evaluation/bright/run.py#L130-L131 (a rough sketch of this post-hoc exclusion is shown below)
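
For illustration, a rough sketch of the post-hoc exclusion described in point 3 (the helper name is hypothetical, and all_scores/excluded_ids are assumed to be plain dicts like in the ReasonIR run script):

def drop_excluded_ids(all_scores, excluded_ids):
    """Remove per-query excluded corpus IDs from the retrieved scores before evaluation."""
    filtered = {}
    for query_id, doc_scores in all_scores.items():
        excluded = {did for did in excluded_ids.get(query_id, []) if did != "N/A"}
        filtered[query_id] = {
            doc_id: score for doc_id, score in doc_scores.items() if doc_id not in excluded
        }
    return filtered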

@Muennighoff Can you help figure out what we can do to reproduce the results?

@Muennighoff
Contributor

I think the ID filtering is probably the main missing piece to fully reproduce the results?

@whybe-choi
Contributor Author

I think points 1 and 2 are a separate issue, as they are related to query expansion. The fact that the performance is not reproducible for the single ReasonIR model seems to be related to the issue mentioned in point 3.

Samoed and others added 4 commits October 20, 2025 21:56
# Conflicts:
#	mteb/benchmarks/benchmarks/__init__.py
#	mteb/tasks/Retrieval/__init__.py
#	mteb/tasks/retrieval/eng/BrightSubsetsLongRetrieval.py
#	mteb/tasks/retrieval/eng/BrightSubsetsRetrieval.py
@whybe-choi
Contributor Author

@Samoed

I think it would be better to close this PR and work on it later together with Excluded IDs missing from BRIGHT dataset #2696. Also, it should be revised to fit the v2 format and include descriptive stats as well. What do you think?

@Samoed
Member

Samoed commented Oct 22, 2025

I think it would be better to close this PR and work on it later together

Do you mean that you don't want the tasks in this PR and will add another PR for #2696?

Also, it should be revised to fit the v2 format and include descriptive stats as well. What do you think?

Yes, you need to add statistics to merge. To apply the v2 format, you can select subsets from https://huggingface.co/datasets/mteb/BrightRetrieval, but the retrieval dataset loader requires the dataset to have strictly corpus, qrels and queries, so maybe we need to reupload them instead

@whybe-choi
Contributor Author

Which tasks need to be redone for this PR? I'm confused about the changes related to the v2 format, so I would appreciate your help.

@Samoed
Member

Samoed commented Oct 22, 2025

I think we can solve #2696 in this PR, because otherwise we would need to create v2 versions of these tasks, which I think is not a good solution

Comment on lines 22 to 38
domain_corpus_long = datasets.load_dataset(
    path,
    "long_documents",
    split=domain,
    cache_dir=cache_dir,
    revision=revision,
)
examples = datasets.load_dataset(
    path,
    "examples",
    split=domain,
    cache_dir=cache_dir,
    revision=revision,
)
corpus["long"] = {e["id"]: {"text": e["content"]} for e in domain_corpus_long}
queries["long"] = {e["id"]: e["query"] for e in examples}
relevant_docs["long"] = defaultdict(dict)
Member

To follow the v2 format, you can remove the conversion of the dataset to dicts and pass the datasets directly.

domain_corpus_long = domain_corpus_long.rename_column("content", "text")
queries = queries.rename_column("query", "text")
...
return domain_corpus_long, queries, relevant_docs

if self.data_loaded:
    return

self.corpus, self.queries, self.relevant_docs = load_bright_long_data(
Member

And then here it should look like

self.dataset["default"]["long"]["corpus"], self.dataset["default"]["long"]["queries"], self.dataset["default"]["long"]["relevant_documents"]

You can refer to

class RetrievalSplitData(TypedDict):
    """A dictionary containing the corpus, queries, relevant documents, instructions, and top-ranked documents for a retrieval task.

    Attributes:
        corpus: The corpus dataset containing documents. Should have columns `id`, `title`, `text` or `image`.
        queries: The queries dataset containing queries. Should have columns `id`, `text`, `instruction` (for instruction retrieval/reranking) or `image`.
        relevant_docs: A mapping of query IDs to relevant document IDs and their relevance scores. Should have columns `query-id`, `corpus-id`, `score`.
        top_ranked: A mapping of query IDs to a list of top-ranked document IDs. Should have columns `query-id`, `corpus-ids` (list[str]). This is optional and used for reranking tasks.
    """

    corpus: CorpusDatasetType
    queries: QueryDatasetType
    relevant_docs: RelevantDocumentsType
    top_ranked: TopRankedDocumentsType | None
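
For illustration, a minimal sketch of a loader in that shape (not the final PR code; the gold_ids_long column name and the qrels construction are assumptions about the upstream BRIGHT data):

import datasets

def load_bright_long_split(path, domain, revision=None, cache_dir=None):
    """Sketch: return Hugging Face datasets directly in the v2 retrieval schema."""
    corpus = datasets.load_dataset(
        path, "long_documents", split=domain, cache_dir=cache_dir, revision=revision
    )
    examples = datasets.load_dataset(
        path, "examples", split=domain, cache_dir=cache_dir, revision=revision
    )

    # corpus and queries need `id` and `text` columns
    corpus = corpus.rename_column("content", "text")
    queries = examples.rename_column("query", "text").select_columns(["id", "text"])

    # qrels as rows of `query-id`, `corpus-id`, `score`
    qrels_rows = [
        {"query-id": example["id"], "corpus-id": gold_id, "score": 1}
        for example in examples
        for gold_id in example["gold_ids_long"]  # assumed column name
    ]
    relevant_docs = datasets.Dataset.from_list(qrels_rows)
    return corpus, queries, relevant_docs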

@Samoed
Member

Samoed commented Dec 28, 2025

I ran with bm25 and got these results. Overall they show the same difference as in #3285 (comment), so the problem is somewhere in the data. I will continue debugging later

| task_name | bm25s (mteb) | bm25 (bright) | diff | matching |
|---|---|---|---|---|
| BrightAopsRetrieval | 0.03155 | 0.0498 | 0.0182 | - |
| BrightBiologyRetrieval | 0.08034 | 0.0774 | 0.0029 | - |
| BrightEarthScienceRetrieval | 0.12803 | 0.1283 | 0.0003 | + |
| BrightEconomicsRetrieval | 0.1033 | 0.1026 | 0.0007 | + |
| BrightLeetcodeRetrieval | 0.14568 | 0.2532 | 0.1075 | - |
| BrightPonyRetrieval | 0.02222 | 0.0226 | 0.0004 | + |
| BrightPsychologyRetrieval | 0.07645 | 0.0765 | 0.0001 | + |
| BrightRoboticsRetrieval | 0.06229 | 0.0623 | 0.0000 | + |
| BrightStackoverflowRetrieval | 0.14359 | 0.1636 | 0.0200 | - |
| BrightSustainableLivingRetrieval | 0.08385 | 0.0851 | 0.0013 | - |
| BrightTheoremQAQuestionsRetrieval | 0.05413 | 0.0735 | 0.0194 | - |
| BrightTheoremQATheoremsRetrieval | 0.01566 | 0.0051 | 0.0106 | - |
| BrightBiologyLongRetrieval | 0.09547 | 0.0955 | 0.0000 | + |
| BrightEarthScienceLongRetrieval | 0.17816 | 0.1782 | 0.0000 | + |
| BrightEconomicsLongRetrieval | 0.15372 | 0.1537 | 0.0000 | + |
| BrightPonyLongRetrieval | 0.02496 | 0.0250 | 0.0000 | + |
| BrightPsychologyLongRetrieval | 0.1703 | 0.1703 | 0.0000 | + |
| BrightRoboticsLongRetrieval | 0.05941 | 0.0594 | 0.0000 | + |
| BrightStackoverflowLongRetrieval | 0.23504 | 0.2350 | 0.0000 | + |
| BrightSustainableLivingLongRetrieval | 0.18519 | 0.1852 | 0.0000 | + |

Script to use bm25s in BRIGHT

def retrieval_bm25(queries, query_ids, documents, doc_ids, excluded_ids, long_context, **kwargs):
    import bm25s
    import Stemmer

    # tokenize the corpus with English stopword removal and stemming
    stemmer_language = 'english'
    stopwords = 'en'
    stemmer = Stemmer.Stemmer(stemmer_language)
    encoded_corpus = bm25s.tokenize(documents, stopwords=stopwords, stemmer=stemmer)

    retriever = bm25s.BM25()
    retriever.index(encoded_corpus)

    corpus_idx_to_id = {i: doc_id for i, doc_id in enumerate(doc_ids)}
    query_token_strs = bm25s.tokenize(queries, stopwords=stopwords, stemmer=stemmer)

    # retrieve up to 1000 documents per query (or the whole corpus if smaller)
    queries_results, queries_scores = retriever.retrieve(
        query_token_strs,
        k=min(1000, len(corpus_idx_to_id))
    )

    all_scores = {}
    for qi, query_id in enumerate(query_ids):
        query_results = queries_results[qi]
        scores = queries_scores[qi]
        all_scores[str(query_id)] = {}
        for ri in range(len(query_results)):
            doc_idx = query_results[ri]
            score = scores[ri]
            doc_id = corpus_idx_to_id[doc_idx]
            all_scores[str(query_id)][str(doc_id)] = float(score)
        # drop documents excluded for this query (BRIGHT provides per-query excluded ids)
        for did in set(excluded_ids[str(query_id)]):
            if did != "N/A" and did in all_scores[str(query_id)]:
                all_scores[str(query_id)].pop(did)
        # keep the top 1000 remaining documents, sorted by score
        cur_scores = sorted(all_scores[str(query_id)].items(), key=lambda x: x[1], reverse=True)[:1000]
        all_scores[str(query_id)] = {pair[0]: pair[1] for pair in cur_scores}
    return all_scores
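
For reference, a toy invocation of the function above (the texts and IDs are made up; in the BRIGHT harness these arguments come from its own data loading):

scores = retrieval_bm25(
    queries=["how do plants fix nitrogen?"],
    query_ids=["q1"],
    documents=["Legumes host rhizobia that fix nitrogen.", "An unrelated document about cooking."],
    doc_ids=["d1", "d2"],
    excluded_ids={"q1": ["N/A"]},
    long_context=False,
)
print(scores["q1"])  # {"d1": <bm25 score>, "d2": <bm25 score>}, highest score first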

@Samoed
Member

Samoed commented Dec 28, 2025

I found the source of the problem:

data_split["relevant_docs"], data_split["queries"] = (
_filter_queries_without_positives(
data_split["relevant_docs"], data_split["queries"]
)
)

If I remove the filtering, I get the same metrics for bm25. Should we add an option to disable filtering to reproduce the old results, or should we just leave it as is? WDYT @KennethEnevoldsen @Muennighoff?
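
For context, a rough reconstruction of what that filtering does (illustrative only, not the actual mteb implementation; it assumes the older dict-based qrels/queries format): queries with no positive qrel are dropped, which shrinks the query set and therefore shifts the averaged metrics.

def filter_queries_without_positives_sketch(relevant_docs, queries):
    """Keep only queries that have at least one relevant document with a positive score."""
    keep = {
        query_id
        for query_id, doc_scores in relevant_docs.items()
        if any(score > 0 for score in doc_scores.values())
    }
    filtered_qrels = {qid: docs for qid, docs in relevant_docs.items() if qid in keep}
    filtered_queries = {qid: text for qid, text in queries.items() if qid in keep}
    return filtered_qrels, filtered_queries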

@Samoed Samoed linked an issue Jan 6, 2026 that may be closed by this pull request
@KennethEnevoldsen
Contributor

If I remove the filtering, I get the same metrics for bm25. Should we add an option to disable filtering to reproduce the old results, or should we just leave it as is? WDYT @KennethEnevoldsen @Muennighoff?

Wouldn't we rather reproduce the old results and then potentially version bump it and add the filtering?

@Samoed
Member

Samoed commented Jan 13, 2026

I've run the tasks, and without this filtering I can reproduce the scores from the paper. I'm not sure how a version bump would help us

@KennethEnevoldsen
Contributor

So, if I understand correctly, we have the choice between reproducing the paper scores or the previous implementation?

Then I would target the paper and relabel it as a bugfix.

@Samoed
Member

Samoed commented Jan 14, 2026

I reran BRIGHT and now the results match!

| task_name | bm25s (mteb) | bm25 (bright) |
|---|---|---|
| BrightAopsRetrieval | 0.0498 | 0.0498 |
| BrightBiologyRetrieval | 0.07736 | 0.0774 |
| BrightEarthScienceRetrieval | 0.12825 | 0.1283 |
| BrightEconomicsRetrieval | 0.10258 | 0.1026 |
| BrightLeetcodeRetrieval | 0.25321 | 0.2532 |
| BrightPonyRetrieval | 0.02264 | 0.0226 |
| BrightPsychologyRetrieval | 0.07645 | 0.0765 |
| BrightRoboticsRetrieval | 0.06229 | 0.0623 |
| BrightStackoverflowRetrieval | 0.16358 | 0.1636 |
| BrightSustainableLivingRetrieval | 0.08507 | 0.0851 |
| BrightTheoremQAQuestionsRetrieval | 0.07349 | 0.0735 |
| BrightTheoremQATheoremsRetrieval | 0.00509 | 0.0051 |
| BrightBiologyLongRetrieval | 0.09547 | 0.0955 |
| BrightEarthScienceLongRetrieval | 0.17816 | 0.1782 |
| BrightEconomicsLongRetrieval | 0.15372 | 0.1537 |
| BrightPonyLongRetrieval | 0.02496 | 0.0250 |
| BrightPsychologyLongRetrieval | 0.1703 | 0.1703 |
| BrightRoboticsLongRetrieval | 0.05941 | 0.0594 |
| BrightStackoverflowLongRetrieval | 0.23504 | 0.2350 |
| BrightSustainableLivingLongRetrieval | 0.18519 | 0.1852 |

I changed:

  • Updated the dataset revisions to the latest. It seems our revision was taken between data uploads, before the dataset had fully finished its transition; theoremqa_theorems was updated after this commit
  • bm25 didn't support reranking tasks (it used all documents for searching)

Also I've split BRIGHT v1.1 into short and long subsets, but maybe we can convert them back. I don't know why I thought the problem was in the filtering. I will try to evaluate a bge model to check the instructions

""",
)

BRIGHT_V1_1 = Benchmark(
Contributor

Shouldn't we just combine this into one table, with both long and short as two different columns? (We can also have different columns for the different domains.)

Contributor

Feel free to delete the benchmark here and add that in a separate PR.

@Samoed
Member

Samoed commented Jan 18, 2026

Short Bright

MTEB

| task_name | BrightBiologyRetrieval | BrightEarthScienceRetrieval | BrightEconomicsRetrieval | BrightPsychologyRetrieval | BrightRoboticsRetrieval | BrightStackoverflowRetrieval | BrightSustainableLivingRetrieval | BrightLeetcodeRetrieval | BrightPonyRetrieval | BrightAopsRetrieval | BrightTheoremQAQuestionsRetrieval | BrightTheoremQATheoremsRetrieval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 0.1167 | 0.24558 | 0.16605 | 0.17464 | 0.11713 | 0.1083 | 0.13326 | 0.26681 | 0.05724 | 0.06 | 0.13057 | 0.069 |
| sentence-transformers/all-mpnet-base-v2 | 0.15098 | 0.20414 | 0.16639 | 0.22664 | 0.08221 | 0.11024 | 0.15343 | 0.26404 | 0.06996 | 0.05325 | 0.20035 | 0.10779 |

Bright

| Model | BrightBiologyRetrieval | BrightEarthScienceRetrieval | BrightEconomicsRetrieval | BrightPsychologyRetrieval | BrightRoboticsRetrieval | BrightStackoverflowRetrieval | BrightSustainableLivingRetrieval | BrightLeetcodeRetrieval | BrightPonyRetrieval | BrightAopsRetrieval | BrightTheoremQAQuestionsRetrieval | BrightTheoremQATheoremsRetrieval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 11.7 | 24.6 | 16.6 | 17.5 | 11.7 | 10.8 | 13.3 | 26.7 | 5.7 | 6.0 | 13.0 | 6.9 |
| sentence-transformers/all-mpnet-base-v2 | 15.1 | 20.4 | 16.6 | 22.7 | 8.2 | 11.0 | 15.3 | 26.4 | 7.0 | 5.3 | 20.0 | 10.8 |

Long Bright

MTEB

| task_name | BrightBiologyLongRetrieval | BrightEarthScienceLongRetrieval | BrightEconomicsLongRetrieval | BrightPsychologyLongRetrieval | BrightRoboticsLongRetrieval | BrightStackoverflowLongRetrieval | BrightSustainableLivingLongRetrieval | BrightPonyLongRetrieval |
|---|---|---|---|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 0.16424 | 0.2773 | 0.20874 | 0.11584 | 0.10891 | 0.13248 | 0.16898 | 0.0036 |
| sentence-transformers/all-mpnet-base-v2 | 0.25566 | 0.34052 | 0.18932 | 0.15842 | 0.10891 | 0.14957 | 0.18009 | 0.0119 |

Bright

| Model | BrightBiologyLongRetrieval | BrightEarthScienceLongRetrieval | BrightEconomicsLongRetrieval | BrightPsychologyLongRetrieval | BrightRoboticsLongRetrieval | BrightStackoverflowLongRetrieval | BrightSustainableLivingLongRetrieval | BrightPonyLongRetrieval |
|---|---|---|---|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 16.4 | 27.7 | 20.9 | 11.6 | 10.9 | 13.3 | 16.9 | 0.4 |
| sentence-transformers/all-mpnet-base-v2 | 25.6 | 34.1 | 18.9 | 15.8 | 10.9 | 15.0 | 18.0 | 1.2 |

@KennethEnevoldsen KennethEnevoldsen left a comment
Contributor

I believe we are good to merge here! Great job everyone.

# Conflicts:
#	mteb/models/model_implementations/bm25.py
@Samoed Samoed force-pushed the bright-subset-tasks branch from d2bcc4a to 75c9017 Compare January 19, 2026 10:52
@Samoed Samoed enabled auto-merge (squash) January 19, 2026 10:53
@Samoed Samoed merged commit 2c9b9e9 into embeddings-benchmark:main Jan 19, 2026
12 checks passed
@whybe-choi whybe-choi deleted the bright-subset-tasks branch January 19, 2026 12:54
Samoed added a commit that referenced this pull request Jan 31, 2026
Successfully merging this pull request may close these issues: Results on BRIGHT not matching; Excluded IDs missing from BRIGHT dataset.