
[WIP] Add RTEB retrieval code #2529

Closed
fzliu wants to merge 11 commits into embeddings-benchmark:main from fzliu:main

Conversation


@fzliu fzliu commented Apr 10, 2025

Related: #2517

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.


@Samoed Samoed left a comment


Please don't add the rteb repository here directly. You should integrate your datasets into mteb. We standardized loading for retrieval tasks in the v2 branch.
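For context, the standardized layout for retrieval tasks splits each dataset into a corpus, queries, and qrels (relevance judgments). The sketch below is illustrative only: the dict layout follows the common BEIR-style convention used by such tasks, and the helper name is hypothetical, not mteb's actual API.

```python
# Illustrative BEIR-style retrieval layout: corpus, queries, and qrels.
# Ids and texts below are made-up examples.
corpus = {
    "doc1": {"title": "Case A", "text": "Full text of the first case document."},
    "doc2": {"title": "Case B", "text": "Full text of the second case document."},
}
queries = {
    "q1": "Which case discusses topic X?",
}
# qrels map a query id to {doc id: relevance grade}
qrels = {
    "q1": {"doc1": 1},
}

def relevant_docs(query_id: str) -> set[str]:
    """Return the ids of documents judged relevant (grade > 0) for a query."""
    return {d for d, grade in qrels.get(query_id, {}).items() if grade > 0}
```

With this shape, any loader that yields the three dicts plugs into the same evaluation path, which is the point of standardizing it.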

@Samoed Samoed changed the title [WIP] Add retrieval code [WIP] Add RTEB retrieval code Apr 10, 2025

@KennethEnevoldsen KennethEnevoldsen left a comment


Yea, so I generally agree with @Samoed here, but let us take it one element at a time.

  1. Datasets: Some of the datasets are already integrated in MTEB, and we definitely don't want duplicates. I must say I am unsure why we can't add these as regular MTEB Retrieval tasks; is there anything I am missing?

  2. Models: All models are already implemented in MTEB. We don't want duplicate models.

These points were also raised in the issue.

@Samoed Samoed marked this pull request as draft April 13, 2025 11:37
* Create hook for RTEB evaluator inside MTEB repository

* Removing unused files and adding some changes

* Query/document types

* Refactoring (removing ebr, create rteb Retrieval folder)

* Separating the logic

* Using pl.LightningModule for the RTEB eval
input_type=None for the Voyage model (just like in RTEB)

* Storing rteb cached results in a separate folder

* Removing the Models (we'll use the MTEB models)
Add all the remaining RTEB Datasets - with TODOs

* Create new RTEB task type (AbsTaskRTEB)
Refactor all RTEB tasks

* Aggregated task

* Use HFDataLoader!

* Removing the rteb package

* Made all datasets working

* Correct voyageai model

* First simplifications

* Simplifications


fzliu commented May 1, 2025

Still WIP, but I think this is ready for a first pass.

@Samoed @KennethEnevoldsen to your earlier points, we changed the format of all the datasets to match those of existing retrieval datasets; however, the task is still separate since we're tracking multiple metrics and the format of the output results is a bit different.

We can merge it into the broader retrieval task down the road.
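As a point of reference, tracking multiple metrics per query does not by itself require a separate task type: standard retrieval metrics such as nDCG@k and recall@k can be computed side by side from the same ranked list and qrels. A minimal self-contained sketch (this is not MTEB's actual implementation, just the textbook definitions):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, qrel, k=10):
    """nDCG@k for one query: ranked doc ids vs. a {doc_id: grade} judgment map."""
    gains = [qrel.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrel.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, qrel, k=10):
    """Fraction of judged-relevant docs retrieved in the top k."""
    relevant = {d for d, g in qrel.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)
```

Both functions consume the same ranking, so adding a metric is a matter of adding one entry to the per-task scores dict rather than changing the task abstraction.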


@Samoed Samoed left a comment


Your implementation is quite different from how other tasks are handled. I checked a few datasets and couldn't find zenodo/4063986 used in RTEBAILACasedocsTask, and lavita/ChatDoctor-HealthCareMagic-100k doesn't load with your current loader.

If you just need to add extra scores for Retrieval tasks to work correctly, you can use #2600 for that (this is on the v2 branch).

return round(model_memory_mb)

@property
def _id(self) -> str:


Why do you need this?

model_name="voyage-3-large", # Match the API model name
model_prompts=model_prompts,
),
max_tokens=32000, # Assuming same as voyage-3


Suggested change:
- max_tokens=32000, # Assuming same as voyage-3
+ max_tokens=32768,

model_prompts=model_prompts,
),
max_tokens=32000, # Assuming same as voyage-3
embed_dim=1024, # Assuming same as voyage-3


Can you check?

)
rteb_encoder._trainer = trainer

args = argparse.Namespace(


Why do you use argparse.Namespace?
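A typed dataclass is a common alternative to argparse.Namespace when no command-line parsing is actually happening: it documents the expected fields and catches typos. A minimal sketch with hypothetical field names (not the ones used in this PR):

```python
from dataclasses import dataclass

@dataclass
class EvalConfig:
    """Typed stand-in for an ad-hoc argparse.Namespace; fields are examples."""
    batch_size: int = 32
    max_length: int = 512
    device: str = "cpu"

# Construct with overrides just like a Namespace, but with type checking
# and editor support for the field names.
cfg = EvalConfig(batch_size=64)
```

Unlike a Namespace, accessing a misspelled field on a dataclass instance fails loudly, and defaults live in one declared place.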

"enable_checkpointing": False,
"enable_progress_bar": True,
}
trainer = pl.Trainer(**trainer_kwargs)


Why do you need Trainer if you're only encoding?
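For inference-only use, a plain batched loop avoids the Lightning Trainer machinery entirely. A minimal sketch, where `encode_batch` is a placeholder standing in for the real model call (not the Voyage API or any mteb function):

```python
# Encode a list of texts in fixed-size batches without a pl.Trainer.
def encode_batch(texts):
    # Placeholder: a real model would return one embedding vector per text.
    # Here we return a 1-d "vector" of the text length, just to show shapes.
    return [[float(len(t))] for t in texts]

def encode_all(texts, batch_size=2):
    """Accumulate embeddings batch by batch, preserving input order."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        embeddings.extend(encode_batch(texts[i : i + batch_size]))
    return embeddings
```

A Trainer adds value for training loops (checkpointing, distributed strategy, callbacks); for a forward-only encode pass, a loop like this keeps the dependency surface small.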

import warnings

warnings.warn(
"_evaluate_subset is deprecated for RTEB tasks. Use RTEBTaskRunner.run_rteb_evaluation instead.",


Why?

@KennethEnevoldsen

Hi @fzliu. This PR has currently introduced:

  1. Multiple tasks
  2. A new abstask (the core)
  3. A model
  4. An aggregated task (this is not needed; it should be a benchmark)

I would actually suggest closing this PR and opening two new ones:

  1. with the model
  2. with the RTEB abstask and one sample task

This allows us to focus on the main points. On those main points:

> the task is still separate since we're tracking multiple metrics

Which metrics are different, and why should we add them?

> and the format of the output results is a bit different.

I prefer keeping the current results format. This works with the current leaderboard, review pipeline, utility functions etc.

Changing the result format would require some justification.

@fzliu fzliu closed this May 6, 2025