[WIP] Add RTEB retrieval code #2529
Conversation
Samoed
left a comment
Please don't add the rteb repository directly here. You should integrate your datasets into mteb. We standardized loading for retrieval tasks in the v2 branch.
KennethEnevoldsen
left a comment
Yea so generally agree with @Samoed here, but let us take it one element at a time.
- Datasets: Some of the datasets are already integrated in MTEB and we definitely don't want duplicates. I must say that I am unsure why we can't add these as regular MTEB Retrieval tasks? Is there anything that I am missing?
- Models: All models are already implemented in MTEB. We don't want duplicate models.
These points were also raised in the issue.
* Create hook for RTEB evaluator inside MTEB repository
* Removing unused files and adding some changes
* Query/document types
* Refactoring (removing ebr, create rteb Retrieval folder)
* Separating the logic
* Using pl.LightningModule for the RTEB eval; input_type=None for the Voyage model (just like in RTEB)
* Storing rteb cached results in a separate folder
* Removing the Models (we'll use the MTEB models)
* Add all the remaining RTEB Datasets - with TODOs
* Create new RTEB task type (AbsTaskRTEB); refactor all RTEB tasks
* Aggregated task
* Use HFDataLoader!
* Removing the rteb package
* Made all datasets working
* Correct voyageai model
* First simplifications
* Simplifications
Still WIP, but I think this is ready for a first pass. @Samoed @KennethEnevoldsen, to your earlier points: we changed the format of all the datasets to match those of existing retrieval datasets; however, the task is still separate since we're tracking multiple metrics and the format of the output results is a bit different. We can merge it into the broader retrieval task down the road.
Your implementation is quite different from how other tasks are handled. I checked a few datasets and couldn't find zenodo/4063986 used in RTEBAILACasedocsTask, and lavita/ChatDoctor-HealthCareMagic-100k doesn't load with your current loader.
If you just need to add scores for Retrieval to work correctly, you can use #2600 for that (this is the v2 branch).
```python
return round(model_memory_mb)

@property
def _id(self) -> str:
```
```python
model_name="voyage-3-large",  # Match the API model name
model_prompts=model_prompts,
),
max_tokens=32000,  # Assuming same as voyage-3
```
Suggested change:
```diff
-max_tokens=32000,  # Assuming same as voyage-3
+max_tokens=32768,
```
```python
model_prompts=model_prompts,
),
max_tokens=32000,  # Assuming same as voyage-3
embed_dim=1024,  # Assuming same as voyage-3
```
```python
)
rteb_encoder._trainer = trainer

args = argparse.Namespace(
```
Why do you use argparse.Namespace?
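For context on this question: `argparse.Namespace` is normally produced by an argument parser, so constructing one by hand usually signals that a plain config object would be clearer. A minimal sketch of the alternative the reviewer implies, with illustrative field names (not taken from this PR):

```python
from dataclasses import dataclass

# Illustrative config object; the field names are assumptions, not the PR's
# actual arguments. A dataclass documents and type-checks the expected fields,
# unlike an ad-hoc argparse.Namespace built by hand.
@dataclass
class EncodeConfig:
    batch_size: int = 32
    max_tokens: int = 32000

cfg = EncodeConfig(batch_size=16)
print(cfg.batch_size, cfg.max_tokens)  # → 16 32000
```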
```python
    "enable_checkpointing": False,
    "enable_progress_bar": True,
}
trainer = pl.Trainer(**trainer_kwargs)
```
Why do you need Trainer if you're only encoding?
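To illustrate the reviewer's point: inference-only encoding does not require a `pl.Trainer`; the module can be called directly in eval mode under `torch.no_grad()`. A hedged sketch with a toy encoder (all names here are illustrative stand-ins, not the PR's actual model):

```python
import torch

class TinyEncoder(torch.nn.Module):
    """Toy stand-in for an encoder; the real RTEB model differs."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    @torch.no_grad()
    def encode(self, batch: torch.Tensor) -> torch.Tensor:
        self.eval()  # disable dropout/batchnorm updates for inference
        return self.proj(batch)

# Plain encoding loop: no Trainer, no Lightning machinery needed.
embeddings = TinyEncoder().encode(torch.randn(4, 8))
```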
```python
import warnings

warnings.warn(
    "_evaluate_subset is deprecated for RTEB tasks. Use RTEBTaskRunner.run_rteb_evaluation instead.",
```
Hi @fzliu. This PR has currently introduced:
I would actually suggest closing this PR and opening two new ones:
This allows us to focus on the main points. On those main points:
What metrics are different, and why do we add them?
I prefer keeping the current results format. This works with the current leaderboard, review pipeline, utility functions, etc. Changing the result format would require some justification.
Related: #2517
Code Quality
* Run `make lint` to maintain consistent style.

Documentation

Testing
* Run `make test-with-coverage`.
* Run `make test` or `make test-with-coverage` to ensure no existing functionality is broken.