Pull Request Overview
This pull request migrates the STS tasks from the VisualSTS-specific abstraction to a more generic AnySTS abstraction. Key changes include:
- Replacing inheritance from AbsTaskVisualSTS to AbsTaskAnySTS across task files.
- Replacing the VisualSTSEvaluator with the new AnySTSEvaluator in both evaluators and evaluation initialization.
- Updating the dataloader creation functions to support both image and text modalities.
Reviewed Changes
Copilot reviewed 58 out of 58 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mteb/tasks/Image/VisualSTS/multilingual/STSBenchmarkMultilingualVisualSTS.py | Changed base class from AbsTaskVisualSTS to AbsTaskAnySTS. |
| mteb/tasks/Image/VisualSTS/multilingual/STS17MultilingualVisualSTS.py | Changed base class from AbsTaskVisualSTS to AbsTaskAnySTS. |
| mteb/tasks/Image/VisualSTS/en/*.py | Updated all task files to use AbsTaskAnySTS and reordered __all__ lists. |
| mteb/evaluation/evaluators/__init__.py & mteb/evaluation/evaluators/Image/*.py | Removed VisualSTSEvaluator and updated to use AnySTSEvaluator. |
| mteb/evaluation/evaluators/AnySTSEvaluator.py | Renamed and updated evaluator implementation to work with new dataloader API. |
| mteb/create_dataloaders.py | Introduced create_dataloader function for unified image/text dataloader creation. |
| mteb/abstasks/*.py | Replaced references to VisualSTS with the generic AnySTS approach. |
Comments suppressed due to low confidence (1)
mteb/evaluation/evaluators/AnySTSEvaluator.py:59
- The term 'manhatten' appears to be misspelled; consider renaming it to 'manhattan' for consistency.
manhatten_pearson, _ = pearsonr(self.gold_scores, manhattan_distances)
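The flagged line correlates the gold scores with Manhattan distances between paired embeddings. A minimal, dependency-free sketch of that computation with the variable spelled `manhattan_pearson` might look like the following. The helpers `manhattan_distance` and `pearson` are hypothetical stand-ins written for illustration (the evaluator itself calls `pearsonr`), the toy data is made up, and negating the distances so that larger values mean "more similar" is an assumption here, not something the review states.

```python
import math


def manhattan_distance(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data, made up for illustration: paired embeddings and gold similarity scores.
embeddings1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
embeddings2 = [[0.0, 0.1], [1.0, 1.0], [2.0, 2.0], [0.0, 5.0]]
gold_scores = [5.0, 4.0, 2.5, 0.5]

# Assumption: negate the distances so that a higher value means "more similar".
manhattan_distances = [
    -manhattan_distance(a, b) for a, b in zip(embeddings1, embeddings2)
]
manhattan_pearson = pearson(gold_scores, manhattan_distances)
```

With the sign flipped, pairs that are close in L1 distance and have high gold scores yield a positive correlation, which keeps the metric's direction consistent with similarity-based scores.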
isaac-chung left a comment
Very nice! I don't have much to add. Might want to spot check 1 text and 1 visual STS task each to confirm that scores didn't change.
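One way to run such a spot check is to diff the metric dictionaries produced before and after the change. The sketch below assumes the scores have already been loaded into plain dictionaries; `compare_scores`, the metric names, and the numbers are hypothetical and are not results from this PR.

```python
import math


def compare_scores(before, after, abs_tol=1e-6):
    """Return metrics that differ by more than abs_tol or exist on one side only."""
    changed = {}
    for metric in sorted(set(before) | set(after)):
        old = before.get(metric)
        new = after.get(metric)
        if old is None or new is None or not math.isclose(old, new, abs_tol=abs_tol):
            changed[metric] = (old, new)
    return changed


# Made-up numbers for illustration only.
scores_before = {"cosine_spearman": 0.8123, "manhattan_pearson": 0.7991}
scores_after = {"cosine_spearman": 0.8123, "manhattan_pearson": 0.7895}

diff = compare_scores(scores_before, scores_after)
print(diff)
```

An empty result means the refactor left every metric within tolerance; any entry names the metric to investigate along with its old and new values.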
Results
I think the difference is caused by Results
# Conflicts:
#	mteb/abstasks/__init__.py
#	mteb/tasks/__init__.py
#	tests/test_benchmark/mock_tasks.py
Code Quality
- make lint to maintain consistent style.
Documentation
Testing
- make test-with-coverage.
- make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
- mteb -m {model_name} -t {task_name} command.
  - sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - intfloat/multilingual-e5-small
- self.stratified_subsampling() under dataset_transform()
- make test.
- make lint.
Adding a model checklist
- mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)