[v2] Merge main #2617

Merged
Samoed merged 196 commits into v2.0.0 from merge_main
May 3, 2025
@Samoed (Member) commented May 2, 2025

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
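
The subsampling item in the checklist above can be illustrated with a minimal, standard-library sketch of stratified subsampling. This is not mteb's self.stratified_subsampling(): the function name stratified_subsample, its parameters, and the toy data are assumptions made for illustration only. It shows the general idea of shrinking a dataset while keeping label proportions roughly intact.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return about n_samples examples, keeping label proportions roughly intact.

    Hypothetical helper for illustration; not the mteb implementation.
    """
    if len(examples) <= n_samples:
        return list(examples)

    # Group examples by their label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    fraction = n_samples / len(examples)
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Keep at least one example per label so rare classes survive.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Toy dataset: 10,000 examples spread evenly over 4 labels.
data = [{"text": f"doc {i}", "label": i % 4} for i in range(10_000)]
subset = stratified_subsample(data, n_samples=2048)
```

Because each label is rounded independently, the result can differ from n_samples by a few examples when class sizes are uneven.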

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Samoed and others added 30 commits April 5, 2025 12:34
* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS
* rename folder

* trailing spaces

* missed one
Automatically generated by python-semantic-release
* fix gradio leaderboard run

* update docs
* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <shyuli@tencent.com>
…lude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks
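
The commit list above ends with an `exclude_aggregate` option that defaults to false. A minimal sketch of that filtering logic, assuming a stand-in `Task` class and a simplified `get_tasks` signature (both hypothetical, not the actual mteb code), might look like this:

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for an mteb task; only the fields needed here."""

    name: str
    is_aggregate: bool = False


def get_tasks(tasks, exclude_aggregate=False):
    """Return tasks, optionally dropping those flagged as aggregate.

    Simplified illustration; the real mteb.get_tasks takes other parameters.
    """
    if not exclude_aggregate:
        return list(tasks)
    return [task for task in tasks if not task.is_aggregate]


# Made-up task names, for illustration only.
registry = [
    Task("ExampleRetrieval"),
    Task("ExampleAggregate", is_aggregate=True),
]
all_names = [t.name for t in get_tasks(registry)]
plain_names = [t.name for t in get_tasks(registry, exclude_aggregate=True)]
```

Defaulting to `exclude_aggregate=False` keeps existing callers unchanged, so filtering is opt-in.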
Automatically generated by python-semantic-release
Add MIEB citation in benchmarks
* [ADD] 2 new Datasets

* [UPDATE] Change bibtex_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtex_citation for ZacLegalTextRetrieval as TODO
* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
Automatically generated by python-semantic-release
move mmteb scripts and notebooks to separate repo
fix: Update package requirements in JinaWrapper for einops and flash_attn
Automatically generated by python-semantic-release
* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarksResults
- Added a few utility functions where needed
- Added docstring throughout ModelResults and BenchmarksResults
- Added todo comment for missing aspects - mostly v2 - but join_revisions seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache
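
One commit above removes `strict=True` from `zip()` because the keyword only exists on Python 3.10 and later. As a hedged sketch of keeping the length check on older interpreters, a small wrapper could fall back to a manual comparison. The helper name `zip_strict` is an assumption for illustration and is not part of mteb.

```python
import sys


def zip_strict(*iterables):
    """Zip iterables into a list, raising ValueError if their lengths differ.

    Hypothetical helper: uses zip(strict=True) where available (Python 3.10+)
    and a manual length check otherwise.
    """
    if sys.version_info >= (3, 10):
        return list(zip(*iterables, strict=True))
    # Python 3.9 and earlier: materialize the inputs and compare lengths.
    sequences = [list(it) for it in iterables]
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"zip arguments have different lengths: {sorted(lengths)}")
    return list(zip(*sequences))


pairs = zip_strict(["task_a", "task_b"], [0.71, 0.64])
```

The fallback materializes its inputs, so it suits short lists such as task names and scores, not large streams.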
github-actions bot and others added 9 commits May 2, 2025 04:46
# Conflicts:
#	docs/tasks.md
#	mteb/abstasks/AbsTaskSpeedTask.py
#	mteb/abstasks/TaskMetadata.py
#	mteb/encoder_interface.py
#	mteb/models/misc_models.py
#	mteb/models/ops_moa_models.py
#	mteb/models/ru_sentence_models.py
#	mteb/models/sentence_transformers_models.py
#	mteb/tasks/Classification/__init__.py
#	mteb/tasks/Clustering/deu/BlurbsClusteringP2P.py
#	mteb/tasks/Clustering/deu/BlurbsClusteringS2S.py
#	mteb/tasks/Clustering/deu/TenKGnadClusteringS2S.py
#	mteb/tasks/Clustering/fra/AlloProfClusteringP2P.py
#	mteb/tasks/Clustering/fra/AlloProfClusteringS2S.py
#	mteb/tasks/Clustering/fra/HALClusteringS2S.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py
#	mteb/tasks/Image/Clustering/eng/__init__.py
#	mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py
#	mteb/tasks/Image/ImageClassification/eng/CIFAR.py
#	mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/DTDClassification.py
#	mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py
#	mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Food101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py
#	mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py
#	mteb/tasks/Image/ImageClassification/eng/STL10Classification.py
#	mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py
#	mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py
#	mteb/tasks/Image/ImageClustering/eng/CIFAR.py
#	mteb/tasks/Image/ImageClustering/eng/ImageNet.py
#	mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py
#	mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py
#	mteb/tasks/Image/VisualSTS/__init__.py
#	mteb/tasks/Image/VisualSTS/en/__init__.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/DTD.py
#	mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Food101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py
#	mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py
#	mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py
#	mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py
#	mteb/tasks/Image/ZeroShotClassification/eng/STL10.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py
#	mteb/tasks/Image/__init__.py
#	mteb/tasks/MultiLabelClassification/__init__.py
#	mteb/tasks/Reranking/zho/CMTEBReranking.py
#	mteb/tasks/Retrieval/__init__.py
#	mteb/tasks/__init__.py
#	pyproject.toml
#	tests/test_TaskMetadata.py
@Samoed added the v2 label May 2, 2025
@Samoed requested a review from isaac-chung May 2, 2025 06:34
Samoed and others added 17 commits May 2, 2025 09:37
* Update gradio version

Closes #2557

* bump gradio
We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.
* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* add scandisent dataset

* add to init

* typo
Automatically generated by python-semantic-release
* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script
# Conflicts:
#	mteb/benchmarks/benchmarks.py
#	mteb/tasks/Classification/__init__.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/CVBench.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/TUBerlinT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/WITT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/XFlickr30kCoT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/XM3600T2IRetrieval.py
#	mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py
#	mteb/tasks/Image/ImageClassification/eng/CIFAR.py
#	mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/Country211Classification.py
#	mteb/tasks/Image/ImageClassification/eng/DTDClassification.py
#	mteb/tasks/Image/ImageClassification/eng/EuroSATClassification.py
#	mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py
#	mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Food101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Imagenet1k.py
#	mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordFlowersClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py
#	mteb/tasks/Image/ImageClassification/eng/STL10Classification.py
#	mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py
#	mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py
#	mteb/tasks/Image/ImageClustering/eng/CIFAR.py
#	mteb/tasks/Image/ImageClustering/eng/ImageNet.py
#	mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py
#	mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py
#	mteb/tasks/Image/ImageTextPairClassification/AROCocoOrder.py
#	mteb/tasks/Image/ImageTextPairClassification/AROFlickrOrder.py
#	mteb/tasks/Image/ImageTextPairClassification/AROVisualAttribution.py
#	mteb/tasks/Image/ImageTextPairClassification/AROVisualRelation.py
#	mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py
#	mteb/tasks/Image/ImageTextPairClassification/SugarCrepe.py
#	mteb/tasks/Image/ImageTextPairClassification/Winoground.py
#	mteb/tasks/Image/VisualSTS/eng/STS12VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS13VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS14VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS15VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS16VisualSTS.py
#	mteb/tasks/Image/VisualSTS/multilingual/STS17MultilingualVisualSTS.py
#	mteb/tasks/Image/VisualSTS/multilingual/STSBenchmarkMultilingualVisualSTS.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CLEVR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Country211.py
#	mteb/tasks/Image/ZeroShotClassification/eng/DTD.py
#	mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Food101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Imagenet1k.py
#	mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py
#	mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py
#	mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RenderedSST2.py
#	mteb/tasks/Image/ZeroShotClassification/eng/STL10.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SciMMIR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/StanfordCars.py
#	mteb/tasks/Image/ZeroShotClassification/eng/UCF101.py
#	mteb/tasks/PairClassification/fas/FaMTEBPairClassification.py
#	mteb/tasks/PairClassification/multilingual/XNLI.py
#	mteb/tasks/Retrieval/ara/SadeemQuestionRetrieval.py
#	mteb/tasks/Retrieval/multilingual/PublicHealthQARetrieval.py
#	mteb/tasks/Retrieval/pol/FiQAPLRetrieval.py
#	mteb/tasks/Retrieval/zho/CMTEBRetrieval.py
#	pyproject.toml
@Samoed merged commit 1e56329 into v2.0.0 May 3, 2025
9 checks passed
@Samoed deleted the merge_main branch May 3, 2025 08:46