[v2] Merge main by Samoed · Pull Request #2617 · embeddings-benchmark/mteb

Samoed · 2025-05-02T06:34:42Z

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
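
The subsampling advice in the checklist above can be illustrated with a small, library-free sketch. It is not the mteb `stratified_subsampling()` implementation; the function name `stratified_subsample`, the `label_key` parameter, and the toy data are assumptions made for illustration. The idea is to sample each label group in proportion to its size so the class balance of the full dataset is roughly preserved.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return roughly n_samples examples while keeping label proportions.

    Hypothetical helper for illustration only; not the mteb method.
    """
    if len(examples) <= n_samples:
        return list(examples)

    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    sampled = []
    # Sorting assumes labels are mutually comparable (e.g. all ints or all strings).
    for label in sorted(by_label):
        group = by_label[label]
        # Integer arithmetic avoids float rounding; keep at least one example
        # per label so rare classes are not dropped entirely.
        k = max(1, len(group) * n_samples // len(examples))
        sampled.extend(rng.sample(group, k))
    return sampled


# Toy data: 10,000 examples spread evenly over 4 labels.
data = [{"text": f"doc {i}", "label": i % 4} for i in range(10_000)]
subset = stratified_subsample(data, n_samples=2048)
```

Because every label is rounded down independently (with a floor of one example), the result can be slightly smaller or larger than `n_samples` on skewed data; a fixed `seed` keeps the subset reproducible between runs.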

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

…lude aggregate tasks (#2536)
* Implement task.is_aggregate check
* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed
* Update mteb.run with the new `task.is_aggregate` parameter
* Add tests
* Ran linter
* Changed logic to `exclude_aggregate`
* Updated from review comments
* Exclude aggregate by default false in get_tasks
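
The filtering behaviour described in these commit notes can be sketched without the library. This is a minimal illustration under stated assumptions, not the mteb code: the `Task` dataclass, the task names, and this standalone `get_tasks` function are hypothetical; only the `is_aggregate` attribute, the `exclude_aggregate` flag, and its default of `False` come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical stand-in for a benchmark task; not the mteb task class.
    name: str
    is_aggregate: bool = False


def get_tasks(tasks, exclude_aggregate=False):
    """Return tasks, optionally dropping aggregate ones (kept by default)."""
    if not exclude_aggregate:
        return list(tasks)
    return [task for task in tasks if not task.is_aggregate]


# Made-up registry with one aggregate task.
registry = [
    Task("ExampleRetrieval"),
    Task("ExampleAggregate", is_aggregate=True),
    Task("ExampleClassification"),
]
all_names = [t.name for t in get_tasks(registry)]
plain_names = [t.name for t in get_tasks(registry, exclude_aggregate=True)]
```

Defaulting the flag to `False` keeps existing callers unchanged: aggregate tasks are still returned unless a caller opts out explicitly.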

* defined model metadata for xlm_roberta_ua_distilled
* Update mteb/models/ua_sentence_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* included ua_sentence_models.py in overview.py
* applied linting, added missing fields in ModelMeta
* applied linting
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: me5 training data config to include xquad dataset
* Update mteb/models/e5_models.py update: xquad key name Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: ME5_TRAINING_DATA format
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

@ayush1298

* fix: Added dataframe utilities to BenchmarkResults
  - Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
  - Added tests for ModelResults and BenchmarksResults
  - Added a few utility functions where needed
  - Added docstrings throughout ModelResults and BenchmarksResults
  - Added todo comments for missing aspects, mostly v2, but join_revisions seems like it could use an update before then.

  Prerequisite for #2454: @ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.
* refactor to to_dataframe and combine common dependencies
* ibid
* fix revision joining after discussion with @x-tabdeveloping
* remove strict=True for zip() as it is a >3.9 feature
* updated mock cache
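
The last-but-one note drops `strict=True` because `zip()` only accepts that keyword from Python 3.10 onward. Where the length check is still wanted on Python 3.9, one option is a small helper like the sketch below. The name `zip_strict` is hypothetical and this is not code from the PR; it only shows one way to keep the check without the keyword.

```python
def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the inputs differ in length.

    Hypothetical helper; works on Python versions without zip(strict=True).
    """
    sentinel = object()  # private marker meaning "this iterator is exhausted"
    iterators = [iter(it) for it in iterables]
    while True:
        items = [next(it, sentinel) for it in iterators]
        exhausted = [item is sentinel for item in items]
        if all(exhausted):
            # Every input ended at the same step (or there were no inputs).
            return
        if any(exhausted):
            raise ValueError("zip_strict() arguments have different lengths")
        yield tuple(items)


pairs = list(zip_strict(["a", "b"], [1, 2]))
```

Unlike plain `zip()`, which silently truncates to the shortest input, the helper raises as soon as one input runs out before the others.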

# Conflicts: # docs/tasks.md # mteb/abstasks/AbsTaskSpeedTask.py # mteb/abstasks/TaskMetadata.py # mteb/encoder_interface.py # mteb/models/misc_models.py # mteb/models/ops_moa_models.py # mteb/models/ru_sentence_models.py # mteb/models/sentence_transformers_models.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Clustering/deu/BlurbsClusteringP2P.py # mteb/tasks/Clustering/deu/BlurbsClusteringS2S.py # mteb/tasks/Clustering/deu/TenKGnadClusteringS2S.py # mteb/tasks/Clustering/fra/AlloProfClusteringP2P.py # mteb/tasks/Clustering/fra/AlloProfClusteringS2S.py # mteb/tasks/Clustering/fra/HALClusteringS2S.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py # mteb/tasks/Image/Clustering/eng/__init__.py # mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py # mteb/tasks/Image/ImageClassification/eng/CIFAR.py # mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py # 
mteb/tasks/Image/ImageClassification/eng/DTDClassification.py # mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py # mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py # mteb/tasks/Image/ImageClassification/eng/Food101Classification.py # mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py # mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py # mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py # mteb/tasks/Image/ImageClassification/eng/STL10Classification.py # mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py # mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py # mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py # mteb/tasks/Image/ImageClustering/eng/CIFAR.py # mteb/tasks/Image/ImageClustering/eng/ImageNet.py # mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py # mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py # mteb/tasks/Image/VisualSTS/__init__.py # mteb/tasks/Image/VisualSTS/en/__init__.py # mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py # mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py # mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py # mteb/tasks/Image/ZeroShotClassification/eng/DTD.py # mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py # mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py # mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py # mteb/tasks/Image/ZeroShotClassification/eng/Food101.py # mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py # mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py # mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py # mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py # mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py # mteb/tasks/Image/ZeroShotClassification/eng/STL10.py # mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py # 
mteb/tasks/Image/__init__.py # mteb/tasks/MultiLabelClassification/__init__.py # mteb/tasks/Reranking/zho/CMTEBReranking.py # mteb/tasks/Retrieval/__init__.py # mteb/tasks/__init__.py # pyproject.toml # tests/test_TaskMetadata.py

* fix token
* try to trigger
* add token
* test ci
* Update tasks & benchmarks tables
* Update tasks & benchmarks tables
* remove test lines
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

# Conflicts: # mteb/benchmarks/benchmarks.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/CVBench.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/TUBerlinT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/WITT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/XFlickr30kCoT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/XM3600T2IRetrieval.py # mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py 
# mteb/tasks/Image/ImageClassification/eng/CIFAR.py # mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py # mteb/tasks/Image/ImageClassification/eng/Country211Classification.py # mteb/tasks/Image/ImageClassification/eng/DTDClassification.py # mteb/tasks/Image/ImageClassification/eng/EuroSATClassification.py # mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py # mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py # mteb/tasks/Image/ImageClassification/eng/Food101Classification.py # mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py # mteb/tasks/Image/ImageClassification/eng/Imagenet1k.py # mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordFlowersClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py # mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py # mteb/tasks/Image/ImageClassification/eng/STL10Classification.py # mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py # mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py # mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py # mteb/tasks/Image/ImageClustering/eng/CIFAR.py # mteb/tasks/Image/ImageClustering/eng/ImageNet.py # mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py # mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py # mteb/tasks/Image/ImageTextPairClassification/AROCocoOrder.py # mteb/tasks/Image/ImageTextPairClassification/AROFlickrOrder.py # mteb/tasks/Image/ImageTextPairClassification/AROVisualAttribution.py # mteb/tasks/Image/ImageTextPairClassification/AROVisualRelation.py # mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py # mteb/tasks/Image/ImageTextPairClassification/SugarCrepe.py # mteb/tasks/Image/ImageTextPairClassification/Winoground.py # mteb/tasks/Image/VisualSTS/eng/STS12VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS13VisualSTS.py # 
mteb/tasks/Image/VisualSTS/eng/STS14VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS15VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS16VisualSTS.py # mteb/tasks/Image/VisualSTS/multilingual/STS17MultilingualVisualSTS.py # mteb/tasks/Image/VisualSTS/multilingual/STSBenchmarkMultilingualVisualSTS.py # mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py # mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py # mteb/tasks/Image/ZeroShotClassification/eng/CLEVR.py # mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py # mteb/tasks/Image/ZeroShotClassification/eng/Country211.py # mteb/tasks/Image/ZeroShotClassification/eng/DTD.py # mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py # mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py # mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py # mteb/tasks/Image/ZeroShotClassification/eng/Food101.py # mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py # mteb/tasks/Image/ZeroShotClassification/eng/Imagenet1k.py # mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py # mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py # mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py # mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py # mteb/tasks/Image/ZeroShotClassification/eng/RenderedSST2.py # mteb/tasks/Image/ZeroShotClassification/eng/STL10.py # mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py # mteb/tasks/Image/ZeroShotClassification/eng/SciMMIR.py # mteb/tasks/Image/ZeroShotClassification/eng/StanfordCars.py # mteb/tasks/Image/ZeroShotClassification/eng/UCF101.py # mteb/tasks/PairClassification/fas/FaMTEBPairClassification.py # mteb/tasks/PairClassification/multilingual/XNLI.py # mteb/tasks/Retrieval/ara/SadeemQuestionRetrieval.py # mteb/tasks/Retrieval/multilingual/PublicHealthQARetrieval.py # mteb/tasks/Retrieval/pol/FiQAPLRetrieval.py # mteb/tasks/Retrieval/zho/CMTEBRetrieval.py # pyproject.toml

Samoed and others added 30 commits April 5, 2025 12:34

SpeedTask add deprecated warning (#2493)

ef59031

Docs: Update README.md (#2494)

315522c

Update README.md

fix transformers version for now (#2504)

deb4766

Fix typos (#2509)

77bef06

ci: refactor TaskMetadata eval langs test (#2501)

cb2825c

* refactor eval langs test
* function returns None
* add hard negatives tasks in _HISTORIC_DATASETS

rename to ImageClustering folder (#2516)

e7d67c5

rename folder

Clean up trailing spaces citation (#2518)

2e612e4

* rename folder
* trailing spaces
* missed one

[mieb] Memotion preprocessing code made more robust and readable (#2519)

2356e49

fix: validate lang code in ModelMeta (#2499)

2d15895

Update pyproject.toml (#2522)

efcbbe1

1.36.38

aceb995

Automatically generated by python-semantic-release

Fix leaderboard version (#2524)

d53e585

* fix gradio leaderboard run
* update docs

Fix gte-multilingual-base embed_dim (#2526)

d7a70fc

[MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

fc6ee95

specify only the multilingual AggTask

[mieb] fix hatefulmemes (#2531)

06da74e

* fix hatefulmeme
* add to description and use polars instead
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

Model conan (#2534)

7fcb582

* conan_models
* conan_models
* refactor code
* refactor code
---------
Co-authored-by: shyuli <shyuli@tencent.com>

1.36.39

81bccef

Automatically generated by python-semantic-release

docs: Add MIEB citation in benchmarks (#2544)

99c22b5

Add MIEB citation in benchmarks

Add 2 new Vietnamese Retrieval Datasets (#2393)

f2f37f8

* [ADD] 2 new Datasets
* [UPDATE] Change bibtext_citation for GreenNodeTableMarkdownRetrieval as TODO
* [UPDATE] Change bibtext_citation for ZacLegalTextRetrieval as TODO

Update tasks table

b7e447a

fix: CacheWrapper per task (#2467)

67881c4

* feat: CacheWrapper per task
* refactor logic
* update documentation
---------
Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>

1.36.40

e6b1949

Automatically generated by python-semantic-release

misc: move MMTEB scripts and notebooks to separate repo (#2546)

58769c4

move mmteb scripts and notebooks to separate repo

fix: Update requirements in JinaWrapper (#2548)

caa6e70

fix: Update package requirements in JinaWrapper for einops and flash_attn

1.36.41

cb86939

Automatically generated by python-semantic-release

Docs: Add MIEB to README (#2550)

75d3597

Add MIEB to README

github-actions bot and others added 9 commits May 2, 2025 04:46

Update tasks & benchmarks tables

73afd47

Update tasks & benchmarks tables

94e7585

Update tasks & benchmarks tables

b2bfa6b

Update tasks & benchmarks tables

69937da

Update tasks & benchmarks tables

20baefb

CI: fix table (#2615)

4d09a1a

Update tasks & benchmarks tables

603aa5b

fixes

84e94a5

Samoed added the v2 label May 2, 2025

Samoed requested a review from isaac-chung May 2, 2025 06:34

Samoed and others added 17 commits May 2, 2025 09:37

Merge branch 'main' into merge_main

a16b910

Update gradio version (#2558)

eabd9a5

* Update gradio version Closes #2557
* bump gradio

fix: Removed missing dataset for MTEB(Multilingual) and bumped version

f063638

We should probably have done this earlier to ensure that the multilingual benchmark is runnable.

fix retrieval loader

b10e720

add descriptive stats

dbbc185

Add ScandiSent dataset (#2620)

cb57999

* add scandisent dataset
* add to init
* typo

Merge branch 'main' of https://github.com/embeddings-benchmark/mteb

485941b

lint

2ecd7ad

1.38.4

9cfa2e8

Automatically generated by python-semantic-release

Format all citations (#2614)

e0c2dc9

* Fix errors in bibtex_citation
* Format all bibtex_citation fields
* format benchmarks
* fix format
* Fix tests
* add formatting script

fix citations

30d75d0

update imports

0f73554

fix citations

5fbc6a7

fix citations

33f1de4

format citation

a2b9f6b

Samoed merged commit 1e56329 into v2.0.0 May 3, 2025
9 checks passed

Samoed deleted the merge_main branch May 3, 2025 08:46

Samoed merged 196 commits into v2.0.0 from merge_main

19 participants
