[v2] Merge main by Samoed · Pull Request #1927 · embeddings-benchmark/mteb

Samoed · 2025-02-02T14:26:55Z

I added CodeRAGStackoverflowPosts to the exceptions for the descriptive_stat test because calculating its statistics takes more than 128GB of memory (see #1595 (comment)).
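
A minimal sketch of how such an exception list could work, assuming the test simply skips any task whose name is in an exclusion set. The constant and function names are hypothetical and are not taken from the mteb test suite; NQRetrieval is used only as an illustrative second task name.

```python
# Hypothetical names: the actual constant and test helper in mteb may differ.
TASKS_WITHOUT_DESCRIPTIVE_STATS = {
    # Calculating statistics for this task reportedly needs more than 128GB of memory.
    "CodeRAGStackoverflowPosts",
}


def tasks_requiring_descriptive_stats(task_names):
    """Return the task names whose descriptive statistics should still be checked."""
    return [name for name in task_names if name not in TASKS_WITHOUT_DESCRIPTIVE_STATS]


if __name__ == "__main__":
    all_tasks = ["NQRetrieval", "CodeRAGStackoverflowPosts"]
    # Only tasks outside the exclusion set remain in the list.
    print(tasks_requiring_descriptive_stats(all_tasks))
```

Keeping the exclusion in one named set makes it easy to remove the task again once its statistics can be computed.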

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
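
The subsampling item in the checklist above can be illustrated with a small standalone sketch. This is not the implementation behind self.stratified_subsampling() in mteb; it is a plain-Python illustration of the idea, and the function name, parameters, and default seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, labels, n_samples, seed=42):
    """Pick about n_samples examples while keeping each label's share roughly intact."""
    if n_samples >= len(examples):
        return list(examples), list(labels)

    rng = random.Random(seed)

    # Group example indices by label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fraction = n_samples / len(examples)
    chosen = []
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        # Keep at least one example per label so that no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))

    chosen.sort()  # preserve the original ordering of the kept examples
    return [examples[i] for i in chosen], [labels[i] for i in chosen]


if __name__ == "__main__":
    xs = list(range(100))
    ys = ["a"] * 80 + ["b"] * 20
    sub_x, sub_y = stratified_subsample(xs, ys, n_samples=10)
    print(len(sub_x), sub_y.count("a"), sub_y.count("b"))
```

Because each label's quota is rounded independently, the result can differ from n_samples by a few examples when there are many labels.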

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.


* Add Summary Retrieval Task
* Add FaMTEBClassification
* Add FaMTEBClustering
* Add FaMTEBPairClassification
* Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS
* Add FaMTEBSummaryRetrieval
* Add FaMTEB to benchmarks
* fix benchmark names
* temporary fix metadata
* Fix dataset revisions
* Update SummaryRetrievalEvaluator.py
* Update task files
* Update task files
* add data domain and subtask description
* Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval
* Update AbsTaskSummaryRetrieval
* Add mock task
* Update AbsTaskSummaryRetrieval
* Update AbsTaskSummaryRetrieval
* make lint
* Refactor SummaryRetrieval to subclass BitextMining
* Add aggregated datasets

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: e.zeinivand <zeinivand@ymail.com>
Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com>

* update docs
* Apply suggestions from code review

  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* update readme
* Update README.md

  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Adding a banner to the new MMTEB leaderboard
* linting
* Update mteb/leaderboard/app.py

  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* adding reference to mteb arena

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>


* add three out of four datasets in CodeRAG-Bench
* add verified CodeRAGStackoverflowPostsRetrieval dataset
* clean up code and make some comments
* fixed lint errors
* addressed comments about code-rag datasets: fixed grammar and removed unnecessary code and loop
* roll back files which are not supposed to change
* fixed the comments in split_by_first_newline() and made the methods private by adding an underscore prefix
* refactor to use common args
* update task descriptions
* add entry in benchmarks
* correct the alphanumeric order for the dataset
* add in tasks.md
* add in tasks.md
* update task metadata
* update importing path
* fix lint errors
* correct CodeRAG task metadata description field and id for stackoverflow-posts
* fix error in test

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>


Samoed · 2025-02-04T12:33:52Z

@isaac-chung Can I merge this? Because after that merge I want to merge main with MIEB

isaac-chung · 2025-02-04T12:44:48Z

Whoops, thanks for pinging me. Got buried in notifications. Will take a look at the last few commits.
[edit] At a glance it looks good! I'll not be available for a few days, so no rush on the mieb merge.

Samoed · 2025-02-04T13:06:52Z

I want to merge MIEB into v2 to make it available for refactoring. The most significant breaking change so far has been the removal of MultilingualTask, but that can be fixed quickly (I think).

isaac-chung · 2025-02-04T14:35:16Z

> I want to merge MIEB into v2 to make it available for refactoring. The most significant breaking change so far has been the removal of MultilingualTask, but that can be fixed quickly (I think).

While some refactoring can be done (e.g. #1944 (comment)), I'd say the core still needs #1950 and its conclusions.

Samoed · 2025-02-04T14:39:21Z

In some parts yes, but generally I want to merge it to not block development.

github-actions bot and others added 30 commits January 29, 2025 13:28

Update tasks table

a91d268

Update tasks table

d8bf18b

Update tasks table

d005797

Update tasks table

251142e

Update tasks table

da08617

Update tasks table

93f23c4

Update tasks table

a764fd7

Update tasks table

0861254

update stella/jasper metainfo (#1896)

976bdd5

update stella meta

Update tasks table

cc1e899

Update tasks table

1c84c1c

Update tasks table

d6deab1

Update tasks table

ef929f8

Update tasks table

a5d1538

Update tasks table

42c175f

1.31.5

7f9ca64

Automatically generated by python-semantic-release

Update tasks table

e04218c

Update tasks table

0a57880

Update tasks table

d44f9c3

Update tasks table

35b2c09

Update tasks table

0a59704

Update tasks table

7996458

Update tasks table

6cc0560

Update tasks table

f258cfc

Update tasks table

77681bf

Update tasks table

2850a97

Update tasks table

28ad172

KennethEnevoldsen and others added 8 commits February 2, 2025 00:46

docs: Updated citation for mteb(scandinavian) (#1914)

f3526fc

fix: Updated citation for mteb(scandinavian)

Update tasks table

57db0f9

1.31.8

dba7a95

Automatically generated by python-semantic-release

merge

327fc40

update __init__

9b52268

update generate_imports script for aggregational tasks

4f2ce03

add descriptive stats

728757b

Samoed requested review from KennethEnevoldsen and isaac-chung February 2, 2025 14:26

Samoed changed the base branch from main to v2.0.0 February 2, 2025 14:27

remove print from script generate_imports

b21eea1

Samoed added the v2 label Feb 2, 2025

add rest of metadata

845491e

Samoed mentioned this pull request Feb 2, 2025

[v2] Remove multilingual task #1926

Merged

4 tasks

Samoed and others added 6 commits February 2, 2025 19:17

fix tests

7e8be03

add todo for test

913781a

Revert "fix tests"

86d3358

This reverts commit 7e8be03.

add back check for multilingual

8ef483e

Merge branch 'v2.0.0' into merge_main

33628bf

Merge branch 'refs/heads/v2.0.0' into merge_main

592c925

# Conflicts:
#	mteb/abstasks/AbsTask.py
#	mteb/evaluation/MTEB.py
#	mteb/load_results/task_results.py
#	mteb/tasks/Retrieval/eng/NQRetrieval.py

fix imports

4e733bd

isaac-chung approved these changes Feb 4, 2025


Samoed merged commit 3d1f80c into v2.0.0 Feb 4, 2025
11 checks passed

Samoed deleted the merge_main branch February 4, 2025 14:44

Samoed merged 278 commits into v2.0.0 from merge_main

7 participants
