MTEB Human Tasks by AdnanElAssadi56 · Pull Request #3213 · embeddings-benchmark/mteb

AdnanElAssadi56 · 2025-09-25T15:12:01Z

No description provided.

…mbeddings-benchmark#2175)
* Added wav2vec model wrapper
* Added four w2v variants
* Update wav2vec_models.py
* Removed run.py test script
* Added subTask with small sample of dataset for testing
* Removed test portion of VoiceGender.py task
* add commit hash and bibtex
* make lint
* update models
* fix circular import
* make VoiceGender discoverable in get_tasks
* add a2a as category for clustering
* specify latest commit hash
* revert linting changes
* Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file
* Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype
* Changed task from Voice Gender Clustering to Gender Clustering.
* Fixed mock audio clustering tests
* Added dataset metadata
* Linted
* Passed revision into the w2v2 loader
* passed lint check
* Linted
* Update VoiceGender.py
---------
Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com>
Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com>
Co-authored-by: mn <mn@Ms-MacBook-Pro.local>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

… FSD50k Dataset and Task (embeddings-benchmark#2082)
* init audio
* some encoder related changes
* some more abs task defs
  Co-authored-by: rahulschand <rahulsc@stanford.edu>
* evaluators and classification
* remove rahul changes to generate first PR
* make lint
* add dataset/tasks skeleton
* readd changes lost in rebase
* add fsd50k
* add task categories for audio
* slight updates to fsd50k
* make lint
* wav2vec2 model
* add fsd50k metadata
* rename folder
* add metric
* add torchaudio in req
* reigster wav2vec2 models
* fixes
* add audio in valid task types
* mock interface changes
* make lint
* rm audio clustering
* wav2vec2 model revision update
* rm comment
* rm test.py
* add revisions to all wav2vec2 models
* rm empty abstask files
* rm empty evaluator files
* rm empty task files
* Update tests/test_tasks/test_all_abstasks.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update mteb/models/wav2vec2_models.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* rm non-logReg evaluators for audio classification
* lint
* fn name changed to convert_audio_from_numpy
* rm mock tests for audio kNN classification
* rm evaluators for audio kNN classification
* fix imports
* fix audio kNN; make lint
* rm AbsTaskAudioClassification.py for later PR
* remove commented code; reset changes to ClassificationEvaluator.py
* fix mock tasks for multilabel classification
* make lint
* inherit Wrapper class
* add all languages supported by wav2vec2
* make lint
* add script info to all languages
* make lint
* recent changes
* merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets
* make lint remove uwanted files
* remove debug lines
* remove esc50 refs
* fix mock tasks for multilabel
* fix mock tasks for multilabel
* Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch embeddings-benchmark@4f23fdf This reverts commit 4f23fdf, reversing changes made to 1302477.
* fix model imports
* fqd50k cleaning
* update fsd50k
* change dataset
* eval subsets correctly
* make lint and remove debug statements
* clean print statements
* make lint
* update fsd2019 dataset
* remove init in AbsTaskAudioMultilabelClassification.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* add class parameters in AbsTaskAudioMultilabelClassification
* inherit from multilingualtask for FSD2019Kaggle
* make lint
* update mock_tasks; make lint
* remove train_split from fn parameters
* define fsd2019k to be multilingual
* inherit from MultilingualTask in fsd2019K
* fix tests
* inherit correct multingial task class
* remove MockAudioMultilabelClassificationLogRegTask
* rm other instances of MockAudioMultilabelClassificationLogRegTask
---------
Co-authored-by: rahulschand <rahulsc@stanford.edu>
Co-authored-by: Silky Singh <silky1708@gmail.com>
Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* init audio
* some encoder related changes
* some more abs task defs
  Co-authored-by: rahulschand <rahulsc@stanford.edu>
* evaluators and classification
* remove rahul changes to generate first PR
* make lint
* init audio
* some encoder related changes
* some more abs task defs
  Co-authored-by: rahulschand <rahulsc@stanford.edu>
* evaluators and classification
* remove rahul changes to generate first PR
* make lint
* add dataset/tasks skeleton
* readd changes lost in rebase
* add fsd50k
* add task categories for audio
* slight updates to fsd50k
* make lint
* wav2vec2 model
* add fsd50k metadata
* rename folder
* add metric
* add torchaudio in req
* reigster wav2vec2 models
* fixes
* add audio in valid task types
* mock interface changes
* my 0 shot
* make lint
* rm audio clustering
* wav2vec2 model revision update
* rm comment
* rm test.py
* add revisions to all wav2vec2 models
* rm empty abstask files
* rm empty evaluator files
* rm empty task files
* Update tests/test_tasks/test_all_abstasks.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update mteb/models/wav2vec2_models.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* rm non-logReg evaluators for audio classification
* lint
* fn name changed to convert_audio_from_numpy
* rm mock tests for audio kNN classification
* rm evaluators for audio kNN classification
* fix imports
* fix audio kNN; make lint
* rm AbsTaskAudioClassification.py for later PR
* added zero-shot loading model and dataset checked
* remove commented code; reset changes to ClassificationEvaluator.py
* fix mock tasks for multilabel classification
* make lint
* inherit Wrapper class
* add all languages supported by wav2vec2
* make lint
* add script info to all languages
* make lint
* before cleaning comments
* ESC and clap model. Tested 81 percent zero-shot numbers
* fixed label names for ESC50-multilabel and removed comments
* recent changes
* merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets
* make lint remove uwanted files
* remove debug lines
* remove esc50 refs
* changes for debugging
* lint changes and maeb main branch merge
* fix mock tasks for multilabel
* fix mock tasks for multilabel
* Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch embeddings-benchmark@4f23fdf This reverts commit 4f23fdf, reversing changes made to 1302477.
* fix model imports
* fqd50k cleaning
* fixed error in Image zero shot classfification
* update fsd50k
* change dataset
* eval subsets correctly
* make lint and remove debug statements
* clean print statements
* make lint
* update fsd2019 dataset
* remove init in AbsTaskAudioMultilabelClassification.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* add class parameters in AbsTaskAudioMultilabelClassification
* inherit from multilingualtask for FSD2019Kaggle
* make lint
* update mock_tasks; make lint
* remove train_split from fn parameters
* define fsd2019k to be multilingual
* inherit from MultilingualTask in fsd2019K
* fix tests
* inherit correct multingial task class
* remove MockAudioMultilabelClassificationLogRegTask
* rm other instances of MockAudioMultilabelClassificationLogRegTask
* removed unncessary files
* removed unncrssary files
* removed uncrssary files part 3
* deleted esc50 from multi label classification
* fixed errors
* fixed lintng, added precision and recall. Removed extra comments
* fixed double loading of model
* filled in missing meta-data
* fixed linting
---------
Co-authored-by: Animesh Jha <jha.animesh01@gmail.com>
Co-authored-by: rahulschand <rahulsc@stanford.edu>
Co-authored-by: Silky Singh <silky1708@gmail.com>
Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

…on task (embeddings-benchmark#2285)
* Added fsd50k dataset on huggingface
* added correct hf version of fsd50k dataset
* added correct hf version of fsd50k dataset
* removed extra imports
* removed unecessary load_data fn

* MSCLAP Model
* typo
* type 2
* fixed audio emeddings
* audio handling
* fix float error
* move inputs to gpu
* device handling
* model to device
* device mismatch
* device
* text input to device
* text device mismatch fix
* Adding Variants
* lint + metadata fix
* lint

* initial commit muq_mulan
* added revision
* Refactor audio processing in MuQMuLanWrapper to handle different audio input types. Updated tensor conversion logic for numpy arrays and lists, ensuring compatibility with existing torch tensor formats. Improved resampling handling for audio inputs.
* metadata + lint
* metadata update
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* update citation script
* Add audioset (WIP) (embeddings-benchmark#2331)
  Added audioset draft commit
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* update audioset metadata
* add audioset mini
* use lrap
---------
Co-authored-by: Rahul C <chandrahul0320@gmail.com>

* add the script to process commonvoice data
* add script to upload common voice data
* add dev data folder which was missing. Supress error from tarfile
* Add 'Speech Retrieval' for common voice T2A task
* add common voice 17 for temporary review
* add import common voice script in init file
* add a2t and t2a data transformation
* fixed class name, superclass and eval languages
* fixed linting errors and a tar file decompression error
* ruff reformat
* add common voice 21
* ruff reformat
* fixed the citation of task metadata
* ruff format
* fixed language code

Samoed · 2025-09-25T15:40:37Z

Can you remove MAEB commits?

Samoed · 2025-09-25T16:27:14Z

Cherry-picked the commits to #3214. Feel free to update the new PR.

sufen-f and others added 30 commits February 18, 2025 23:33

Started the following:

32b7af8

- define audio encoder interface
- implement abstask and evaluator for clustering

Minor changes and linted files. embeddings-benchmark#2093

8eff2c6

Minor changes and linted files. embeddings-benchmark#2093

53a2e36

Minor changes and linted files. embeddings-benchmark#2093

ed93f2b

Refs embeddings-benchmark#2068: Initial Implementation of audio-text retrieval abstask and evaluator

fbab033

Added MockAudioClustering task + MockAudioEncoder for testcase

d39e187

Created test_maeb_datasets.py to test AbsTask and Evaluator for clustering

MockAudioClustering + MockAudioEncoder (embeddings-benchmark#2093)

bcca37f

Added wav2vec model wrapper

2a238ed

Added subTask with small sample of dataset for testing

7816974

Added four w2v variants

07f53b1

Update wav2vec_models.py

882af38

Added wav2vec (5), wavlm (7), and whisper (5) models

daeada0

Added revisions from HF to wav2vec models, added silhouette score, DBSCAN and agglomerative algorithms into clustering evaluator, added algorithm selector into VoiceGender

c1ebf2a

Update mteb/models/wavlm_models.py

716deed

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

setting up colab

ce1bee9

Merge remote-tracking branch 'origin/maeb' into maeb

4cf7e6f

added a2a

545b938

PCA + hidden layer + shuffling

ed978fa

New task: emotion clustering

1616ba9

Added qwen2 model

ac14d16

Merge branch 'maeb' into maeb

4f23fdf

Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)" (embeddings-benchmark#2202)

ee10191

Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)"" (embeddings-benchmark#2203)

f1449c0

Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCele…" This reverts commit ee10191.

Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)""" (embeddings-benchmark#2207)

d731d40

Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and…" This reverts commit f1449c0.

Add unfused clap model for zero-shot (embeddings-benchmark#2269)

6d9eca3

* added unfused model
* fixed lint

added large, music and speech clap models (embeddings-benchmark#2284)

bdefb14

* added large, music and speech clap models
* fixed public_training_data and removed training_datasets split
* added latest revision
* lowercase mit license
* fixed issue related to training_datasets
* fixed lint

AdnanElAssadi56 and others added 26 commits July 21, 2025 19:14

MAEB Model Wav2Clip (embeddings-benchmark#2908)

dd6a76a

* wav2clip model
* metadata placeholders
* tensor-list mismatch
* audio-preprocessing
* tensor to numpy
* gpu oom fix
* typo + clean
* lint + metadata
* model name fix

Audio Retrieval Dataset: EmoVDB (embeddings-benchmark#2923)

7e1fb93

* Added EmoVDB Retrieval Dataset
* Update __init__.py
* add Emotional Speech Retrieval
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

fix encode() in audio models (embeddings-benchmark#2926)

7801759

fix encode()

Audio Retrieval Dataset: HiFiTTS (embeddings-benchmark#2924)

7a4be45

* Added HiFiTTS Retrieval Dataset
* remove dialect
* clean up metadata
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

Audio Retrieval Dataset: MusicCaps (embeddings-benchmark#2918)

8a01d4e

* Add MusicCaps dataset for audio retrieval tasks
* Update MusicCaps.py
* add Music Caption Retrieval
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

Audio Retrieval Dataset: CMU-Arctic (embeddings-benchmark#2929)

53071b3

* Added CMU_Arctic Retrieval Dataset
* Update CMU_Arctic.py

Audio Models Batch Fix (embeddings-benchmark#2932)

b087dfe

* MSCLAP Batch Implementation
* wav2clip batch implementation
* msclap fallback
* MuQ_Mulan Batch Implementation
* logging + fallbacks
* remove unnessary log + lint
* Update msclap_models.py

[MAEB] Fix whisper model audio inference (embeddings-benchmark#2954)

b875aa2

* cast batch to numpy and pad max_length
* trigger CI

fleurs retrieval tasks (embeddings-benchmark#2976)

d841b33

* fleurs first commit
* ruff format fleurs
* fixed bibtex citation

MAEB Model Evaluation Fixes (embeddings-benchmark#2956)

069b294

* Truncation + Progress Bar
* ran lint
* Update muq_mulan_model.py

Fix ClothoA2T modality (embeddings-benchmark#2988)

671be23

add text as well for a2t

Revert "MAEB Model Evaluation Fixes" (embeddings-benchmark#2993)

49528b6

Revert "MAEB Model Evaluation Fixes (embeddings-benchmark#2956)" This reverts commit 069b294.

Human Subsets Tasks

5ba74fc

Fixed Multilingual Classification Subset

24ea203

add google embedding variant

926acb2

loader fix

e134015

license add

2438bb2

validator check

5a9667b

typo

5c97cd4

full loader

bd3d7a3

typo

bcc1bb0

Merge branch 'main' of https://github.com/embeddings-benchmark/mteb into mteb-human-tasks

d0b6b10

Samoed mentioned this pull request Sep 25, 2025

dataset: add human tasks and benchmark #3214

Merged

Samoed closed this Sep 25, 2025

AdnanElAssadi56 wants to merge 143 commits into embeddings-benchmark:main from AdnanElAssadi56:mteb-human-tasks


16 participants
