MTEB Human Tasks #3213

Closed
AdnanElAssadi56 wants to merge 143 commits into embeddings-benchmark:main from AdnanElAssadi56:mteb-human-tasks

Conversation

@AdnanElAssadi56
Contributor

No description provided.

sufen-f and others added 30 commits February 18, 2025 23:33
- define audio encoder interface
- implement abstask and evaluator for clustering
Created test_maeb_datasets.py to test AbsTask and Evaluator for clustering
…SCAN and agglomerative algorithms into clustering evaluator, added algorithm selector into VoiceGender
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
…mbeddings-benchmark#2175)

* Added wav2vec model wrapper

* Added four w2v variants

* Update wav2vec_models.py

* Removed run.py test script

* Added subTask with small sample of dataset for testing

* Removed test portion of VoiceGender.py task

* add commit hash and bibtex

* make lint

* update models

* fix circular import

* make VoiceGender discoverable in get_tasks

* add a2a as category for clustering

* specify latest commit hash

* revert linting changes

* Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file

* Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype

* Changed task from Voice Gender Clustering to Gender Clustering.

* Fixed mock audio clustering tests

* Added dataset metadata

* Linted

* Passed revision into the w2v2 loader

* passed lint check

* Linted

* Update VoiceGender.py

---------

Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com>
Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com>
Co-authored-by: mn <mn@Ms-MacBook-Pro.local>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
… VoxCeleb dataset (subset)"" (embeddings-benchmark#2203)

Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCele…"

This reverts commit ee10191.
…odel and VoxCeleb dataset (subset)""" (embeddings-benchmark#2207)

Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and…"

This reverts commit f1449c0.
… FSD50k Dataset and Task (embeddings-benchmark#2082)

* init audio

* some encoder related changes

* some more abs task defs

Co-authored-by: rahulschand <rahulsc@stanford.edu>

* evaluators and classification

* remove rahul changes to generate first PR

* make lint

* add dataset/tasks skeleton

* readd changes lost in rebase

* add fsd50k

* add task categories for audio

* slight updates to fsd50k

* make lint

* wav2vec2 model

* add fsd50k metadata

* rename folder

* add metric

* add torchaudio in req

* reigster wav2vec2 models

* fixes

* add audio in valid task types

* mock interface changes

* make lint

* rm audio clustering

* wav2vec2 model revision update

* rm comment

* rm test.py

* add revisions to all wav2vec2 models

* rm empty abstask files

* rm empty evaluator files

* rm empty task files

* Update tests/test_tasks/test_all_abstasks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update mteb/models/wav2vec2_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rm non-logReg evaluators for audio classification

* lint

* fn name changed to convert_audio_from_numpy

* rm mock tests for audio kNN classification

* rm evaluators for audio kNN classification

* fix imports

* fix audio kNN; make lint

* rm AbsTaskAudioClassification.py for later PR

* remove commented code; reset changes to ClassificationEvaluator.py

* fix mock tasks for multilabel classification

* make lint

* inherit Wrapper class

* add all languages supported by wav2vec2

* make lint

* add script info to all languages

* make lint

* recent changes

* merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets

* make lint remove uwanted files

* remove debug lines

* remove esc50 refs

* fix mock tasks for multilabel

* fix mock tasks for multilabel

* Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch
embeddings-benchmark@4f23fdf

This reverts commit 4f23fdf, reversing
changes made to 1302477.

* fix model imports

* fqd50k cleaning

* update fsd50k

* change dataset

* eval subsets correctly

* make lint and remove debug statements

* clean print statements

* make lint

* update fsd2019 dataset

* remove init in AbsTaskAudioMultilabelClassification.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* add class parameters in AbsTaskAudioMultilabelClassification

* inherit from multilingualtask for FSD2019Kaggle

* make lint

* update mock_tasks; make lint

* remove train_split from fn parameters

* define fsd2019k to be multilingual

* inherit from MultilingualTask in fsd2019K

* fix tests

* inherit correct multingial task class

* remove MockAudioMultilabelClassificationLogRegTask

* rm other instances of MockAudioMultilabelClassificationLogRegTask

---------

Co-authored-by: rahulschand <rahulsc@stanford.edu>
Co-authored-by: Silky Singh <silky1708@gmail.com>
Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* init audio

* some encoder related changes

* some more abs task defs

Co-authored-by: rahulschand <rahulsc@stanford.edu>

* evaluators and classification

* remove rahul changes to generate first PR

* make lint

* init audio

* some encoder related changes

* some more abs task defs

Co-authored-by: rahulschand <rahulsc@stanford.edu>

* evaluators and classification

* remove rahul changes to generate first PR

* make lint

* add dataset/tasks skeleton

* readd changes lost in rebase

* add fsd50k

* add task categories for audio

* slight updates to fsd50k

* make lint

* wav2vec2 model

* add fsd50k metadata

* rename folder

* add metric

* add torchaudio in req

* reigster wav2vec2 models

* fixes

* add audio in valid task types

* mock interface changes

* my 0 shot

* make lint

* rm audio clustering

* wav2vec2 model revision update

* rm comment

* rm test.py

* add revisions to all wav2vec2 models

* rm empty abstask files

* rm empty evaluator files

* rm empty task files

* Update tests/test_tasks/test_all_abstasks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update mteb/models/wav2vec2_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rm non-logReg evaluators for audio classification

* lint

* fn name changed to convert_audio_from_numpy

* rm mock tests for audio kNN classification

* rm evaluators for audio kNN classification

* fix imports

* fix audio kNN; make lint

* rm AbsTaskAudioClassification.py for later PR

* added zero-shot loading model and dataset checked

* remove commented code; reset changes to ClassificationEvaluator.py

* fix mock tasks for multilabel classification

* make lint

* inherit Wrapper class

* add all languages supported by wav2vec2

* make lint

* add script info to all languages

* make lint

* before cleaning comments

* ESC and clap model. Tested 81 percent zero-shot numbers

* fixed label names for ESC50-multilabel and removed comments

* recent changes

* merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets

* make lint remove uwanted files

* remove debug lines

* remove esc50 refs

* changes for debugging

* lint changes and maeb main branch merge

* fix mock tasks for multilabel

* fix mock tasks for multilabel

* Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch
embeddings-benchmark@4f23fdf

This reverts commit 4f23fdf, reversing
changes made to 1302477.

* fix model imports

* fqd50k cleaning

* fixed error in Image zero shot classfification

* update fsd50k

* change dataset

* eval subsets correctly

* make lint and remove debug statements

* clean print statements

* make lint

* update fsd2019 dataset

* remove init in AbsTaskAudioMultilabelClassification.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* add class parameters in AbsTaskAudioMultilabelClassification

* inherit from multilingualtask for FSD2019Kaggle

* make lint

* update mock_tasks; make lint

* remove train_split from fn parameters

* define fsd2019k to be multilingual

* inherit from MultilingualTask in fsd2019K

* fix tests

* inherit correct multingial task class

* remove MockAudioMultilabelClassificationLogRegTask

* rm other instances of MockAudioMultilabelClassificationLogRegTask

* removed unncessary files

* removed unncrssary files

* removed uncrssary files part 3

* deleted esc50 from multi label classification

* fixed errors

* fixed lintng, added precision and recall. Removed extra comments

* fixed double loading of model

* filled in missing meta-data

* fixed linting

---------

Co-authored-by: Animesh Jha <jha.animesh01@gmail.com>
Co-authored-by: rahulschand <rahulsc@stanford.edu>
Co-authored-by: Silky Singh <silky1708@gmail.com>
Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
…on task (embeddings-benchmark#2285)

* Added fsd50k dataset on huggingface

* added correct hf version of fsd50k dataset

* added correct hf version of fsd50k dataset

* removed extra imports

* removed unecessary load_data fn
* added large, music and speech clap models

* fixed public_training_data and removed training_datasets split

* added latest revision

* lowercase mit license

* fixed issue related to training_datasets

* fixed lint
AdnanElAssadi56 and others added 26 commits July 21, 2025 19:14
* MSCLAP Model

* typo

* type 2

* fixed audio emeddings

* audio handling

* fix float error

* move inputs to gpu

* device handling

* model to device

* device mismatch

* device

* text input to device

* text device mismatch fix

* Adding Variants

* lint + metadata fix

* lint
* wav2clip model

* metadata placeholders

* tensor-list mismatch

* audio-preprocessing

* tensor to numpy

* gpu oom fix

* typo + clean

* lint + metadata

* model name fix
* Added EmoVDB Retrieval Dataset

* Update __init__.py

* add Emotional Speech Retrieval

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* initial commit muq_mulan

* added revision

* Refactor audio processing in MuQMuLanWrapper to handle different audio input types. Updated tensor conversion logic for numpy arrays and lists, ensuring compatibility with existing torch tensor formats. Improved resampling handling for audio inputs.

* metadata + lint

* metadata update

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Added HiFiTTS Retrieval Dataset

* remove dialect

* clean up metadata

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Add MusicCaps dataset for audio retrieval tasks

* Update MusicCaps.py

* add Music Caption Retrieval

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Added CMU_Arctic Retrieval Dataset

* Update CMU_Arctic.py
* MSCLAP Batch Implementation

* wav2clip batch implementation

* msclap fallback

* MuQ_Mulan Batch Implementation

* logging + fallbacks

* remove unnessary log + lint

* Update msclap_models.py
* update citation script

* Add audioset (WIP) (embeddings-benchmark#2331)

Added audioset draft commit

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* update audioset metadata

* add audioset mini

* use lrap

---------

Co-authored-by: Rahul C <chandrahul0320@gmail.com>
* cast batch to numpy and pad max_length

* trigger CI
* add the script to process commonvoice data

* add script to upload common voice data

* add dev data folder which was missing. Supress error from tarfile

* Add 'Speech Retrieval' for common voice T2A task

* add common voice 17 for temporary review

* add import common voice script in init file

* add a2t and t2a data transformation

* fixed class name, superclass and eval languages

* fixed linting errors and a tar file decompression error

* ruff reformat

* add common voice 21

* ruff reformat

* fixed the citation of task metadata

* ruff format

* fixed language code
* fleurs first commit

* ruff format fleurs

* fixed bibtex citation
* Truncation + Progress Bar

* ran lint

* Update muq_mulan_model.py
@Samoed
Member

Samoed commented Sep 25, 2025

Can you remove MAEB commits?

@Samoed
Member

Samoed commented Sep 25, 2025

Cherry-pick commits to #3214. Feel free to update the new PR.

@Samoed Samoed closed this Sep 25, 2025