Add Audio (Multi Label) Classification Abstask, Baseline Audio model, FSD50k Dataset and Task #2082

anime-sh · 2025-02-17T06:35:35Z

Implements #2071 #2066 #2070 #2056

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

[?] I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
[?] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

Co-authored-by: rahulschand <[email protected]>

isaac-chung

Great start. Let's remove the files that are not related to the linked issues.
I'm running test.py but have not seen 429 errors yet. It is still running after loading the model.

isaac-chung · 2025-02-21T06:50:12Z

mteb/abstasks/Audio/AbsTaskAudioEventDetection.py

Let's remove the empty files for now, and check that each of these new task types has an issue.

(base) silsingh@simurgh2:/vision/u/silsingh/mteb$ python test.py
mteb imported.
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-xls-r-300m and are newly initialized: ['lm_head.bias', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model loaded..
task loaded..
────────────────────────────────────── Selected tasks ──────────────────────────────────────
AudioMultilabelClassification
    - FSD50K, a2a
ERROR:mteb.evaluation.MTEB:Error while evaluating FSD50K: [Errno 13] Permission denied: '/vision/u/silsingh/mteb/.cache/datasets/downloads/caa41240cab59989e9673a18571c95e36878d33daf8cf26672992c2922f1969d.lock'
Traceback (most recent call last):
  File "/vision/u/silsingh/mteb/test.py", line 10, in <module>
    results = evaluation.run(model)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/mteb/mteb/evaluation/MTEB.py", line 661, in run
    raise e
  File "/vision/u/silsingh/mteb/mteb/evaluation/MTEB.py", line 565, in run
    task.load_data(**kwargs)
  File "/vision/u/silsingh/mteb/mteb/abstasks/AbsTask.py", line 203, in load_data
    self.dataset = datasets.load_dataset(**self.metadata_dict["dataset"]) # type: ignore
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/load.py", line 2628, in load_dataset
    builder_instance.download_and_prepare(
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1029, in download_and_prepare
    self._download_and_prepare(
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1791, in _download_and_prepare
    super()._download_and_prepare(
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1102, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/packaged_modules/folder_based_builder/folder_based_builder.py", line 117, in _split_generators
    downloaded_files = dl_manager.download(files)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 257, in download
    downloaded_path_or_paths = map_nested(
                               ^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 512, in map_nested
    _single_map_nested((function, obj, batched, batch_size, types, None, True, None))
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 380, in _single_map_nested
    return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
                                                                                              ^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 300, in _download_batched
    return thread_map(
           ^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/std.py", line 1169, in __iter__
    for obj in iterable:
  File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 323, in _download_single
    out = cached_path(url_or_filename, download_config=download_config)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 211, in cached_path
    output_path = get_from_cache(
                  ^^^^^^^^^^^^^^^
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 662, in get_from_cache
    with FileLock(lock_path):
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_api.py", line 376, in __enter__
    self.acquire()
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_api.py", line 332, in acquire
    self._acquire()
  File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_unix.py", line 42, in _acquire
    fd = os.open(self.lock_file, open_flags, self._context.mode)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/vision/u/silsingh/mteb/.cache/datasets/downloads/caa41240cab59989e9673a18571c95e36878d33daf8cf26672992c2922f1969d.lock'

We were getting this issue when running the evaluation. We'll remove the redundant files
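
For anyone hitting the same PermissionError: the traceback ends in FileLock failing to create a lock file inside the datasets download cache, which suggests that directory is not writable by the current user. The following is a minimal sketch of one way to check for that and fall back to a writable location. It assumes the cache location is taken from the HF_DATASETS_CACHE environment variable, which the datasets library reads at import time; the helper names and candidate paths are hypothetical and not part of this PR.

```python
import os
import tempfile
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """Return True if a file can actually be created inside `path`."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating a real file is a more direct check than os.access().
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:  # PermissionError is a subclass of OSError
        return False


def pick_cache_dir(candidates: list[Path]) -> Path:
    """Return the first candidate directory that is writable."""
    for candidate in candidates:
        if is_writable_dir(candidate):
            return candidate
    raise RuntimeError(f"No writable cache directory among: {candidates}")


def configure_datasets_cache(candidates: list[Path]) -> Path:
    """Point HF_DATASETS_CACHE at the first writable candidate.

    Assumption: this runs before `import datasets`, because the library
    resolves its cache paths when it is imported.
    """
    cache_dir = pick_cache_dir(candidates)
    os.environ["HF_DATASETS_CACHE"] = str(cache_dir)
    return cache_dir
```

A script such as test.py could call configure_datasets_cache([...]) at the very top, before importing mteb or datasets. If the directory is owned by another user, fixing its ownership or permissions is the more direct remedy.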

mteb/abstasks/TaskMetadata.py

isaac-chung · 2025-02-21T06:54:20Z

pyproject.toml

@@ -41,6 +41,7 @@ dependencies = [
    "eval_type_backport>=0.0.0",
    "polars>=0.20.22",
    "torchvision>0.0.0",
+    "torchaudio>0.0.0",


Let's try to specify a minimum version.

isaac-chung · 2025-02-21T06:57:05Z

mteb/models/wav2vec2_models.py

+wav2vec2_xlsr_300m = ModelMeta(
+    loader=partial(Wav2Vec2AudioWrapper, model_name="facebook/wav2vec2-xls-r-300m"),
+    name="facebook/wav2vec2-xls-r-300m",
+    languages=["multilingual"],


Please list all relevant languages, e.g. https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/e5_models.py#L9

Same for the ones below.

source: link

We noticed that the languages for the wav2vec2 models are given as 2-character codes, while https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/e5_models.py#L9 uses 3-character codes. Is there a way to map these correctly? For example, af could be mapped to afr_Latn, but we are not sure.

I've previously used https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes as a reference to map them manually. Maybe it's worth having deepseek (or others) map it for you and then checking its work.
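
A rough sketch of the manual mapping discussed above, in case it helps. The table is a small hand-checked subset of ISO 639-1 to ISO 639-3 codes paired with an ISO 15924 script tag; it is illustrative only and would need to be extended and verified against the ISO 639 list for every language the model covers. The names ISO_639_1_TO_3_SCRIPT and to_mteb_language_codes are hypothetical and do not exist in mteb.

```python
# Hand-checked subset of ISO 639-1 -> (ISO 639-3, ISO 15924 script) pairs.
# Illustrative only: extend it from the ISO 639 table and verify each entry.
ISO_639_1_TO_3_SCRIPT: dict[str, tuple[str, str]] = {
    "af": ("afr", "Latn"),
    "de": ("deu", "Latn"),
    "en": ("eng", "Latn"),
    "fr": ("fra", "Latn"),
    "hi": ("hin", "Deva"),
    "ru": ("rus", "Cyrl"),
}


def to_mteb_language_codes(short_codes: list[str]) -> tuple[list[str], list[str]]:
    """Map two-letter codes to `xxx_Script` strings.

    Returns (mapped, unmapped) so that missing entries can be reviewed by
    hand instead of being silently dropped or guessed.
    """
    mapped: list[str] = []
    unmapped: list[str] = []
    for code in short_codes:
        entry = ISO_639_1_TO_3_SCRIPT.get(code.lower())
        if entry is None:
            unmapped.append(code)
        else:
            mapped.append(f"{entry[0]}_{entry[1]}")
    return mapped, unmapped


mapped, unmapped = to_mteb_language_codes(["af", "en", "xx"])
print(mapped)    # ['afr_Latn', 'eng_Latn']
print(unmapped)  # ['xx']
```

Returning the unmapped codes separately keeps any automated suggestion (from an LLM or otherwise) reviewable: anything not in the verified table is surfaced rather than invented.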

mteb/models/wav2vec2_models.py

mteb/evaluation/evaluators/Audio/ClassificationEvaluator.py

mteb/models/wav2vec2_models.py

isaac-chung · 2025-02-21T07:11:04Z

mteb/models/wav2vec2_models.py

+    def _convert_audio(self, audio: AudioData) -> torch.Tensor:
+        if isinstance(audio, np.ndarray):
+            audio = torch.from_numpy(audio)
+        return audio.squeeze()


Would we always call .squeeze()? If so, might be good to mention why in the docstrings.
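
To make the question concrete, here is a small sketch of what an argument-less squeeze() does to the shape. It uses plain tuples rather than tensors so it runs without torch; the rule shown (every size-1 axis is dropped) is the same one torch.Tensor.squeeze() and numpy.squeeze() apply when called without arguments. The function drop_leading_channel is a hypothetical stricter alternative, not something in this PR.

```python
def squeezed_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Shape after an argument-less squeeze(): every size-1 axis is dropped."""
    return tuple(dim for dim in shape if dim != 1)


def drop_leading_channel(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical stricter variant: only drop a leading mono-channel axis."""
    if len(shape) == 2 and shape[0] == 1:
        return shape[1:]
    return shape


# Mono clip stored as (channels, samples): both give a 1-D waveform.
print(squeezed_shape((1, 16000)))        # (16000,)
print(drop_leading_channel((1, 16000)))  # (16000,)

# Stereo clip: nothing to squeeze, so the shape is unchanged.
print(squeezed_shape((2, 16000)))        # (2, 16000)

# Edge case: a one-sample mono clip collapses to a 0-D shape.
print(squeezed_shape((1, 1)))            # ()
print(drop_leading_channel((1, 1)))      # (1,)
```

If the intent is only to strip a mono channel axis, a docstring saying so (or squeezing a specific dimension) would make the edge cases explicit.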

test.py

mteb/abstasks/Audio/AbsTaskAudioClassification.py

isaac-chung · 2025-02-21T07:47:41Z

mteb/abstasks/Audio/AbsTaskAudioMultilabelClassification.py

+        self,
+        model: AudioEncoder,
+        eval_split: str = "test",
+        train_split: str = "train",


Looks like https://huggingface.co/datasets/Fhrozen/FSD50k does not have a train split. See comment below.
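
One way to catch this kind of mismatch early is to validate the requested split names against what the loaded dataset actually provides before any training starts. The sketch below is a hypothetical helper, not code from this PR; the split names in the usage example are placeholders and do not describe the actual contents of the dataset linked above.

```python
def check_splits(available: list[str], train_split: str, eval_split: str) -> None:
    """Raise a descriptive error if a requested split is missing."""
    missing = [s for s in (train_split, eval_split) if s not in available]
    if missing:
        raise ValueError(
            f"Missing split(s) {missing}; dataset provides {sorted(available)}"
        )


# Placeholder split names for illustration only.
check_splits(["train", "test"], train_split="train", eval_split="test")  # passes

try:
    check_splits(["validation", "test"], train_split="train", eval_split="test")
except ValueError as err:
    print(err)
```

With a datasets.DatasetDict, the available names could be passed as list(dataset.keys()).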

isaac-chung · 2025-02-21T07:47:59Z

mteb/tasks/Audio/AudioMultilabelClassification/eng/FSD50K.py

+    metadata = TaskMetadata(
+        name="FSD50K",
+        description="Multilabel Audio Classification.",
+        reference="https://huggingface.co/datasets/Fhrozen/FSD50k",


I'd suggest comparing it to https://github.com/edufonseca/FSD50K_baseline?tab=readme-ov-file and checking whether it is the correct dataset implementation. If not, please upload a version from source and reference that one instead.

Co-authored-by: Isaac Chung <[email protected]>

Samoed · 2025-02-21T08:11:16Z

mteb/abstasks/Audio/AbsTaskAudioClassification.py

I think it would be better to create a classification class similar to the one in the v2 branch: https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/abstasks/AbsTaskClassification.py

Samoed · 2025-02-21T08:13:14Z

mteb/abstasks/Audio/AbsTaskAudioMultilabelClassification.py

I think it would be better to create a multilabel classification class similar to the one in the v2 branch: https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/abstasks/AbsTaskMultilabelClassification.py

Samoed · 2025-02-21T08:13:59Z

mteb/models/wav2vec2_models.py

+from mteb.model_meta import ModelMeta
+
+
+class Wav2Vec2AudioWrapper:


Can you inherit from the Wrapper class?

anime-sh changed the title ~~Init MAEB [WIP]~~ [WIP] Init MAEB Feb 17, 2025

anime-sh marked this pull request as draft February 17, 2025 06:39

This was linked to issues Feb 18, 2025

Define an encoder interface for audio #2071

Open

Create audio classification AbsTask and Evaluator #2066

Open

Create multilabel audio classification AbsTask and Evaluator #2070

Open

isaac-chung mentioned this pull request Feb 19, 2025

Create audio clustering AbsTask and Evaluator #2093

Open

anime-sh and others added 7 commits February 20, 2025 19:00

init audio

c5744bf

some encoder related changes

64ccf50

some more abs task defs

1a744c0

Co-authored-by: rahulschand <[email protected]>

evaluators and classification

c26ebae

remove rahul changes to generate first PR

1289d9b

make lint

bb2b4d0

add dataset/tasks skeleton

705664e

anime-sh force-pushed the maeb branch from bb73ae8 to 705664e Compare February 21, 2025 03:06

anime-sh and others added 8 commits February 20, 2025 19:17

readd changes lost in rebase

07eda3c

add fsd50k

ebae179

add task categories for audio

d51c5d1

slight updates to fsd50k

e3b89fa

make lint

849323c

wav2vec2 model

395b833

add fsd50k metadata

efd7095

rename folder

f97f9a3

anime-sh changed the title ~~[WIP] Init MAEB~~ Add Audio (Multi Label) Classification Abstask, Baseline Audio model, FSD50k Dataset and Task Feb 21, 2025

anime-sh marked this pull request as ready for review February 21, 2025 03:48

silky1708 and others added 6 commits February 20, 2025 19:48

add metric

6d61f3a

add torchaudio in req

fa61ea6

reigster wav2vec2 models

b03a28f

fixes

e4aaf9d

add audio in valid task types

d3c20a0

Merge branch 'maeb' of https://github.com/anime-sh/mteb into maeb

20a45ad

mock interface changes

c92073a

isaac-chung self-requested a review February 21, 2025 06:36

make lint

63bfaed

isaac-chung reviewed Feb 21, 2025

rm audio clustering

2868359

isaac-chung reviewed Feb 21, 2025

silky1708 and others added 9 commits February 20, 2025 23:50

wav2vec2 model revision update

17949a0

rm comment

1865f84

rm test.py

3ad782e

add revisions to all wav2vec2 models

1ce34ac

rm empty abstask files

cb57565

rm empty evaluator files

792fef3

rm empty task files

fdd8935

Update tests/test_tasks/test_all_abstasks.py

8def584

Co-authored-by: Isaac Chung <[email protected]>

Update mteb/models/wav2vec2_models.py

26b8b7f

Co-authored-by: Isaac Chung <[email protected]>

Samoed reviewed Feb 21, 2025

silky1708 added 8 commits February 21, 2025 00:17

rm non-logReg evaluators for audio classification

c256ac6

lint

2379f16

fn name changed to convert_audio_from_numpy

babba47

rm mock tests for audio kNN classification

8b5f25b

rm evaluators for audio kNN classification

64aeb3f

fix imports

6b6ef78

fix audio kNN; make lint

35bf99a

rm AbsTaskAudioClassification.py for later PR

2977dd3
