Add Audio (Multi Label) Classification Abstask, Baseline Audio model, FSD50k Dataset and Task #2082
base: maeb
Great start. Let's remove the files that are not related to the linked issues.
I'm running test.py but have not seen 429 errors yet. It is still running after loading the model.
Let's remove the empty files for now, and check that each of these new task types has an issue.
(base) silsingh@simurgh2:/vision/u/silsingh/mteb$ python test.py
mteb imported.
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-xls-r-300m and are newly initialized: ['lm_head.bias', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model loaded..
task loaded..
────────────────────────────────────── Selected tasks ──────────────────────────────────────
AudioMultilabelClassification
- FSD50K, a2a
ERROR:mteb.evaluation.MTEB:Error while evaluating FSD50K: [Errno 13] Permission denied: '/vision/u/silsingh/mteb/.cache/datasets/downloads/caa41240cab59989e9673a18571c95e36878d33daf8cf26672992c2922f1969d.lock'
Traceback (most recent call last):
File "/vision/u/silsingh/mteb/test.py", line 10, in <module>
results = evaluation.run(model)
^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/mteb/mteb/evaluation/MTEB.py", line 661, in run
raise e
File "/vision/u/silsingh/mteb/mteb/evaluation/MTEB.py", line 565, in run
task.load_data(**kwargs)
File "/vision/u/silsingh/mteb/mteb/abstasks/AbsTask.py", line 203, in load_data
self.dataset = datasets.load_dataset(**self.metadata_dict["dataset"]) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/load.py", line 2628, in load_dataset
builder_instance.download_and_prepare(
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/builder.py", line 1102, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/packaged_modules/folder_based_builder/folder_based_builder.py", line 117, in _split_generators
downloaded_files = dl_manager.download(files)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 257, in download
downloaded_path_or_paths = map_nested(
^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 512, in map_nested
_single_map_nested((function, obj, batched, batch_size, types, None, True, None))
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 380, in _single_map_nested
return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 300, in _download_batched
return thread_map(
^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/tqdm/std.py", line 1169, in __iter__
for obj in iterable:
File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 619, in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 317, in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/vision/u/silsingh/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/download/download_manager.py", line 323, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 211, in cached_path
output_path = get_from_cache(
^^^^^^^^^^^^^^^
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 662, in get_from_cache
with FileLock(lock_path):
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_api.py", line 376, in __enter__
self.acquire()
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_api.py", line 332, in acquire
self._acquire()
File "/vision/u/silsingh/miniconda3/lib/python3.12/site-packages/filelock/_unix.py", line 42, in _acquire
fd = os.open(self.lock_file, open_flags, self._context.mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/vision/u/silsingh/mteb/.cache/datasets/downloads/caa41240cab59989e9673a18571c95e36878d33daf8cf26672992c2922f1969d.lock'
We were getting this error when running the evaluation. We'll remove the redundant files.
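The traceback above ends in a `PermissionError` raised while `filelock` tries to create a `.lock` file under `.cache/datasets/downloads`, which suggests that the cache directory is not writable by the current user. Below is a minimal sketch of one possible workaround, assuming it is acceptable to point the Hugging Face `datasets` cache at a different, writable directory. The helper `first_writable_dir` and the candidate path are hypothetical; `HF_DATASETS_CACHE` is the environment variable that `datasets` reads for its cache location.

```python
import os
import tempfile
from pathlib import Path


def first_writable_dir(candidates):
    """Return the first candidate directory in which a file can be created.

    Hypothetical helper: it probes each directory by actually creating a
    temporary file, which is what the lock acquisition needs to succeed.
    """
    for candidate in candidates:
        path = Path(candidate).expanduser()
        try:
            path.mkdir(parents=True, exist_ok=True)
            # Creating (and auto-deleting) a temp file proves write access.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
        except OSError:
            continue  # not writable or not creatable; try the next one
        return path
    raise RuntimeError("no writable cache directory among the candidates")


# Assumed candidate location; adjust to the machine at hand.
cache_dir = first_writable_dir(
    [Path(tempfile.gettempdir()) / "hf_datasets_cache"]
)
# Set this before `datasets` is imported so that it takes effect.
os.environ["HF_DATASETS_CACHE"] = str(cache_dir)
```

Alternatively, fixing the ownership or permissions of the existing `.cache` directory would address the same error without moving the cache.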
@@ -41,6 +41,7 @@ dependencies = [
    "eval_type_backport>=0.0.0",
    "polars>=0.20.22",
    "torchvision>0.0.0",
    "torchaudio>0.0.0",
Let's try to specify a minimum version.
wav2vec2_xlsr_300m = ModelMeta(
    loader=partial(Wav2Vec2AudioWrapper, model_name="facebook/wav2vec2-xls-r-300m"),
    name="facebook/wav2vec2-xls-r-300m",
    languages=["multilingual"],
Please list all relevant languages, e.g. https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/e5_models.py#L9
Same for the ones below.

source: link
We noticed that the language codes for the wav2vec2 models are shortened to two characters, while https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/e5_models.py#L9 uses three-character codes. Is there a way to map these correctly? For example, af could be mapped to afr_Latn, but we are not sure.
I've previously used https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes as a reference to map them manually. Maybe it's worth having DeepSeek (or another LLM) map them for you and then checking its work.
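A minimal sketch of such a manual mapping, assuming the target format is an ISO 639-3 code joined to an ISO 15924 script code (as in `afr_Latn`). The dictionary name, the helper `to_three_letter_codes`, and the five entries are illustrative only; the full list would still need to be filled in and checked by hand against the reference table.

```python
# Illustrative subset only: each entry maps an ISO 639-1 code to an
# ISO 639-3 code plus an ISO 15924 script, and should be checked by hand.
ISO_639_1_TO_639_3_SCRIPT = {
    "af": "afr_Latn",
    "de": "deu_Latn",
    "en": "eng_Latn",
    "fr": "fra_Latn",
    "ru": "rus_Cyrl",
}


def to_three_letter_codes(two_letter_codes):
    """Convert two-letter codes, failing loudly on anything unmapped."""
    missing = [
        code for code in two_letter_codes
        if code not in ISO_639_1_TO_639_3_SCRIPT
    ]
    if missing:
        raise KeyError(f"no mapping for {missing}; add them manually")
    return [ISO_639_1_TO_639_3_SCRIPT[code] for code in two_letter_codes]


languages = to_three_letter_codes(["af", "en", "ru"])
print(languages)  # ['afr_Latn', 'eng_Latn', 'rus_Cyrl']
```

Raising on unmapped codes, rather than passing them through, keeps any two-letter code from silently ending up in the model metadata.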
def _convert_audio(self, audio: AudioData) -> torch.Tensor:
    if isinstance(audio, np.ndarray):
        audio = torch.from_numpy(audio)
    return audio.squeeze()
Would we always call .squeeze()? If so, it might be good to mention why in the docstrings.
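To make the question concrete: calling `squeeze()` without arguments drops every dimension of size 1, not only a leading channel dimension. The sketch below is a pure-Python model of that shape rule, not the code in this PR; the helper name `squeezed_shape` and the example shapes are assumptions chosen for illustration.

```python
def squeezed_shape(shape):
    """Shape after an argument-less squeeze(): all size-1 dims are dropped."""
    return tuple(dim for dim in shape if dim != 1)


print(squeezed_shape((1, 16000)))     # (16000,) mono clip with a channel dim
print(squeezed_shape((2, 16000)))     # (2, 16000) stereo is left untouched
print(squeezed_shape((1, 1, 16000)))  # (16000,) a batch of one is dropped too
print(squeezed_shape((1, 1)))         # () a one-sample clip becomes a scalar
```

If only the channel dimension is meant to be removed, squeezing a specific dimension would be the narrower choice, and the docstring could state which input shapes are expected.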
    self,
    model: AudioEncoder,
    eval_split: str = "test",
    train_split: str = "train",
Looks like https://huggingface.co/datasets/Fhrozen/FSD50k does not have a train split. See comment below.
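One way to surface this kind of mismatch early is to validate the requested split names against the splits the dataset actually provides before evaluation starts. A minimal sketch, assuming the available split names are already known as a list of strings; the helper `resolve_split` and the split names in the usage lines are hypothetical and do not describe the actual layout of the dataset.

```python
def resolve_split(available_splits, requested):
    """Return `requested` if the dataset has it, else raise a clear error."""
    if requested in available_splits:
        return requested
    raise ValueError(
        f"split {requested!r} not found; available splits: "
        f"{sorted(available_splits)}"
    )


# Hypothetical split names, for illustration only.
available = ["validation", "test"]
print(resolve_split(available, "test"))  # test
try:
    resolve_split(available, "train")
except ValueError as err:
    print(err)
```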
metadata = TaskMetadata(
    name="FSD50K",
    description="Multilabel Audio Classification.",
    reference="https://huggingface.co/datasets/Fhrozen/FSD50k",
I'd suggest comparing it to https://github.com/edufonseca/FSD50K_baseline?tab=readme-ov-file and checking whether it is the correct dataset implementation. If not, please upload a version from source and reference that one instead.
Co-authored-by: Isaac Chung <[email protected]>
I think it would be better to create a classification class similar to the one in the v2 branch: https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/abstasks/AbsTaskClassification.py
I think it would be better to create a multilabel classification class similar to the one in the v2 branch: https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/abstasks/AbsTaskMultilabelClassification.py
from mteb.model_meta import ModelMeta


class Wav2Vec2AudioWrapper:
Can you inherit from the Wrapper class?
Implements #2071 #2066 #2070 #2056
Code Quality
make lint to maintain consistent style.
Documentation
Testing
make test-with-coverage.
make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name} command.
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test.
make lint.
Adding a model checklist
mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)