added SpeechCommand dataset and Keyword spotting task #2329
isaac-chung merged 12 commits into embeddings-benchmark:maeb
KennethEnevoldsen
left a comment
Looks good, just a few minor things.
Did you run a model on it? I could imagine that this task might be too easy.
mteb/tasks/Audio/AudioZeroshotClassification/eng/SpeechCommands.py
No, I haven't run a model yet. It's still a draft PR. The dataset is large (~70k audio files), so it will take some time.
No worries, just listing what is missing. Reducing the number of samples should make it more doable.
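For illustration, reducing the number of samples while keeping the class balance could look roughly like the sketch below. This is a standalone, hypothetical helper (`stratified_subsample` is not the mteb implementation), and it assumes labels are plain sortable values such as strings. Because each class share is rounded, the returned size can differ slightly from `n_samples`.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly keeps label proportions.

    Hypothetical helper for illustration only, not part of mteb.
    """
    if n_samples >= len(labels):
        return list(range(len(labels)))

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    picked = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Proportional share of the budget, with at least one example per class.
        k = max(1, round(n_samples * len(indices) / len(labels)))
        picked.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(picked)


# Toy usage: 100 keyword labels reduced to about 20 examples.
toy_labels = ["yes"] * 60 + ["no"] * 30 + ["stop"] * 10
subset = stratified_subsample(toy_labels, n_samples=20)
```

The fixed seed keeps the subset reproducible across runs, which matters when results from different models are compared on the same reduced split.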
@RahulSChand the test set is ~3k, so it seems manageable without downsampling. Want to also note that the zero shot abstask does not use For different versions of the dataset, we could include both v1 and v2, by naming the class and the metadata name field like So if you have the bandwidth, it would be great to add v2 as well. Otherwise, I think it is also fine to merge it as is right now.


Added the google/speech-commands v1 dataset. Part of the larger #2319 issue list to add all CLAP models. This is a keyword spotting dataset, so I added a new type of task as well. Test results and prompt logic are in the next comments.
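As a rough illustration of how zero-shot keyword spotting with a CLAP-style model generally works: each keyword is turned into a text prompt, the prompts and the audio clip are embedded into a shared space, and the keyword whose prompt is most similar to the audio wins. The sketch below uses toy vectors in place of real embeddings. The prompt template and the function names are assumptions for illustration, not the prompts or code used in this PR.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_prompts(keywords, template="A person saying the word '{}'"):
    """Map each keyword to a text prompt (the template is hypothetical)."""
    return {kw: template.format(kw) for kw in keywords}


def zero_shot_predict(audio_emb, prompt_embs):
    """Return the keyword whose prompt embedding is closest to the audio embedding.

    prompt_embs maps keyword -> embedding of that keyword's text prompt.
    """
    return max(prompt_embs, key=lambda kw: cosine(audio_emb, prompt_embs[kw]))


# Toy usage: stand-in 2-d embeddings instead of real model outputs.
prompts = build_prompts(["yes", "no"])
toy_prompt_embs = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
prediction = zero_shot_predict([0.9, 0.1], toy_prompt_embs)
```

In a real run the toy vectors would come from the model's text and audio encoders; only the similarity and argmax step is shown here.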
Code Quality
make lint to maintain consistent style.
Documentation
Testing
make test-with-coverage.
make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name} command.
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test.
make lint.