
added SpeechCommand dataset and Keyword spotting task#2329

Merged
isaac-chung merged 12 commits into embeddings-benchmark:maeb from anime-sh:speech_commands
Jun 21, 2025

Conversation

@RahulSChand
Contributor

@RahulSChand RahulSChand commented Mar 11, 2025

Added the google/speech-commands v1 dataset, part of the larger #2319 issue list to add all CLAP models. This is a keyword spotting dataset, so a new task type was added as well. Test results and prompt logic are in the comments below.

Code Quality

  • [x] Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • [x] Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • [x] I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • [x] I have filled out the metadata object in the dataset file (find documentation on it here).
  • [x] Run tests locally to make sure nothing is broken using make test.
  • [x] Run the formatter to format the code using make lint.

@RahulSChand RahulSChand self-assigned this Mar 11, 2025
@RahulSChand RahulSChand marked this pull request as draft March 11, 2025 18:54
@RahulSChand
Contributor Author

RahulSChand commented Mar 11, 2025

The prompt logic for keyword spotting in the zero-shot setting is to use the label itself as the text, with no additional prefix, unlike other cases where a template such as "this is a sound of .." is used (from the CLAP paper).

Screenshot from 2025-03-11 12-26-33
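The prompt construction described above can be sketched as follows. This is a minimal illustration, not the actual mteb implementation; the function name and task-type strings are assumptions.

```python
# Hedged sketch of zero-shot prompt construction for audio classification.
# Names here are illustrative; mteb's real code is organized differently.

def build_prompts(labels, task_type):
    """Return one candidate text prompt per label."""
    if task_type == "keyword_spotting":
        # Keyword spotting: the spoken word itself is the label,
        # so the raw label text is used with no prefix.
        return list(labels)
    # Generic audio classification (as in the CLAP paper):
    # wrap the label in a natural-language template.
    return [f"this is a sound of {label}" for label in labels]

print(build_prompts(["yes", "no", "stop"], "keyword_spotting"))
print(build_prompts(["rain"], "audio_classification"))
```

Each prompt is then embedded by the text encoder, and the audio clip is assigned the label whose prompt embedding is most similar.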

@RahulSChand RahulSChand added the audio Audio extension label Mar 11, 2025
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good, a few minor things.

Did you run a model on it? I could imagine that this task might be too easy.

@RahulSChand
Contributor Author

Looks good, a few minor things.

Did you run a model on it? I could imagine that this task might be too easy.

No, haven't run the model yet. It's still a draft PR. The dataset is large (~70k audio files), so it will take some time.

@RahulSChand RahulSChand changed the title added SpeechCommand dataset and Keyword spotting task added SpeechCommand dataset and Keyword spotting task (WIP) Mar 11, 2025
@KennethEnevoldsen
Contributor

No, haven't run the model yet. It's still a draft PR.

No worries, just listing what is missing. Reducing the number of samples should make it more doable.

@silky1708 silky1708 linked an issue Mar 13, 2025 that may be closed by this pull request
@RahulSChand
Contributor Author

RahulSChand commented Mar 15, 2025

No, haven't run the model yet. It's still a draft PR.

No worries, just listing what is missing. Reducing the number of samples should make it more doable.

Tested with samples_per_label=8. For additional context, the dataset has 30 labels in v1.1, but only the first 10 are considered commands; the rest are treated as auxiliary labels. Below is from the official Hugging Face repo:

In both versions, ten of them are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". Other words are considered to be auxiliary (in current implementation it is marked by True value of "is_unknown" feature).
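The filtering described in that quote can be sketched as below. The example rows are illustrative; only the "is_unknown" feature name comes from the dataset description quoted above.

```python
# Hedged sketch: keep only the ten conventional command labels,
# dropping auxiliary words via the "is_unknown" flag.
COMMANDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]

def keep_commands(examples):
    """Drop auxiliary words, keeping only the ten conventional commands."""
    return [ex for ex in examples if not ex["is_unknown"]]

# Illustrative rows standing in for real dataset examples.
rows = [
    {"label": "yes", "is_unknown": False},
    {"label": "bed", "is_unknown": True},   # auxiliary word
    {"label": "stop", "is_unknown": False},
]
print([ex["label"] for ex in keep_commands(rows)])  # → ['yes', 'stop']
```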

The v2 version has a total of 13 commands (v2 has not been added yet). The results below (on the v1.1 version of the dataset) are close to those in the official CLAP paper (the official paper reports 10.63% on the v2 version of the dataset).

Screenshot 2025-03-14 at 4 45 52 PM

I feel one thing that can be done in this PR is to go directly to the v2 version of the dataset, which is now the one used for benchmarking speech commands, and skip the v1 dataset since it is no longer used. What do you think?

@RahulSChand RahulSChand changed the title added SpeechCommand dataset and Keyword spotting task (WIP) added SpeechCommand dataset and Keyword spotting task Mar 15, 2025
@RahulSChand RahulSChand marked this pull request as ready for review March 15, 2025 00:24
@silky1708 silky1708 mentioned this pull request May 7, 2025
84 tasks
@isaac-chung
Collaborator

isaac-chung commented Jun 11, 2025

@RahulSChand the test set is ~3k, so it seems manageable without downsampling. Want to also note that the zero-shot abstask does not use samples_per_label, i.e. no training is done.

For the different versions of the dataset, we could include both v1 and v2 by naming the class and the metadata name field like SpeechCommandsZeroshotv1. The file would have 2 classes, one for each version. This way both are supported, and we do not necessarily include both in the benchmark; we'll have a choice.

So if you have the bandwidth, it would be great to add v2 as well. Otherwise, I think it is also fine to merge it as is right now.
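The two-classes-in-one-file layout suggested above might look like the following sketch. Only the class naming (SpeechCommandsZeroshotv1) comes from the comment; the base class, metadata fields, and config names are stand-ins, not the real mteb AbsTask/TaskMetadata API.

```python
# Sketch only: ZeroshotAudioTask and the metadata dicts are placeholders
# standing in for mteb's actual zero-shot audio task base class and metadata.

class ZeroshotAudioTask:
    """Stand-in for the real zero-shot audio task base class."""
    metadata: dict = {}

class SpeechCommandsZeroshotv1(ZeroshotAudioTask):
    # Dataset config name "v0.01" is an assumption, not verified.
    metadata = {"name": "SpeechCommandsZeroshotv1", "config": "v0.01"}

class SpeechCommandsZeroshotv2(ZeroshotAudioTask):
    # Dataset config name "v0.02" is an assumption, not verified.
    metadata = {"name": "SpeechCommandsZeroshotv2", "config": "v0.02"}

# Both classes live in one file; a benchmark can register either or both.
print([cls.metadata["name"] for cls in (SpeechCommandsZeroshotv1, SpeechCommandsZeroshotv2)])
```

This keeps each version selectable by its own task name without forcing both into every benchmark.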

@isaac-chung isaac-chung merged commit 6bc4c5a into embeddings-benchmark:maeb Jun 21, 2025
8 checks passed

Development

Successfully merging this pull request may close these issues.

Add Speech Commands dataset

4 participants