Speech Commands v0.01 & v0.02 dataset #996

Merged: 10 commits merged on Mar 16, 2023

@yfyeung (Contributor) commented Mar 15, 2023

Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
paper: https://arxiv.org/pdf/1804.03209.pdf
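A keyword-spotting recipe needs a label (the spoken word) and, ideally, a speaker ID for every clip. The sketch below assumes the `<word>/<speaker-hash>_nohash_<n>.wav` layout that, as far as we know, both releases use. `ClipInfo` and `parse_clip_path` are hypothetical helpers, not lhotse API, and the example path is made up.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ClipInfo(NamedTuple):
    label: str    # the keyword, taken from the parent directory name
    speaker: str  # hashed speaker ID, the part before "_nohash_"
    index: int    # repetition counter for this speaker and word


def parse_clip_path(rel_path: str) -> ClipInfo:
    """Split a path such as 'yes/0a7c2a8d_nohash_0.wav' into its parts.

    Assumes the '<word>/<speaker>_nohash_<n>.wav' naming convention; files
    that do not follow it (e.g. the background-noise recordings) raise.
    """
    path = PurePosixPath(rel_path)
    speaker, sep, index = path.stem.partition("_nohash_")
    if not sep:
        raise ValueError(f"unexpected Speech Commands file name: {rel_path!r}")
    return ClipInfo(label=path.parent.name, speaker=speaker, index=int(index))


print(parse_clip_path("yes/0a7c2a8d_nohash_0.wav"))
# ClipInfo(label='yes', speaker='0a7c2a8d', index=0)
```

Raising on unexpected names, rather than guessing a label, keeps the background-noise files from silently turning into a bogus keyword class.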

@yfyeung force-pushed the speechcommand branch 2 times, most recently from 2d94ad4 to aff5912 on March 15, 2023 08:02
@yfyeung (Contributor, Author) commented Mar 15, 2023

Hi, I am building a wake-word recipe in icefall for this dataset.
Could you please review this data preparation part? @pzelasko

@desh2608 (Collaborator)

Perhaps the recipe should be called "speech_commands" instead of "speech_commands001", and the version should be provided as a parameter for download/prepare.
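One way to follow this suggestion is to keep a single recipe name and pass the release to both download and prepare as an argument. Since this thread refers to the releases both as "v1/v2" and as "v0.01/v0.02", the hypothetical helper below (not lhotse API) accepts either spelling and maps it to one canonical form; the manifest file-name pattern is likewise only an illustration.

```python
from pathlib import Path

# Hypothetical alias table: accept v1/v2-style and v0.01/v0.02-style names.
_ALIASES = {
    "1": "0.01", "v1": "0.01", "0.01": "0.01", "v0.01": "0.01",
    "2": "0.02", "v2": "0.02", "0.02": "0.02", "v0.02": "0.02",
}


def normalize_version(version: str) -> str:
    """Map any accepted spelling of a release to its canonical form."""
    try:
        return _ALIASES[version.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown Speech Commands version: {version!r}") from None


def manifest_path(output_dir: Path, version: str, split: str) -> Path:
    """One recipe name; the release only appears in the output file name."""
    canonical = normalize_version(version)
    return Path(output_dir) / f"speech_commands_v{canonical}_{split}.jsonl.gz"
```

With this, download and prepare can both take a version argument and normalize it first, so "speech_commands001" and "speech_commands002" collapse into one recipe.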

@yfyeung (Contributor, Author) commented Mar 15, 2023

> Perhaps the recipe should be called "speech_commands" instead of "speech_commands001", and the version should be provided as a parameter for download/prepare.

Ok, I will implement this.

@yfyeung changed the title "Speech Commands v0.01 dataset" to "Speech Commands v0.01 & v0.02 dataset" on Mar 15, 2023
@csukuangfj (Contributor)

> Perhaps the recipe should be called "speech_commands" instead of "speech_commands001", and the version should be provided as a parameter for download/prepare.

In that case, I suggest that we support both v1 and v2.

@yfyeung (Contributor, Author) commented Mar 15, 2023

> > Perhaps the recipe should be called "speech_commands" instead of "speech_commands001", and the version should be provided as a parameter for download/prepare.
>
> In that case, I suggest that we support both v1 and v2.

Ok, I have implemented the recipe "speech_commands002" locally. I will merge v1 and v2 into one recipe.

@yfyeung changed the title "Speech Commands v0.01 & v0.02 dataset" to "[WIP] Speech Commands v0.01 & v0.02 dataset" on Mar 15, 2023
@desh2608 (Collaborator)

Thanks! You can look at the VoxCeleb recipe for an example where we support v1 and v2.

@yfyeung changed the title "[WIP] Speech Commands v0.01 & v0.02 dataset" to "Speech Commands v0.01 & v0.02 dataset" on Mar 16, 2023
@yfyeung (Contributor, Author) commented Mar 16, 2023

@desh2608 Hi, I have finished this recipe; it supports both Speech Commands v0.01 and v0.02.
Could you please review this data preparation part?

@pzelasko (Collaborator) left a comment

Thanks, I left two comments. Can you address them before we merge?

:return: the path to downloaded and extracted directory with data.
"""

return _download_speechcommands(
Collaborator

Do we really need v1 and v2 to have separate download functions even though they share the same logic? I think it's cleaner to rename _download_speechcommands to download_speechcommands, set the default version to the latest one, and then expose the version argument in the CLI (so that we have one CLI program for this rather than two).
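A minimal sketch of what this comment describes: one public download function whose version argument defaults to the latest release, plus a single CLI that exposes that argument. The download URL pattern, the `.completed` marker file and the `--speechcommands-version` flag are assumptions for illustration, not the merged lhotse code. argparse is used only to keep the sketch dependency-free (lhotse's own CLI is built on click).

```python
import argparse
import tarfile
import urllib.request
from pathlib import Path
from typing import List, Optional

SUPPORTED_VERSIONS = ("0.01", "0.02")
LATEST_VERSION = SUPPORTED_VERSIONS[-1]
# Assumed location of the official archives; verify before relying on it.
URL_TEMPLATE = "http://download.tensorflow.org/data/speech_commands_v{version}.tar.gz"


def download_speechcommands(
    target_dir: Path, version: str = LATEST_VERSION, force_download: bool = False
) -> Path:
    """Download and extract one release; the same code path serves v0.01 and v0.02."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Speech Commands version: {version!r}")
    target_dir = Path(target_dir)
    corpus_dir = target_dir / f"speech_commands_v{version}"
    completed = corpus_dir / ".completed"  # written only after a successful extraction
    if completed.is_file() and not force_download:
        return corpus_dir
    corpus_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / f"speech_commands_v{version}.tar.gz"
    urllib.request.urlretrieve(URL_TEMPLATE.format(version=version), archive)
    with tarfile.open(archive) as tar:
        tar.extractall(corpus_dir)
    completed.touch()
    return corpus_dir


def build_parser() -> argparse.ArgumentParser:
    """One CLI program for both releases instead of one per version."""
    parser = argparse.ArgumentParser(description="Download Speech Commands.")
    parser.add_argument("target_dir", type=Path)
    parser.add_argument(
        "--speechcommands-version", choices=SUPPORTED_VERSIONS, default=LATEST_VERSION
    )
    parser.add_argument("--force-download", action="store_true")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    path = download_speechcommands(
        args.target_dir, args.speechcommands_version, args.force_download
    )
    print(f"Speech Commands v{args.speechcommands_version} is in {path}")
```

Hooking main() up as a console entry point gives one command for both releases; passing `--speechcommands-version 0.01` selects the older one, and omitting it fetches the latest.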

Contributor Author

OK.

)


def _prepare_train_valid(
Collaborator

Could you split this function into separate prepare_train and prepare_valid functions? I don't think you really need a generator here if you restructure the code, and the way to use it as-is is a little confusing.
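A sketch of the requested split, using two plain functions and no generator. It assumes the dataset's own split protocol as we understand it: the root of each release holds validation_list.txt and testing_list.txt with relative clip paths, and every other clip outside _background_noise_ belongs to training. The function names come from the comment above, but their signatures and return values (sorted lists of relative paths rather than lhotse manifests) are hypothetical.

```python
from pathlib import Path
from typing import List, Set


def _read_split_list(corpus_dir: Path, name: str) -> Set[str]:
    """Read one of the dataset's split files into a set of relative paths."""
    text = (corpus_dir / name).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def _all_keyword_clips(corpus_dir: Path) -> List[str]:
    """Every '<word>/<file>.wav' clip, skipping the background-noise folder."""
    return sorted(
        p.relative_to(corpus_dir).as_posix()
        for p in corpus_dir.glob("*/*.wav")
        if p.parent.name != "_background_noise_"
    )


def prepare_valid(corpus_dir: Path) -> List[str]:
    """Validation clips are listed explicitly."""
    return sorted(_read_split_list(corpus_dir, "validation_list.txt"))


def prepare_train(corpus_dir: Path) -> List[str]:
    """Training clips are everything not held out for validation or testing."""
    held_out = _read_split_list(corpus_dir, "validation_list.txt") | _read_split_list(
        corpus_dir, "testing_list.txt"
    )
    return [clip for clip in _all_keyword_clips(corpus_dir) if clip not in held_out]
```

A caller then simply writes `train = prepare_train(corpus_dir)` and `valid = prepare_valid(corpus_dir)`, with no generator to step through; each function can be tested on its own.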

Contributor Author

OK, I will split it.

@yfyeung (Contributor, Author) commented Mar 16, 2023

@pzelasko I have addressed them; please check again.

@pzelasko (Collaborator)

LGTM!

@pzelasko merged commit 7e8d6b0 into lhotse-speech:master on Mar 16, 2023
@yfyeung deleted the speechcommand branch on March 17, 2023 01:14
@pzelasko added this to the v1.13 milestone on Mar 21, 2023
Labels: None yet
Projects: None yet
4 participants