
Add xbmu amdo31 #902

Merged
merged 7 commits into from
Nov 29, 2022

Conversation

sendream
Contributor

No description provided.

@@ -59,3 +59,4 @@
from .voxceleb import *
from .wenet_speech import *
from .yesno import *
from .xbmu_amdo31 import *
Contributor

Also, please sort the imports alphabetically, i.e.,

from .xbmu_amdo31 import *
from .yesno import *
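For reference, the requested order is plain lexicographic sorting of the module names, which a quick check confirms (a minimal sketch; the module list is taken from the diff above):

```python
# Module names from the end of lhotse/recipes/__init__.py in this diff.
modules = ["voxceleb", "wenet_speech", "yesno", "xbmu_amdo31"]

# Lexicographic sorting places xbmu_amdo31 before yesno,
# matching the order the reviewer asked for.
print(sorted(modules))
# → ['voxceleb', 'wenet_speech', 'xbmu_amdo31', 'yesno']
```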

Contributor Author

OK

@@ -0,0 +1,126 @@
"""
About the XBMU-AMDO31 corpus
XBMU-AMDO31 is an open-source Amdo Tibetan speech corpus published by Northwest Minzu University.
Collaborator

Could you add some more details about the corpus? It seems the HF link does not contain any description either. You can refer to the other recipes.

Contributor Author

OK, I will update the HF page today with more description and add more details to this part of the code.

"""
Download and untar the dataset
:param target_dir: Pathlike, the path of the dir to store the dataset.
:param force_download: Bool, if True, download the tars even if they already exist.
Collaborator

The force_download and base_url arguments are not present in the function definition.
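A signature consistent with the docstring might look like the following (a sketch only; the function name, defaults, and OpenSLR URL follow lhotse's usual recipe conventions and are assumptions, not the merged code — the actual download/extract logic is omitted):

```python
from pathlib import Path
from typing import Union

Pathlike = Union[str, Path]


def download_xbmu_amdo31(
    target_dir: Pathlike = ".",
    force_download: bool = False,
    base_url: str = "http://www.openslr.org/resources",
) -> Path:
    """
    Download and untar the dataset.

    :param target_dir: Pathlike, the path of the dir to store the dataset.
    :param force_download: bool, if True, download the tars even if they exist.
    :param base_url: str, the URL of the OpenSLR resources.
    :return: the path to the downloaded and extracted directory with data.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # ... download and extraction would go here ...
    return target_dir
```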

Contributor Author

This part of the code will be updated today or tomorrow, and the downloadable corpus code in icefall will be removed and added to lhotse.

:param base_url: str, the url of the OpenSLR resources.
:return: the path to downloaded and extracted directory with data.
"""
target_dir = Path(target_dir)
Collaborator

It seems here you are assuming that a tar.gz file is already downloaded in the target_dir? If so, please add some message here in case these are not present. For example, does the user need to download these files manually?
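To address this, the preparation code could fail fast with an actionable error when the archive is absent (a sketch; the archive filename `xbmu_amdo31.tar.gz` is an assumption for illustration):

```python
from pathlib import Path


def check_archive_present(
    target_dir: Path, tar_name: str = "xbmu_amdo31.tar.gz"
) -> Path:
    """Raise an informative error if the expected archive is not in target_dir."""
    tar_path = Path(target_dir) / tar_name
    if not tar_path.is_file():
        raise FileNotFoundError(
            f"{tar_path} does not exist. Please download the corpus manually "
            f"and place the archive in {target_dir} before running data "
            f"preparation."
        )
    return tar_path
```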


manifests[part] = {"recordings": recording_set, "supervisions": supervision_set}

return manifests
Collaborator

Probably should have a newline at end of file to avoid flake8 error.
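The flake8 check in question (W292, no newline at end of file) only looks at the file's final byte; a quick local check can be sketched like this:

```python
from pathlib import Path


def ends_with_newline(path) -> bool:
    """Return True if the file's last byte is a newline,
    which is what flake8's W292 check requires."""
    data = Path(path).read_bytes()
    return len(data) > 0 and data.endswith(b"\n")
```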

Contributor Author

Thanks for the reminder

@pzelasko
Collaborator

Is it good to merge? @desh2608

@desh2608
Collaborator

LGTM. @sendream are you planning to make more changes or should I merge?

@sendream
Contributor Author

> LGTM. @sendream are you planning to make more changes or should I merge?

Hi, no more changes, you should merge.

@desh2608 desh2608 added this to the v1.11 milestone Nov 29, 2022
@desh2608 desh2608 merged commit b62acf9 into lhotse-speech:master Nov 29, 2022