Add HuggingFace Datasets to NeMo ASR Dataset script #3513
Signed-off-by: smajumdar <[email protected]>
This pull request introduces 2 alerts when merging 229a8b3 into 3146fca (view on LGTM.com).
This pull request introduces 2 alerts when merging 10821e2 into d8354a2 (view on LGTM.com).
## Usage - Offline Mode

python convert_hf_dataset_to_nemo.py \
    output_dir=<Path to some storage drive that will holde preprocessed audio files> \
Typo: holde => hold
Fixed
if not os.path.exists(cfg.split_output_dir):
    os.makedirs(cfg.split_output_dir, exist_ok=True)

cfg.split = split
The line should be moved out of the else scope (delete the extra tab).
Done
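The scope issue raised in this thread can be illustrated with a minimal sketch. Only `cfg.split_output_dir`, `cfg.split`, and the `os.makedirs` call come from the diff; the `prepare_split` helper, the `SimpleNamespace` config, and the `if`/`else` condition are assumptions made up for illustration, not the PR's actual code.

```python
import os
from types import SimpleNamespace


def prepare_split(cfg, split):
    """Hypothetical helper: pick and create the output directory for a split."""
    if split is None:
        # Assumed branch: no named split, write directly into output_dir.
        cfg.split_output_dir = cfg.output_dir
    else:
        # Assumed branch: one subdirectory per named split.
        cfg.split_output_dir = os.path.join(cfg.output_dir, split)

    if not os.path.exists(cfg.split_output_dir):
        os.makedirs(cfg.split_output_dir, exist_ok=True)

    # Assigned at function level, so it runs on both branches.
    # Indented one level deeper, inside a branch, it would be skipped
    # whenever the other branch is taken and cfg.split would keep a
    # stale value.
    cfg.split = split
    return cfg
```

Calling `prepare_split(SimpleNamespace(output_dir="out", split="stale"), "train")` should leave `cfg.split` set to `"train"` regardless of which branch ran, which is the behavior the reviewer asked for.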
import tqdm
from datasets import Audio, IterableDataset, load_dataset
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf, open_dict
`open_dict` is unused (according to the LGTM check).
Ah, thanks for catching that. Ended up not needing it.
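An unused import like this one can also be caught locally before a review. The sketch below is a simplified approximation built on the standard-library `ast` module; it is not how LGTM performs its analysis, and the `unused_imports` function name is hypothetical. It ignores re-exports via `__all__` and names referenced only inside strings.

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """Return imported names that are never referenced in `source`."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Attribute access such as `OmegaConf.create` still contains
            # a Name node for `OmegaConf`, so it counts as a use.
            used.add(node.id)
    return sorted(imported - used)
```

Because the source is only parsed and never executed, the check works even when the imported packages are not installed.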
This pull request introduces 1 alert when merging 5d59252 into 101977e (view on LGTM.com).
Thank you!
* First draft
  Signed-off-by: smajumdar <[email protected]>
* Temp commit
  Signed-off-by: smajumdar <[email protected]>
* Prepare HF to NeMo dataset preparation script
  Signed-off-by: smajumdar <[email protected]>
* Improve conversion framework
  Signed-off-by: smajumdar <[email protected]>
* Finalize HF dataset to NeMo ASR
  Signed-off-by: smajumdar <[email protected]>
* Finalize HF dataset to NeMo ASR
  Signed-off-by: smajumdar <[email protected]>
* Fix style
  Signed-off-by: smajumdar <[email protected]>
* Address comments
  Signed-off-by: smajumdar <[email protected]>
* Fixed dangling variable
  Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Vitaly Lavrukhin <[email protected]>
Changelog
NOTE: