
Commit

[TTS] Fix FastPitch data prep tutorial (NVIDIA#7524)
Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
rlangman authored and erastorgueva-nv committed Oct 11, 2023
1 parent 2f06a14 commit 6deeb50
Showing 2 changed files with 11 additions and 7 deletions.
4 changes: 2 additions & 2 deletions scripts/dataset_processing/tts/preprocess_text.py
@@ -22,9 +22,9 @@
  --input_manifest="<data_root_path>/manifest.json" \
  --output_manifest="<data_root_path>/manifest_processed.json" \
  --normalizer_config_path="<nemo_root_path>/examples/tts/conf/text/normalizer_en.yaml" \
- --lower_case=True \
+ --lower_case \
  --num_workers=4 \
- --batch_size=16
+ --joblib_batch_size=16
  """

  import argparse
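The docstring change from `--lower_case=True` to `--lower_case` is consistent with `--lower_case` being a bare boolean flag. The minimal sketch below assumes the script uses argparse with `action="store_true"` for that flag. We have not checked the script's actual parser, so `build_parser` is a hypothetical mirror of the documented flags. It shows why the old invocation fails: argparse rejects a value passed to a store_true flag, and it reports the old `--batch_size` as an unrecognized argument.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags in the docstring; the real
    # preprocess_text.py may declare them differently.
    parser = argparse.ArgumentParser(prog="preprocess_text.py")
    parser.add_argument("--input_manifest", required=True)
    parser.add_argument("--output_manifest", required=True)
    parser.add_argument("--normalizer_config_path")
    # Assumption: --lower_case is a bare boolean flag, so it takes no value.
    parser.add_argument("--lower_case", action="store_true")
    parser.add_argument("--num_workers", type=int, default=1)
    parser.add_argument("--joblib_batch_size", type=int)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # On a parse error argparse prints a message to stderr and exits.
        return None


base = ["--input_manifest=manifest.json", "--output_manifest=manifest_processed.json"]

ok = try_parse(base + ["--lower_case", "--num_workers=4", "--joblib_batch_size=16"])
print(ok.lower_case, ok.joblib_batch_size)  # True 16

# A store_true flag given a value fails ("ignored explicit argument 'True'").
print(try_parse(base + ["--lower_case=True"]))  # None

# The old --batch_size flag is not declared, so it is an unrecognized argument.
print(try_parse(base + ["--batch_size=16"]))  # None
```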
14 changes: 9 additions & 5 deletions tutorials/tts/FastPitch_Data_Preparation.ipynb
@@ -332,6 +332,8 @@
  "lower_case = True\n",
  "# Whether to overwrite output manifest, if it exists\n",
  "overwrite_manifest = True\n",
+ "# Batch size for joblib parallelization. Increasing this value might speed up the script, depending on your CPU.\n",
+ "joblib_batch_size = 16\n",
  "\n",
  "# Python wrapper to invoke the given bash script with the given input args\n",
  "def run_script(script, args):\n",
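The new `joblib_batch_size` variable is presumably forwarded to joblib's `Parallel(batch_size=...)`, which sets how many atomic tasks are dispatched to a worker at once. Larger batches amortize the per-task dispatch overhead, which is why raising the value may help. To stay dependency-free, the sketch below uses the standard library's `concurrent.futures` rather than joblib. It is not joblib's implementation, and `normalize` is a hypothetical stand-in for the script's text normalization. Threads keep the sketch portable. For CPU-bound normalization, real speedups would need a process-based backend.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(text: str) -> str:
    # Hypothetical stand-in for per-entry text normalization.
    return " ".join(text.split()).lower()


def chunk(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    # One dispatched task handles a whole batch, so scheduling overhead is
    # paid once per batch rather than once per entry.
    return [normalize(t) for t in batch]


def run_batched(texts, num_workers=4, batch_size=16):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields batch results in submission order, so flattening
        # them preserves the original input order.
        results = pool.map(process_batch, chunk(texts, batch_size))
        return [out for batch in results for out in batch]


texts = [f"  Sample   Sentence {i} " for i in range(40)]
out = run_batched(texts, num_workers=4, batch_size=16)
print(len(out), out[0])  # 40 sample sentence 0
```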
@@ -351,8 +353,10 @@
  " f\"--output_manifest={output_filepath}\",\n",
  " f\"--num_workers={num_workers}\",\n",
  " f\"--normalizer_config_path={normalizer_config_filepath}\",\n",
- " f\"--lower_case={lower_case}\"\n",
+ " f\"--joblib_batch_size={joblib_batch_size}\"\n",
  " ]\n",
+ " if lower_case:\n",
+ " args.append(\"--lower_case\")\n",
  " if overwrite_manifest:\n",
  " args.append(\"--overwrite\")\n",
  "\n",
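The notebook now appends `--lower_case` (like `--overwrite`) only when the corresponding variable is true. Previously it formatted `f"--lower_case={lower_case}"`, which renders as `--lower_case=True` or `--lower_case=False`. The hypothetical `build_args` helper below generalizes that pattern for any options dict. It is a sketch, not code from the notebook: true booleans become bare flags, false booleans are omitted, and everything else is rendered as `--name=value`.

```python
def build_args(options: dict) -> list:
    """Convert an options dict into argparse-style CLI arguments.

    Hypothetical helper: True booleans become bare flags, False booleans are
    omitted, and all other values are rendered as --name=value.
    """
    args = []
    for name, value in options.items():
        # Check bool before other types: bool is a subclass of int.
        if isinstance(value, bool):
            if value:
                args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


args = build_args({
    "input_manifest": "manifest.json",
    "output_manifest": "manifest_processed.json",
    "num_workers": 4,
    "joblib_batch_size": 16,
    "lower_case": True,
    "overwrite": False,
})
print(args)
# ['--input_manifest=manifest.json', '--output_manifest=manifest_processed.json', '--num_workers=4', '--joblib_batch_size=16', '--lower_case']
```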
@@ -787,7 +791,7 @@
  "\n",
  "We will train HiFi-GAN first so that we can use it to help evaluate the performance of FastPitch as it is being trained.\n",
  "\n",
- "HiFi-GAN training only requires a manifest with with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n",
+ "HiFi-GAN training only requires a manifest with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n",
  "\n",
  "Here we show how to train these models from scratch. You can also fine-tune them from pretrained checkpoints as mentioned in our [FastPitch fine-tuning tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_Finetuning.ipynb), but pretrained checkpoints compatible with these experimental recipes are not yet available on NGC.\n"
  ],
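Per the tutorial text, HiFi-GAN training needs only the `audio_filepath` field. NeMo manifests are JSON-lines files, one JSON object per line. The hypothetical helper below reduces each entry to that single field, and its example entries are made up. A minimal HiFi-GAN manifest is optional: the full FastPitch manifest can also be used, since the extra fields are simply not required.

```python
import json


def to_vocoder_manifest(lines):
    """Keep only audio_filepath from each JSON-lines manifest entry.

    Hypothetical helper; raises if an entry lacks audio_filepath.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if "audio_filepath" not in entry:
            raise ValueError(f"entry missing audio_filepath: {entry}")
        out.append(json.dumps({"audio_filepath": entry["audio_filepath"]}))
    return out


# Hypothetical manifest lines with extra fields used only by FastPitch.
full = [
    '{"audio_filepath": "wavs/a.wav", "duration": 2.1, "text": "Hello."}',
    '{"audio_filepath": "wavs/b.wav", "duration": 3.4, "text": "World."}',
]
print(to_vocoder_manifest(full))
# ['{"audio_filepath": "wavs/a.wav"}', '{"audio_filepath": "wavs/b.wav"}']
```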
@@ -914,7 +918,7 @@
  {
  "cell_type": "code",
  "source": [
- "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\"\n",
+ "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\" / dataset_name\n",
  "!ls $hifigan_log_epoch_dir"
  ],
  "metadata": {
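Both the HiFi-GAN and FastPitch cells now list `<log_dir>/epoch_10/<dataset_name>` instead of `<log_dir>/epoch_10`. This implies the audio logged at each epoch lands in a per-dataset subdirectory. The sketch below reproduces that layout in a temporary directory to show the difference between the two `ls` targets. The dataset name and the `example.wav` file are hypothetical, since this commit does not show the actual logged file names.

```python
import tempfile
from pathlib import Path


def epoch_dir(log_dir: Path, epoch: int, dataset_name: str) -> Path:
    """Return <log_dir>/epoch_<epoch>/<dataset_name>, the layout the fix points at."""
    return log_dir / f"epoch_{epoch}" / dataset_name


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)                  # hypothetical stand-in for hifigan_log_dir
    dataset_name = "my_dataset"          # hypothetical dataset name
    audio_dir = epoch_dir(log_dir, 10, dataset_name)
    audio_dir.mkdir(parents=True)
    (audio_dir / "example.wav").touch()  # hypothetical logged file

    # Listing only epoch_10 shows the per-dataset subdirectory, not the files.
    print(sorted(p.name for p in (log_dir / "epoch_10").iterdir()))  # ['my_dataset']
    # Listing the full path, as the updated cells do, shows the logged files.
    print(sorted(p.name for p in audio_dir.iterdir()))  # ['example.wav']
```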
@@ -966,7 +970,7 @@
  "1. Training manifest(s) with `audio_filepath` and `text` or `normalized_text` fields.\n",
  "2. Precomputed features such as *pitch* and *energy* specified in the feature [config file](https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/feature/feature_44100.yaml).\n",
  "3. (Optional) Statistics file for normalizing features.\n",
- "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field amd JSON file mapping speaker IDs to speaker indices.\n",
+ "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field and JSON file mapping speaker IDs to speaker indices.\n",
  "5. (Optional) To train with IPA phonemes, a [phoneme dictionary](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt) and optional [heteronyms file](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/heteronyms-052722)\n",
  "6. (Optional) HiFi-GAN checkpoint or [NGC model name](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/models/hifigan.py#L413) for generating audio predictions during training.\n",
  "\n"
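Requirements 1 and 4 in the list above can be checked before training. The minimal sketch below validates that each entry has `audio_filepath` plus `text` or `normalized_text`. It also builds a speaker-ID-to-index mapping for a multi-speaker manifest. Both helpers and the example entries are hypothetical. Sorting the IDs only makes the assignment deterministic. This commit does not show the exact JSON schema NeMo expects for the speaker file, so treat the output format as an assumption.

```python
import json


def check_fastpitch_entry(entry: dict) -> None:
    """Raise if an entry lacks the fields listed in requirement 1."""
    if "audio_filepath" not in entry:
        raise ValueError("missing audio_filepath")
    if "text" not in entry and "normalized_text" not in entry:
        raise ValueError("need text or normalized_text")


def build_speaker_map(entries) -> dict:
    """Map each distinct speaker ID to a contiguous integer index.

    Hypothetical helper for requirement 4; IDs are sorted as strings so the
    assignment is deterministic.
    """
    speakers = sorted({str(e["speaker"]) for e in entries})
    return {spk: idx for idx, spk in enumerate(speakers)}


# Hypothetical multi-speaker manifest entries.
manifest = [
    {"audio_filepath": "wavs/a.wav", "text": "Hi.", "speaker": "p225"},
    {"audio_filepath": "wavs/b.wav", "normalized_text": "hello", "speaker": "p226"},
    {"audio_filepath": "wavs/c.wav", "text": "Bye.", "speaker": "p225"},
]
for e in manifest:
    check_fastpitch_entry(e)
print(json.dumps(build_speaker_map(manifest)))  # {"p225": 0, "p226": 1}
```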
@@ -1093,7 +1097,7 @@
  {
  "cell_type": "code",
  "source": [
- "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\"\n",
+ "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\" / dataset_name\n",
  "!ls $faspitch_log_epoch_dir"
  ],
  "metadata": {
