[TTS] Fix FastPitch data prep tutorial (NVIDIA#7524)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
erastorgueva-nv · Oct 11, 2023 · 6deeb50
1 parent 2f06a14
Showing 2 changed files with 11 additions and 7 deletions.
diff --git a/scripts/dataset_processing/tts/preprocess_text.py b/scripts/dataset_processing/tts/preprocess_text.py
@@ -22,9 +22,9 @@
     --input_manifest="<data_root_path>/manifest.json" \
     --output_manifest="<data_root_path>/manifest_processed.json" \
     --normalizer_config_path="<nemo_root_path>/examples/tts/conf/text/normalizer_en.yaml" \
-    --lower_case=True \
+    --lower_case \
     --num_workers=4 \
-    --batch_size=16
+    --joblib_batch_size=16
 """
 
 import argparse

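The updated usage string passes `--lower_case` as a bare switch and renames `--batch_size` to `--joblib_batch_size`. A bare switch is what argparse expects from a flag declared with `action="store_true"`. With that declaration, `--lower_case=True` is rejected instead of being parsed. The sketch below assumes `preprocess_text.py` declares `--lower_case` this way; the diff does not show the parser itself. The parser it builds is hypothetical and covers only the flags touched by this commit, with placeholder defaults.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the flags touched by this commit;
    # defaults are placeholders, not the real script's values.
    parser = argparse.ArgumentParser(prog="preprocess_text.py")
    # store_true: the flag takes no value, so "--lower_case" alone means True
    parser.add_argument("--lower_case", action="store_true")
    parser.add_argument("--num_workers", type=int, default=1)
    parser.add_argument("--joblib_batch_size", type=int, default=None)
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage error to stderr and raises SystemExit(2)
        return None


# New invocation from the updated docstring: parses cleanly
new_style = try_parse(["--lower_case", "--num_workers=4", "--joblib_batch_size=16"])
# Old invocation: "--lower_case=True" fails with "ignored explicit argument",
# and "--batch_size" would also be rejected as an unrecognized argument
old_style = try_parse(["--lower_case=True", "--num_workers=4", "--batch_size=16"])

print(new_style)
print(old_style)
```

Because `store_true` flags default to `False`, omitting `--lower_case` entirely is the way to disable lowercasing. There is no `--lower_case=False` form.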
diff --git a/tutorials/tts/FastPitch_Data_Preparation.ipynb b/tutorials/tts/FastPitch_Data_Preparation.ipynb
@@ -332,6 +332,8 @@
         "lower_case = True\n",
         "# Whether to overwrite output manifest, if it exists\n",
         "overwrite_manifest = True\n",
+        "# Batch size for joblib parallelization. Increasing this value might speed up the script, depending on your CPU.\n",
+        "joblib_batch_size = 16\n",
         "\n",
         "# Python wrapper to invoke the given bash script with the given input args\n",
         "def run_script(script, args):\n",
@@ -351,8 +353,10 @@
         "        f\"--output_manifest={output_filepath}\",\n",
         "        f\"--num_workers={num_workers}\",\n",
         "        f\"--normalizer_config_path={normalizer_config_filepath}\",\n",
-        "        f\"--lower_case={lower_case}\"\n",
+        "        f\"--joblib_batch_size={joblib_batch_size}\"\n",
         "    ]\n",
+        "    if lower_case:\n",
+        "      args.append(\"--lower_case\")\n",
         "    if overwrite_manifest:\n",
         "        args.append(\"--overwrite\")\n",
         "\n",
@@ -787,7 +791,7 @@
         "\n",
         "We will train HiFi-GAN first so that we can use it to help evaluate the performance of FastPitch as it is being trained.\n",
         "\n",
-        "HiFi-GAN training only requires a manifest with with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n",
+        "HiFi-GAN training only requires a manifest with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n",
         "\n",
         "Here we show how to train these models from scratch. You can also fine-tune them from pretrained checkpoints as mentioned in our [FastPitch fine-tuning tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_Finetuning.ipynb), but pretrained checkpoints compatible with these experimental recipes are not yet available on NGC.\n"
       ],
@@ -914,7 +918,7 @@
     {
       "cell_type": "code",
       "source": [
-        "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\"\n",
+        "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\" / dataset_name\n",
         "!ls $hifigan_log_epoch_dir"
       ],
       "metadata": {
@@ -966,7 +970,7 @@
         "1. Training manifest(s) with `audio_filepath` and `text` or `normalized_text` fields.\n",
         "2. Precomputed features such as *pitch* and *energy* specified in the feature [config file](https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/feature/feature_44100.yaml).\n",
         "3. (Optional) Statistics file for normalizing features.\n",
-        "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field amd JSON file mapping speaker IDs to speaker indices.\n",
+        "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field and JSON file mapping speaker IDs to speaker indices.\n",
         "5. (Optional) To train with IPA phonemes, a [phoneme dictionary](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt) and optional [heteronyms file](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/heteronyms-052722)\n",
         "6. (Optional) HiFi-GAN checkpoint or [NGC model name](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/models/hifigan.py#L413) for generating audio predictions during training.\n",
         "\n"
@@ -1093,7 +1097,7 @@
     {
       "cell_type": "code",
       "source": [
-        "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\"\n",
+        "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\" / dataset_name\n",
         "!ls $faspitch_log_epoch_dir"
       ],
       "metadata": {
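The notebook changes follow from the script's new interface. `--lower_case` is appended only when `lower_case` is true, and `--joblib_batch_size` is passed explicitly. The cells that list sample audio now look one level deeper, under a `dataset_name` subdirectory of each epoch folder. Below is a minimal sketch of both patterns. The function names `build_preprocess_args` and `epoch_log_dir` and all example paths are hypothetical. The per-dataset directory layout is inferred from the diff, not from the logging code.

```python
from pathlib import Path
from typing import List


def build_preprocess_args(
    input_filepath: Path,
    output_filepath: Path,
    normalizer_config_filepath: Path,
    num_workers: int = 4,
    joblib_batch_size: int = 16,
    lower_case: bool = True,
    overwrite_manifest: bool = True,
) -> List[str]:
    """Hypothetical helper mirroring the notebook's argument list."""
    args = [
        f"--input_manifest={input_filepath}",
        f"--output_manifest={output_filepath}",
        f"--num_workers={num_workers}",
        f"--normalizer_config_path={normalizer_config_filepath}",
        f"--joblib_batch_size={joblib_batch_size}",
    ]
    # Boolean switches take no value: append them bare, and only when enabled
    if lower_case:
        args.append("--lower_case")
    if overwrite_manifest:
        args.append("--overwrite")
    return args


def epoch_log_dir(log_dir: Path, epoch: int, dataset_name: str) -> Path:
    """Hypothetical helper: <log_dir>/epoch_<N>/<dataset_name>, as in the fixed cells."""
    return log_dir / f"epoch_{epoch}" / dataset_name


args = build_preprocess_args(
    Path("data/manifest.json"),
    Path("data/manifest_processed.json"),
    Path("conf/normalizer_en.yaml"),
)
print(" ".join(args))
print(epoch_log_dir(Path("exps/HifiGan/logs"), 10, "my_dataset"))
```

Building the arguments as a list, rather than one formatted string, keeps each flag a separate token. Toggling a switch is then just a conditional `append`.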