[TTS][DE] Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German #4413

XuesongYang · 2022-06-21T21:46:13Z

What does this PR do ?

- Created a dataset preparation recipe for HUI-ACG.
  - Rewrote get_data.py to support multiprocessing and to decouple text normalization.
- Added a script to convert graphemes into phonemes.
- Created a model config for multi-speaker FastPitch model training.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

# Download the dataset, normalize the texts, and create train/val/test splits as JSON manifests.
$ python scripts/dataset_processing/tts/hui_acg/get_data.py \
        --data-root /home/xueyang/datasets \
        --normalize-text
$ ls /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/*_text_normed.json
test_manifest_text_normed.json  train_manifest_text_normed.json  val_manifest_text_normed.json
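
To sanity-check the generated manifests before moving on, a short script along these lines may help. It is a minimal sketch, not part of this PR: it assumes the manifests are JSON-lines files (one JSON object per line) and that each entry carries `duration` (in seconds) and `speaker` fields. Those field names are assumptions, and `summarize_manifest` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(manifest_path):
    """Return (num_entries, total_hours, entries_per_speaker) for a JSON-lines manifest."""
    num_entries = 0
    total_seconds = 0.0
    per_speaker = Counter()
    with Path(manifest_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            num_entries += 1
            # Assumed field names: "duration" (seconds) and "speaker".
            total_seconds += float(entry.get("duration", 0.0))
            per_speaker[entry.get("speaker", "unknown")] += 1
    return num_entries, total_seconds / 3600.0, per_speaker
```

Calling `summarize_manifest` on the train manifest should give a file count and total duration comparable to what the data loader reports when the manifest is loaded later in the recipe.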

# Run the phonemizer to convert graphemes into phonemes. This writes new manifests with names such as train_manifest_text_normed_phonemes.json.
$ python scripts/dataset_processing/tts/hui_acg/phonemizer.py \
        --json-manifests /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed.json \
                                     /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/val_manifest_text_normed.json \
                                     /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/test_manifest_text_normed.json \
        --preserve-punctuation
$ ls /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/*_phonemes.json
test_manifest_text_normed_phonemes.json  train_manifest_text_normed_phonemes.json  val_manifest_text_normed_phonemes.json
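
The naming convention visible above (a `_phonemes` suffix inserted before `.json`) can be reproduced when later steps need to locate the phonemized manifests. This is only a sketch of that convention as it appears in the listing; `phoneme_manifest_path` is a hypothetical helper and not a function from the script.

```python
from pathlib import Path


def phoneme_manifest_path(manifest_path):
    """Map e.g. train_manifest_text_normed.json to train_manifest_text_normed_phonemes.json."""
    p = Path(manifest_path)
    # Keep the directory and extension, append "_phonemes" to the file stem.
    return p.with_name(f"{p.stem}_phonemes{p.suffix}")
```

For example, `phoneme_manifest_path("val_manifest_text_normed.json").name` gives `val_manifest_text_normed_phonemes.json`.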

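# Extract supplementary data for FastPitch training and compute pitch statistics over the training set.
$ python scripts/dataset_processing/tts/extract_sup_data.py \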
        --config-path hui_acg/ds_conf \
        --config-name ds_for_fastpitch_align.yaml \
        manifest_filepath=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json \
        sup_data_path=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/sup_data_phonemes

# You should see output like the following in the terminal:
-- Set up ENVs...
a867b5ef4cd08fe0db216937b2d3e767fe71b3e7
-- Started ...
[NeMo I 2022-06-17 15:15:57 tokenize_and_classify:81] Creating ClassifyFst grammars. This might take some time...
[NeMo I 2022-06-17 15:16:15 data:186] Loading dataset from /data/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json.
[NeMo I 2022-06-17 15:16:22 data:221] Loaded dataset with 173622 files.
[NeMo I 2022-06-17 15:16:22 data:223] Dataset contains 403.30 hours.
[NeMo I 2022-06-17 15:16:22 data:321] Pruned 0 files. Final dataset contains 173622 files
[NeMo I 2022-06-17 15:16:22 data:323] Pruned 0.00 hours. Final dataset contains 403.30 hours.
Processing /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json:
PITCH_MEAN=155.3916473388672, PITCH_STD=102.26656341552734
PITCH_MIN=65.4063949584961, PITCH_MAX=2093.004638671875
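
The four statistics printed above are passed to the training command as `pitch_mean`, `pitch_std`, `pitch_fmin`, and `pitch_fmax`. Instead of copying them by hand, the mapping could be scripted roughly as follows. This is a sketch that assumes the output has been saved as text in the `NAME=value` form shown above; `pitch_overrides` is a hypothetical helper, not part of NeMo.

```python
import re

# Printed statistic name -> override key used in the training command.
STAT_TO_OVERRIDE = {
    "PITCH_MEAN": "pitch_mean",
    "PITCH_STD": "pitch_std",
    "PITCH_MIN": "pitch_fmin",
    "PITCH_MAX": "pitch_fmax",
}


def pitch_overrides(log_text):
    """Turn 'PITCH_MEAN=..., PITCH_STD=...' log lines into 'key=value' overrides."""
    found = dict(re.findall(r"(PITCH_(?:MEAN|STD|MIN|MAX))=([0-9.eE+-]+)", log_text))
    missing = [name for name in STAT_TO_OVERRIDE if name not in found]
    if missing:
        raise ValueError(f"missing statistics in log: {missing}")
    # Values are kept as the original strings so no precision is lost.
    return [f"{STAT_TO_OVERRIDE[name]}={found[name]}" for name in STAT_TO_OVERRIDE]
```

The returned list can be appended to the argument list of the training script.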

# Train the FastPitch model. Copy the PITCH_MEAN, PITCH_STD, PITCH_MIN, and PITCH_MAX values printed above into pitch_mean, pitch_std, pitch_fmin, and pitch_fmax.
$ python examples/tts/fastpitch.py \
        --config-path conf/de \
        --config-name fastpitch_align_44100 \
        model.train_ds.dataloader_params.num_workers=16 \
        model.validation_ds.dataloader_params.num_workers=16 \
        model.train_ds.dataloader_params.batch_size=24 \
        model.validation_ds.dataloader_params.batch_size=24 \
        train_dataset=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json \
        validation_datasets=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/val_manifest_text_normed_phonemes.json \
        sup_data_path=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/sup_data_phonemes \
        pitch_mean=155.3916473388672 \
        pitch_std=102.26656341552734 \
        pitch_fmin=65.4063949584961 \
        pitch_fmax=2093.004638671875 \
        trainer.devices=2 \
        trainer.num_nodes=1 \
        trainer.accumulate_grad_batches=2 \
        trainer.log_every_n_steps=10 \
        exp_manager.exp_dir=/home/xueyang/experiments/GermanTTS/multi_spk_tts_de/fastpitch-train/results \
        exp_manager.resume_if_exists=True \
        exp_manager.resume_ignore_no_checkpoint=True \
        +exp_manager.create_wandb_logger=True \
        +exp_manager.wandb_logger_kwargs.name=fastpitch-train \
        +exp_manager.wandb_logger_kwargs.project=multi_spk_tts_de
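
For reference, the batch-related settings in the command above combine into one effective batch size per optimizer step. The arithmetic below is a sketch that assumes standard data-parallel training, where every device processes its own batch and gradients are accumulated before each step; `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_batch, devices, num_nodes, accumulate_grad_batches):
    """Samples contributing to one optimizer step under data-parallel training."""
    return per_device_batch * devices * num_nodes * accumulate_grad_batches


# Values from the training command: batch_size=24, devices=2, num_nodes=1, accumulate_grad_batches=2.
print(effective_batch_size(24, 2, 1, 2))  # 96
```

When training on a different number of GPUs, adjusting `trainer.accumulate_grad_batches` is one way to keep this product unchanged.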

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

scripts/dataset_processing/tts/hui_acg/get_data.py

scripts/dataset_processing/tts/hui_acg/phonemizer_local.py

scripts/dataset_processing/tts/hui_acg/ds_conf/ds_for_fastpitch_align.yaml

* modify get_data to support multiple speaker IDs.
* created a dataset config for HUI-ACG.
* created a model config for fastpitch German.
Signed-off-by: Xuesong Yang

* modify get_data to support multiple speaker IDs.
* created a dataset config for HUI-ACG.
* created a model config for fastpitch German.
Signed-off-by: Xuesong Yang
Signed-off-by: arendu

* modify get_data to support multiple speaker IDs.
* created a dataset config for HUI-ACG.
* created a model config for fastpitch German.
Signed-off-by: Xuesong Yang
Signed-off-by: David Mosallanezhad

* modify get_data to support multiple speaker IDs.
* created a dataset config for HUI-ACG.
* created a model config for fastpitch German.
Signed-off-by: Xuesong Yang
Signed-off-by: Hainan Xu

XuesongYang requested review from aroraakshit, redoctopus and blisc June 21, 2022 21:46

XuesongYang marked this pull request as ready for review June 21, 2022 21:46

XuesongYang changed the title ~~[TTS][DE] Multi-speaker fastpitch model training recipe.~~ [TTS][DE] Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German Jun 21, 2022

XuesongYang requested a review from borisgin June 21, 2022 23:06

XuesongYang mentioned this pull request Jun 21, 2022

[fastpitch][DE] HUI dataset preparation config and model training config on 44100 Hz audios. #4241 (Closed)