[TTS][DE] Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German #4413

Merged: XuesongYang merged 1 commit into main from xueyang-tts-fastpitch-German on Jun 30, 2022

XuesongYang (Collaborator) commented Jun 21, 2022

What does this PR do?

  • Created a dataset preparation recipe for HUI-Audio-Corpus-German (HUI-ACG).
    • Rewrote get_data.py to support multiprocessing and to decouple text normalization.
  • Added a script to convert graphemes into phonemes.
  • Created a model config for multi-speaker FastPitch model training.
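
The first two bullets can be illustrated with a minimal sketch. This is not the code in get_data.py: the function names (`make_entry`, `build_manifest`, `write_manifest`), the row layout, the `speaker` and `normalized_text` fields, and the stand-in normalizer are all assumptions made for illustration. The idea shown is that manifest entries are built in parallel worker processes, while text normalization is an optional callable that is applied only when requested.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def lowercase_normalizer(text):
    """Stand-in normalizer (hypothetical); a real recipe would call a proper text normalizer."""
    return " ".join(text.lower().split())


def make_entry(row, normalizer=None):
    """Build one manifest entry from an (audio_path, duration, text, speaker) tuple."""
    audio_path, duration, text, speaker = row
    entry = {
        "audio_filepath": audio_path,
        "duration": duration,
        "text": text,
        "speaker": speaker,  # assumed field name for the speaker ID
    }
    # Normalization is decoupled: it only runs when a normalizer is passed in.
    if normalizer is not None:
        entry["normalized_text"] = normalizer(text)
    return entry


def build_manifest(rows, normalizer=None, num_workers=0):
    """Build all entries, serially (num_workers <= 0) or in worker processes."""
    worker = partial(make_entry, normalizer=normalizer)
    if num_workers <= 0:
        return [worker(row) for row in rows]
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves input order, so the manifest order is deterministic.
        return list(pool.map(worker, rows, chunksize=64))


def write_manifest(entries, path):
    """Write entries as JSON lines, one utterance per line."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Passing the normalizer as an argument keeps the parallel manifest-building code unchanged whether or not normalization is enabled. The normalizer must be a module-level function so that it can be pickled and sent to the worker processes.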

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Download the dataset, normalize texts, create train/val/test splits of JSON manifests.
$ python scripts/dataset_processing/tts/hui_acg/get_data.py \
        --data-root /home/xueyang/datasets \
        --normalize-text
$ ls /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/*_text_normed.json
test_manifest_text_normed.json  train_manifest_text_normed.json  val_manifest_text_normed.json

# Run the phonemizer to convert graphemes into phonemes. This writes new files named like train_manifest_text_normed_phonemes.json.
$ python scripts/dataset_processing/tts/hui_acg/phonemizer.py \
        --json-manifests /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed.json \
                                     /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/val_manifest_text_normed.json \
                                     /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/test_manifest_text_normed.json \
        --preserve-punctuation
$ ls /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/*_phonemes.json
test_manifest_text_normed_phonemes.json  train_manifest_text_normed_phonemes.json  val_manifest_text_normed_phonemes.json

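# Extract supplementary data and compute the pitch statistics (mean, std, min, max) of the training set.
$ python scripts/dataset_processing/tts/extract_sup_data.py \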
        --config-path hui_acg/ds_conf \
        --config-name ds_for_fastpitch_align.yaml \
        manifest_filepath=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json \
        sup_data_path=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/sup_data_phonemes

# The terminal output should look similar to the following.
-- Set up ENVs...
a867b5ef4cd08fe0db216937b2d3e767fe71b3e7
-- Started ...
[NeMo I 2022-06-17 15:15:57 tokenize_and_classify:81] Creating ClassifyFst grammars. This might take some time...
[NeMo I 2022-06-17 15:16:15 data:186] Loading dataset from /data/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json.
[NeMo I 2022-06-17 15:16:22 data:221] Loaded dataset with 173622 files.
[NeMo I 2022-06-17 15:16:22 data:223] Dataset contains 403.30 hours.
[NeMo I 2022-06-17 15:16:22 data:321] Pruned 0 files. Final dataset contains 173622 files
[NeMo I 2022-06-17 15:16:22 data:323] Pruned 0.00 hours. Final dataset contains 403.30 hours.
Processing /home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json:
PITCH_MEAN=155.3916473388672, PITCH_STD=102.26656341552734
PITCH_MIN=65.4063949584961, PITCH_MAX=2093.004638671875

# Train the FastPitch model. Copy the PITCH_MEAN, PITCH_STD, PITCH_MIN, and PITCH_MAX values printed above into pitch_mean, pitch_std, pitch_fmin, and pitch_fmax.
$ python examples/tts/fastpitch.py \
        --config-path conf/de \
        --config-name fastpitch_align_44100 \
        model.train_ds.dataloader_params.num_workers=16 \
        model.validation_ds.dataloader_params.num_workers=16 \
        model.train_ds.dataloader_params.batch_size=24 \
        model.validation_ds.dataloader_params.batch_size=24 \
        train_dataset=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/train_manifest_text_normed_phonemes.json \
        validation_datasets=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/val_manifest_text_normed_phonemes.json \
        sup_data_path=/home/xueyang/datasets/HUI-Audio-Corpus-German-clean/sup_data_phonemes \
        pitch_mean=155.3916473388672 \
        pitch_std=102.26656341552734 \
        pitch_fmin=65.4063949584961 \
        pitch_fmax=2093.004638671875 \
        trainer.devices=2 \
        trainer.num_nodes=1 \
        trainer.accumulate_grad_batches=2 \
        trainer.log_every_n_steps=10 \
        exp_manager.exp_dir=/home/xueyang/experiments/GermanTTS/multi_spk_tts_de/fastpitch-train/results \
        exp_manager.resume_if_exists=True \
        exp_manager.resume_ignore_no_checkpoint=True \
        +exp_manager.create_wandb_logger=True \
        +exp_manager.wandb_logger_kwargs.name=fastpitch-train \
        +exp_manager.wandb_logger_kwargs.project=multi_spk_tts_de 
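
Copying the four pitch statistics by hand is easy to get wrong. The following is a minimal sketch, not part of this PR, that reads the `PITCH_*` lines printed by the extraction step and turns them into the override strings used in the training command. The function name `pitch_overrides` is hypothetical; the mapping from statistic names to override keys is taken from the command above.

```python
import re

# Printed statistic name -> override key used in the training command.
STAT_TO_OVERRIDE = {
    "PITCH_MEAN": "pitch_mean",
    "PITCH_STD": "pitch_std",
    "PITCH_MIN": "pitch_fmin",
    "PITCH_MAX": "pitch_fmax",
}

# Matches e.g. "PITCH_MEAN=155.3916473388672" and captures name and value.
_STAT_PATTERN = re.compile(r"(PITCH_(?:MEAN|STD|MIN|MAX))=([0-9.eE+-]+)")


def pitch_overrides(log_text):
    """Return override strings such as 'pitch_mean=155.39' from the log text."""
    found = dict(_STAT_PATTERN.findall(log_text))
    missing = [name for name in STAT_TO_OVERRIDE if name not in found]
    if missing:
        raise ValueError(f"missing pitch statistics: {missing}")
    # Values are kept as the original strings so no precision is lost.
    return [f"{STAT_TO_OVERRIDE[name]}={found[name]}" for name in STAT_TO_OVERRIDE]
```

The returned strings can be appended to the argument list of the training command as they are.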

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install (e.g., Numba, Pynini, Apex)?
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

XuesongYang marked this pull request as ready for review June 21, 2022 21:46
XuesongYang changed the title from "[TTS][DE] Multi-speaker fastpitch model training recipe." to "[TTS][DE] Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German" Jun 21, 2022
XuesongYang force-pushed the xueyang-tts-fastpitch-German branch 2 times, most recently from 38f817d to f898995 June 26, 2022 06:19
* modify get_data to support multiple speaker IDs.
* created a dataset config for HUI-ACG.
* created a model config for fastpitch German.

Signed-off-by: Xuesong Yang <[email protected]>
XuesongYang merged commit fac634c into main Jun 30, 2022
XuesongYang deleted the xueyang-tts-fastpitch-German branch June 30, 2022 21:24
XuesongYang added a commit that referenced this pull request Jul 5, 2022
arendu pushed a commit that referenced this pull request Jul 21, 2022
Signed-off-by: arendu <[email protected]>
Davood-M pushed a commit to Davood-M/NeMo that referenced this pull request Aug 9, 2022
Signed-off-by: David Mosallanezhad <[email protected]>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
Signed-off-by: Hainan Xu <[email protected]>
3 participants