MultiSpeaker Tacotron2 in Persian Language

This repository implements Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) for the Persian language. The core codebase is derived from this repository and has been updated to address deprecated features and to complete the setup for Persian. The original codebase, sourced from this repository, was modified to support Persian-language requirements.

Quickstart

1. Character-set definition:

Open synthesizer/persian_utils/symbols.py and update the _characters variable so that it includes every character that appears in your text files. Most Persian characters and symbols are already included in this variable:

_characters = "ءابتثجحخدذرزسشصضطظعغفقلمنهويِپچژکگیآۀأؤإئًَُّ!(),-.:;?  ̠،…؛؟‌٪#ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_–@+/\u200c"
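One way to find out which characters still need to be added is to scan the transcripts and compare them against the character set. The sketch below is a hypothetical helper, not part of the repository: the names `find_missing_characters` and `read_transcripts` are illustrative, and it assumes that transcripts are UTF-8 `.txt` files and that line breaks do not need to be in the symbol set.

```python
from collections import Counter
from pathlib import Path


def find_missing_characters(texts, characters):
    """Count characters that occur in `texts` but not in `characters`."""
    known = set(characters)
    missing = Counter()
    for text in texts:
        for ch in text:
            # Assumption: line breaks only separate lines and are not symbols.
            if ch not in known and ch not in "\r\n":
                missing[ch] += 1
    return missing


def read_transcripts(dataset_dir):
    """Yield the contents of every .txt file under `dataset_dir` (assumed UTF-8)."""
    for path in sorted(Path(dataset_dir).rglob("*.txt")):
        yield path.read_text(encoding="utf-8")


def format_report(missing):
    """Render one line per missing character, most frequent first."""
    return [f"U+{ord(ch):04X} {ch!r}: {count}" for ch, count in missing.most_common()]
```

Passing `read_transcripts("dataset/persian_data")` together with the current value of `_characters` to `find_missing_characters` gives the characters to append; printing the code points with `format_report` helps with invisible characters such as the zero-width non-joiner.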

2. Data structure:

dataset/persian_data/
    train_data/
        speaker1/book-1/
            sample1.txt
            sample1.wav
            ...
        ...
    test_data/
        ...
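Before preprocessing, it can be worth checking that the layout above is complete. The following is a minimal sketch, assuming (as the tree suggests) that every utterance is a `.wav` file with a same-named `.txt` transcript next to it; `find_unpaired_samples` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def find_unpaired_samples(subfolder):
    """Return (wavs_without_txt, txts_without_wav) as sorted lists of path stems."""
    root = Path(subfolder)
    # Strip the extension so sample1.wav and sample1.txt map to the same key.
    wavs = {p.with_suffix("") for p in root.rglob("*.wav")}
    txts = {p.with_suffix("") for p in root.rglob("*.txt")}
    return sorted(wavs - txts), sorted(txts - wavs)
```

Both returned lists should be empty for a folder such as `dataset/persian_data/train_data`; any entry points to an utterance with a missing audio or text file.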

3. Preprocessing:

python synthesizer_preprocess_audio.py dataset --datasets_name persian_data --subfolders train_data --no_alignments
python synthesizer_preprocess_embeds.py dataset/SV2TTS/synthesizer

4. Train synthesizer:

python synthesizer_train.py my_run dataset/SV2TTS/synthesizer
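Steps 3 and 4 can also be chained from a single script so that a failure in one step stops the run. This is only a sketch: `build_pipeline` and `run_pipeline` are hypothetical names, the flags are exactly those shown in the commands above, and deriving the `SV2TTS/synthesizer` path from a different dataset root is an assumption based on the paths in this README.

```python
import subprocess
import sys


def build_pipeline(dataset_root="dataset", datasets_name="persian_data",
                   subfolders="train_data", run_id="my_run"):
    """Return the commands from steps 3 and 4 as argument lists."""
    # Assumption: preprocessing output lives under <dataset_root>/SV2TTS/synthesizer.
    synthesizer_dir = f"{dataset_root}/SV2TTS/synthesizer"
    return [
        [sys.executable, "synthesizer_preprocess_audio.py", dataset_root,
         "--datasets_name", datasets_name, "--subfolders", subfolders,
         "--no_alignments"],
        [sys.executable, "synthesizer_preprocess_embeds.py", synthesizer_dir],
        [sys.executable, "synthesizer_train.py", run_id, synthesizer_dir],
    ]


def run_pipeline(commands):
    """Run each command in order; stop at the first one that fails."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline())` from the repository root runs the same three commands as above with the current Python interpreter.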

5. Inference:

To synthesize a wav file, place all final models in the saved_models/final_models directory. If you have not trained your own speaker encoder and vocoder models, you can use the pretrained models in saved_models/default.

Inference using WaveRNN as the vocoder (passed to --vocoder as "WavRNN"):

python inference.py --vocoder "WavRNN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"

WaveRNN is an older vocoder. To use HiFiGAN instead, you must first download a pretrained model; the one used here was trained on English speech.

First, install the parallel_wavegan package (see the package's own documentation for more information).

pip install parallel_wavegan

Then download the pretrained HiFiGAN model into your saved models directory:

from parallel_wavegan.utils import download_pretrained_model

# Download the "vctk_hifigan.v1" checkpoint into the vocoder directory used at inference time
download_pretrained_model("vctk_hifigan.v1", "saved_models/final_models/vocoder_HiFiGAN")

You can now select HiFiGAN as the vocoder in the inference command:

python inference.py --vocoder "HiFiGAN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"
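To synthesize several sentences with the same reference voice, the inference command can be assembled programmatically. The sketch below is illustrative only: `build_inference_command` and `build_batch` are hypothetical helpers, the `test1`, `test2`, ... naming is an assumption, and the only flags and vocoder values used are the ones shown in this README. Building the command as an argument list avoids shell-quoting problems with Persian text.

```python
import sys

# The two --vocoder values that appear in this README.
SUPPORTED_VOCODERS = ("WavRNN", "HiFiGAN")


def build_inference_command(text, ref_wav_path, test_name, vocoder="HiFiGAN"):
    """Return the inference.py call as an argument list for subprocess.run."""
    if vocoder not in SUPPORTED_VOCODERS:
        raise ValueError(f"vocoder must be one of {SUPPORTED_VOCODERS}, got {vocoder!r}")
    return [
        sys.executable, "inference.py",
        "--vocoder", vocoder,
        "--text", text,
        "--ref_wav_path", ref_wav_path,
        "--test_name", test_name,
    ]


def build_batch(sentences, ref_wav_path, vocoder="HiFiGAN"):
    """Build one command per sentence, named test1, test2, ... (assumed scheme)."""
    return [
        build_inference_command(sentence, ref_wav_path, f"test{i}", vocoder)
        for i, sentence in enumerate(sentences, start=1)
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.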

Output Samples

Output samples synthesized by the trained model from this study (link to be updated) are available in this directory. For comparison, the directory also contains the same utterances generated by two baseline models, the natural recordings, and utterances whose waveforms were generated from gold spectrograms by the vocoder used in the study.

