Indonesian TTS using Coqui TTS

Models are available in the Releases tab.

DO NOT USE FOR COMMERCIAL PURPOSES!

Model changelog

v1.2 (Aug 12, 2022)

Fine-tuned from the v1.1 model on:

  • 4 hours of Audiobook dataset
  • 2,000 samples of Azure TTS
  • High-quality TTS data for Javanese & Sundanese

v1.1 (Aug 6, 2022)

Fine-tuned from the LJSpeech model on:

  • 4 hours of Audiobook dataset
  • 2,000 samples of Azure TTS

v1.0 (Jun 23, 2022)

Trained from scratch on:

  • 4 hours of Audiobook dataset

Example

Ardi (Azure):

ardi-azure.mp4

Gadis (Azure):

gadis-azure.mp4

Wibowo (Audiobook):

wibowo-audiobook.mp4

How to use

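You need g2p-id to convert graphemes to phonemes. As the example below shows, the --text input is a phoneme (IPA) string rather than plain Indonesian text.

Use the tts command from Coqui TTS to synthesize speech: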

tts --text "saja səˈdanʔ ˈbərada di dʒaˈkarta." \
    --model_path checkpoint.pth \
    --config_path config.json \
    --speaker_idx wibowo \
    --out_path output.wav
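
To compare voices, it can help to run the same phonemized sentence through several speakers. The following is a minimal sketch that wraps the command above in two hypothetical shell functions, synthesize and synthesize_all. It uses only the flags shown in this README. The DRY_RUN switch, the output-<speaker>.wav naming, and the MODEL_PATH / CONFIG_PATH environment variables are assumptions made for illustration, as are any speaker IDs other than wibowo; check the real IDs with --list_speaker_idxs.

```shell
#!/bin/sh
# Sketch only: helper functions around the tts command shown above.
# MODEL_PATH, CONFIG_PATH and DRY_RUN are illustrative names, not part of Coqui TTS.

MODEL_PATH="${MODEL_PATH:-checkpoint.pth}"
CONFIG_PATH="${CONFIG_PATH:-config.json}"

# Synthesize one phonemized text for one speaker.
# With DRY_RUN=1 the command is printed instead of executed.
synthesize() {
    text=$1
    speaker=$2
    out_path="output-${speaker}.wav"   # assumed naming scheme

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'tts --text "%s" --model_path %s --config_path %s --speaker_idx %s --out_path %s\n' \
            "$text" "$MODEL_PATH" "$CONFIG_PATH" "$speaker" "$out_path"
    else
        tts --text "$text" \
            --model_path "$MODEL_PATH" \
            --config_path "$CONFIG_PATH" \
            --speaker_idx "$speaker" \
            --out_path "$out_path"
    fi
}

# Synthesize the same text for every speaker given after the text.
# Stops at the first failing speaker.
synthesize_all() {
    text=$1
    shift
    for speaker in "$@"; do
        synthesize "$text" "$speaker" || return 1
    done
}

# Example call (speaker IDs other than wibowo are assumptions):
# synthesize_all "saja səˈdanʔ ˈbərada di dʒaˈkarta." wibowo ardi gadis
```

Setting DRY_RUN=1 before calling synthesize_all prints the commands without running them, which is a cheap way to check paths and speaker names before a long batch.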

To list all available speaker IDs, use --list_speaker_idxs:

tts --model_path checkpoint.pth \
    --config_path config.json \
    --list_speaker_idxs

Data

Citations

@misc{https://doi.org/10.48550/arxiv.2106.06103,
  doi = {10.48550/ARXIV.2106.06103}, 
  url = {https://arxiv.org/abs/2106.06103},
  author = {Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
  keywords = {Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
  title = {Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
@inproceedings{kjartansson-etal-tts-sltu2018,
    title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
    author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
    booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
    year  = {2018},
    address = {Gurugram, India},
    month = aug,
    pages = {66--70},
    URL   = {http://dx.doi.org/10.21437/SLTU.2018-14}
}