Speaker Encoder
Eren Gölge edited this page Nov 14, 2019 · 1 revision
TTS has a subproject called Speaker Encoder. It is an implementation of https://arxiv.org/abs/1710.10467 (Generalized End-to-End Loss for Speaker Verification). A released model, trained on the LibriTTS dataset with ~1000 speakers, is also available on the Released Models page.
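To give a feel for what the paper optimizes, here is a minimal sketch of the softmax variant of the GE2E loss in plain Python. It is not the code used in this repo. The function name `ge2e_softmax_loss` and the helpers are made up for illustration, `w` and `b` are kept fixed at the paper's initial values (10, -5) although they are learned in practice, and averaging over utterances is my own normalization choice.

```python
import math

def _cos(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def _mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """Hypothetical helper: batch[j][i] is the embedding of utterance i of speaker j.

    Every speaker needs at least two utterances, because the utterance being
    scored is left out of its own speaker's centroid.
    """
    centroids = [_mean(utts) for utts in batch]
    total, count = 0.0, 0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            sims = []
            for k, c in enumerate(centroids):
                if k == j:
                    # Exclude the utterance itself from its own centroid.
                    c = _mean(utts[:i] + utts[i + 1:])
                sims.append(w * _cos(e, c) + b)
            # Numerically stable log-sum-exp over all speaker centroids.
            m = max(sims)
            lse = m + math.log(sum(math.exp(s - m) for s in sims))
            total += lse - sims[j]
            count += 1
    return total / count

# Toy 2-D embeddings: two speakers, two utterances each.
separated = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]]]
print(ge2e_softmax_loss(separated), ge2e_softmax_loss(mixed))
```

The loss is small when each utterance sits close to its own speaker's centroid and far from the others, and large when speakers overlap, which is why the toy `mixed` batch scores much worse than `separated`.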
You can use this model for various purposes:
- Training a multi-speaker model using voice embeddings as speaker features.
  - Compute embedding vectors with compute_embedding.py and feed them to your TTS network. (The TTS side still needs to be implemented, but it should be straightforward.)
- Pruning bad examples from your TTS dataset.
  - Compute embedding vectors and plot them using the notebook provided. Thanks to @nmstoker for this!
- Use as a speaker classification or verification system.
- Speaker diarization for ASR systems.
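Once you have embedding vectors, the verification and pruning use cases above mostly come down to cosine similarity. The following is a rough sketch, not something shipped with this repo: the function names, the toy 3-dimensional vectors, and both thresholds are placeholders I made up, and real thresholds should be tuned on held-out data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def same_speaker(emb_a, emb_b, threshold=0.75):
    """Verification: accept if the two embeddings are similar enough.

    The threshold is a hypothetical value, not one taken from the released model.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold

def find_outliers(embeddings, min_similarity=0.5):
    """Pruning: indices of utterances that sit far from their speaker's centroid."""
    c = centroid(embeddings)
    return [i for i, e in enumerate(embeddings)
            if cosine_similarity(e, c) < min_similarity]

# Toy embeddings for one speaker; the last one plays the role of a bad example.
speaker_a = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [-0.2, 1.0, 0.0],
]

print(same_speaker(speaker_a[0], speaker_a[1]))
print(find_outliers(speaker_a))
```

In practice the embeddings would come from compute_embedding.py and have many more dimensions. Plotting them, as in the notebook, is still the better way to spot bad examples by eye. A centroid check like this is only a quick first filter.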
The model provided here is half the size of the baseline model. I found that it is easier to train, and its final performance does not differ much from that of the larger version.