diff --git a/docs/source/tts/models.rst b/docs/source/tts/models.rst
index 9163418d1339..05cced9c4050 100644
--- a/docs/source/tts/models.rst
+++ b/docs/source/tts/models.rst
@@ -51,6 +51,11 @@ Tacotron 2 consists of a recurrent sequence-to-sequence feature prediction netwo
     :scale: 30%
 
 
+SSL FastPitch
+~~~~~~~~~~~~~
+This **experimental** version of FastPitch takes in content and speaker embeddings generated by an SSL Disentangler and generates mel-spectrograms, with the goal that voice characteristics are taken from the speaker embedding while the content of speech is determined by the content embedding. Voice conversion can be done using this model by swapping the speaker embedding input to that of a target speaker, while keeping the content embedding the same. More details to come.
+
+
 Vocoders
 --------
 
@@ -110,4 +115,4 @@ References
 .. bibliography:: tts_all.bib
     :style: plain
     :labelprefix: TTS-MODELS
-    :keyprefix: tts-models-
\ No newline at end of file
+    :keyprefix: tts-models-