I am trying to integrate this model with Tacotron 1 for TTS. However, I do not have audio that corresponds exactly to the mel spectrograms Tacotron generates, so how should I go about training the model? When I pass the speaker audio from the dataset together with its Tacotron-generated mel spectrogram, I get an error from this assertion: assert in_length == (kernel_length * hop_size). Please let me know if I am missing something. Otherwise, what is the recommended strategy for training the model to decode model-generated spectrograms?
The LJSpeech checkpoint for neural vocoding of Tacotron 2 output and the corresponding script have been provided. Please refer to https://github.com/Rongjiehuang/FastDiff/#using-tacotron. If you want to train FastDiff (Tacotron) yourself, use this config: modules/FastDiff/config/FastDiff_tacotron.yaml
I am using it in an entirely different configuration and am running into issues caused by this line: assert in_length == (kernel_length * hop_size). Where should I look? The generated spectrogram may or may not have a size equal to kernel_length * hop_size, because kernel_length is derived from the input audio, which is not directly related to the shape of the spectrogram generated during training. Please help.
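For anyone hitting the same assertion, here is a minimal sketch of one way to make the lengths agree before a training pair reaches the model. It assumes that in_length is the number of waveform samples and kernel_length is the number of conditioning (mel) frames, which is how the assertion reads but is not confirmed in this thread. The function name align_audio_to_mel and the (n_mels, n_frames) layout are hypothetical and not part of FastDiff.

```python
import numpy as np


def align_audio_to_mel(audio, mel, hop_size):
    """Trim a (waveform, mel) pair so that len(audio) == n_frames * hop_size.

    audio: 1-D array of samples.
    mel:   2-D array shaped (n_mels, n_frames); this layout is an assumption.
    """
    # Keep only as many frames as both the mel and the audio can cover.
    n_frames = min(mel.shape[1], len(audio) // hop_size)
    if n_frames == 0:
        raise ValueError("audio is shorter than one hop; nothing to align")
    mel = mel[:, :n_frames]
    audio = audio[: n_frames * hop_size]
    return audio, mel


# Example with made-up sizes: 1000 samples, 5 mel frames, hop_size 256.
audio, mel = align_audio_to_mel(np.zeros(1000), np.zeros((80, 5)), hop_size=256)
assert len(audio) == mel.shape[1] * 256
```

Note that trimming only fixes the shapes. If the mel spectrogram comes from free-running Tacotron inference, it is generally not time-aligned with the recorded audio, so the usual approach is to generate the training mels in teacher-forced mode with the same hop_size as the vocoder config.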