How to train on the BIAOBEI (标贝) dataset? #4
Please allow me to answer your question in English so that everybody can understand it. My mel spectrograms range over [-4, 4], which is compatible with the hyperparameters of Rayhane Mamah's Tacotron-2. Therefore I have to set the `mel_pad_val` hyperparameter in my repo. You may need to apply a linear transform to the mel output to make the WaveGlow vocoder work well, like `(mel + 5) / 2 * 5`. Of course you might use … Besides, I have provided the G&L (Griffin-Lim) method to convert mel outputs into audio for evaluation. Why not run …
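The rescaling idea above can be sketched as a generic linear range mapping. This is a minimal illustration, not the repo's code, and the target range below is purely hypothetical; check your vocoder's expected mel scale before use:

```python
def linmap(x, src_min, src_max, dst_min, dst_max):
    """Linearly map x from [src_min, src_max] to [dst_min, dst_max]."""
    return (x - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

# e.g. map a mel value from this repo's [-4, 4] range into a hypothetical [0, 1] range
print(linmap(0.0, -4.0, 4.0, 0.0, 1.0))  # 0.5
```

Applied elementwise to a mel spectrogram array, this lets you match the value range one vocoder was trained on to the range another model emits.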
By the way, you might increase the number of data loader workers for training acceleration: 35e7f45
My fault: #5
@begeekmyfriend Thanks for your reply. Actually I trained a Tacotron-2 model with NVIDIA's code (0970653) on the BIAOBEI dataset, only changing `text-cleaners` to `basic_cleaner` and setting the batch size to 64; everything else kept the defaults in `train_tacotron2.sh`. It has now trained to 1900 steps and the loss has plateaued. Using NVIDIA's pretrained WaveGlow as the vocoder, the resulting audio quality is not very good (the spoken content is "长城是古代中国", "The Great Wall is ancient China"). I don't know what to adjust next and would like your advice. Is there anything that needs special attention when training on Chinese compared with English? Sorry to bother you again, and thanks.
@begeekmyfriend Thanks. Roughly how many epochs does your version need before the loss plateaus? (Time is running out for me, haha.) Also, which vocoder do you use? Is there a vocoder that matches your version?
@begeekmyfriend Thank you very much for your patient reply. Is 200 epochs really enough? I see other pretrained models trained for tens or hundreds of k; does that refer to the total iteration count? Also, the STFT parameters in your wavernn differ from those in the current version of tacotron2. Doesn't that matter?
Everyone's corpus size is different; a single epoch can cover thousands of samples (e.g. multi-speaker corpora), so you can calculate it yourself, in …
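The epoch-vs-iteration arithmetic above can be sketched as follows; the corpus size, batch size, and epoch count here are hypothetical placeholders, not values from this thread:

```python
import math

num_samples = 10000   # hypothetical corpus size (utterances)
batch_size = 32       # hypothetical batch size
epochs = 200          # hypothetical epoch count

# one epoch = one full pass over the corpus
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = epochs * steps_per_epoch
print(steps_per_epoch, total_steps)  # 313 62600
```

This is why "200 epochs" and "tens of k iterations" can describe the same amount of training: the conversion depends entirely on corpus size and batch size.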
@begeekmyfriend Thank you very much. May I ask a beginner's question? I often see the term GTA in speech synthesis code but don't understand what it means. Does it mean generating mel spectrograms with the trained tacotron2 and feeding them to wavernn for training? Why do it that way? Mainly I don't understand how to prepare the data for your version of wavernn. Please advise, and thanks again.
Ground truth aligned (GTA) means mel spectrograms evaluated from the training data: the final results of the vocoder are inferred from the T2-evaluated mel spectrograms fed as its inputs. In my experience, the data directory structure in wavernn looks like:

```
└── data
    └── voc_mol
        ├── gta/*.npy
        └── quant/*.npy
```

And you might type a command line like this to start training:

```
python train_wavernn.py --gta
```

As for the …
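A minimal sketch of producing one `gta`/`quant` pair with the layout above, using NumPy only. The file name, mel shape, and trimming helper are hypothetical illustrations (not this repo's preprocessing code); the one invariant it demonstrates is that the quantized wav length must equal `mel_frames * hop`:

```python
import os
import numpy as np

hop = 256  # must match the STFT hop size used by both T2 and WaveRNN

os.makedirs('data/voc_mol/gta', exist_ok=True)
os.makedirs('data/voc_mol/quant', exist_ok=True)

def save_pair(name, mel, quant):
    """Save a GTA mel (n_mels, frames) and its quantized wav, trimmed so that
    len(quant) == frames * hop -- the alignment WaveRNN training expects."""
    frames = mel.shape[1]
    quant = quant[:frames * hop]  # drop trailing samples past the last full frame
    np.save(os.path.join('data/voc_mol/gta', name + '.npy'), mel)
    np.save(os.path.join('data/voc_mol/quant', name + '.npy'), quant)

# dummy example: an 80-band, 10-frame mel and a slightly-too-long waveform
save_pair('000001',
          np.zeros((80, 10), dtype=np.float32),
          np.zeros(10 * hop + 7, dtype=np.int16))
```

In a real pipeline the mel would come from the trained Tacotron-2 run in GTA (teacher-forced) mode, and `quant` from quantizing the ground-truth wav.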
By the way, here is a small length-matching script to verify the alignment between wav samples and mel hops:

```python
import os

import numpy as np

hop = 256
mins = []
maxs = []
basedir = 'gta'
for f in os.listdir(basedir):
    gta = os.path.join(basedir, f)
    quant = os.path.join('quant', f)
    mel = np.load(gta)
    gta_len = mel.shape[1]
    wav_len = np.load(quant).shape[0]
    # each mel frame must correspond to exactly `hop` wav samples
    assert gta_len * hop == wav_len
    mins.append(mel.min())
    maxs.append(mel.max())

# overall mel value range across the dataset
print(min(mins), max(maxs))
```
@begeekmyfriend Thanks for the generous guidance. Got it: the `gta` directory holds the `.npy` mel files generated in GTA mode by the trained tacotron2, and `quant` holds the `.npy` files of the raw audio. I'll try it once my tacotron2 finishes training. Thanks again.
@begeekmyfriend Hello, with your guidance my speech synthesis has made great progress, thanks again. WaveRNN is still training; the loss dropped quickly to around 2.7 and then stalled. The results are decent, just a bit unstable.
T2 has dropout, so please verify with G&L first. Also, if …
@begeekmyfriend Thanks. My wavernn has trained to 640k steps, and I have synthesized audio with both G&L and wavernn.
No, unless you set the dropout rate to zero.
@begeekmyfriend OK. That's what everyone says online, haha. I just skimmed the source code and I think it's a code issue; using `nn.Module.Dropout` instead of `functional.dropout()` would probably avoid this problem. I'll test it tomorrow.
The two are wrappers around each other; they are different interface versions with no essential difference.
@begeekmyfriend Sorry, I misread the arguments passed to `functional.dropout()` in the code. But it is correct that once `model.eval()` has been executed, the model no longer applies dropout at inference. The reason the outputs differ on every run is that no random seed was set: the prenet draws from a Bernoulli distribution during inference.
The …
I saw that `model.eval()` was called, which is why I was curious that the outputs differed each time. After setting a random seed, they are now consistent.
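As a toy illustration of why seeding makes a Bernoulli-masked prenet deterministic, here is a NumPy sketch; `prenet_dropout` is a hypothetical stand-in for this discussion, not the repo's actual prenet:

```python
import numpy as np

def prenet_dropout(x, p=0.5, rng=None):
    """Apply a Bernoulli dropout mask (kept active even at inference,
    as Tacotron-2's prenet does) with inverted scaling by 1/(1-p)."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= p  # Bernoulli(1-p) keep-mask
    return x * mask / (1.0 - p)

x = np.ones(64)
# identical seeds give identical masks, hence identical outputs
a = prenet_dropout(x, rng=np.random.default_rng(0))
b = prenet_dropout(x, rng=np.random.default_rng(0))
assert np.array_equal(a, b)
```

Without a fixed seed (or generator), each call draws a fresh mask, which is exactly why unseeded inference produced different audio on every run.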
Your PR would be appreciated. |
@begeekmyfriend Thank you very much for your help. |
@xinzheshen Hello, I'd like to ask you a few questions.
@begeekmyfriend Hello, have you trained this model on the BIAOBEI dataset? Besides `text-cleaners`, are there any notable differences between the Chinese and English training parameters?
I trained on the BIAOBEI dataset with NVIDIA's original repo, setting `text-cleaners` to `transliteration_cleaners` and `batch_size = 32`, and keeping the other defaults (including `epochs = 1501`). After training, I fed the tacotron2 output mel spectrograms to NVIDIA's pretrained WaveGlow, but the synthesis quality was poor. I verified that feeding WaveGlow mel spectrograms computed directly from the Chinese audio works fine, so the vocoder itself is OK.
I don't know whether the epoch count is insufficient or some parameters are unsuitable, and would like to ask for your advice. Many thanks.