[TTS] Diffsinger opencpop baseline #2834
Conversation
Force-pushed 9888edc to 4d5a6b4
@@ -0,0 +1,12 @@
#!/bin/bash
This could actually just be a symlink to csmsc/tts3/local/train.sh (only ngpu differs); we can change it later.
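A sketch of the symlink approach (the recipe layout is simulated under a temp directory so it runs anywhere; the ngpu difference is handled by passing it as an argument, which is an assumption about how train.sh is invoked):

```shell
# Simulate the two recipe directories under a temp root (illustrative paths).
root=$(mktemp -d)
mkdir -p "$root/csmsc/tts3/local" "$root/opencpop/svs1/local"
printf '#!/bin/bash\nngpu=${2:-1}\necho "training with ngpu=$ngpu"\n' \
    > "$root/csmsc/tts3/local/train.sh"
# Relative symlink from the opencpop recipe to the shared csmsc script;
# ngpu still differs per recipe because it is passed at call time.
ln -s ../../../csmsc/tts3/local/train.sh "$root/opencpop/svs1/local/train.sh"
bash "$root/opencpop/svs1/local/train.sh" conf/default.yaml 4
```

A relative link target keeps the symlink valid if the repository is moved or cloned to another path.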
return outs[0], d_outs[0], p_outs[0], e_outs[0]

class FastSpeech2MIDILoss(nn.Layer):
This could inherit from FastSpeech2Loss and use the parent class's __init__ directly.
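A minimal sketch of that refactor (the parent's constructor arguments here are assumed for illustration, not PaddleSpeech's actual FastSpeech2Loss signature):

```python
class FastSpeech2Loss:
    """Stand-in for paddlespeech.t2s's FastSpeech2Loss (assumed signature)."""

    def __init__(self, use_masking: bool = True,
                 use_weighted_masking: bool = False):
        self.use_masking = use_masking
        self.use_weighted_masking = use_weighted_masking


class FastSpeech2MIDILoss(FastSpeech2Loss):
    """Inherits __init__ from FastSpeech2Loss instead of duplicating it."""

    def midi_extra_terms(self):
        # MIDI-specific loss terms would go here; the shared masking setup
        # comes from the inherited parent __init__.
        return 0.0


loss = FastSpeech2MIDILoss(use_masking=False)
```

Since the subclass defines no `__init__`, Python resolves construction to the parent automatically, so any future change to the shared arguments happens in one place.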
self.fs2 = FastSpeech2MIDI(
    idim=idim,
    odim=odim,
    fastspeech2_config=fastspeech2_params,
One place uses fastspeech2_config and another uses fastspeech2_params — should the naming be consistent here?
optimizers: Dict[str, Optimizer],
criterions: Dict[str, Layer],
dataloader: DataLoader,
fs2_train_start_steps: int=0,
This parameter seems unnecessary — is there ever a case where fs2_train_start_steps is not 0?
spk_id = paddle.cast(spk_id, 'int64')
# forward propagation
before_outs, after_outs, d_outs, p_outs, e_outs, spk_logits = self._forward(
    xs,
Suggest passing keyword arguments at the call site — with this many parameters it is easy to get them mixed up.
es = e.unsqueeze(0) if e is not None else None

# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(
Suggest using keyword arguments at the call site.
    is_inference=True)
else:
    # (1, L, odim)
    _, outs, d_outs, p_outs, e_outs, _ = self._forward(
Suggest using keyword arguments at the call site.
# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, h_masks = self._forward(
Suggest using keyword arguments at the call site.
# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, _ = self._forward(
Suggest using keyword arguments at the call site.
report("train/loss_ds", float(loss_ds))
report("train/l1_loss_ds", float(l1_loss_ds))
losses_dict["l1_loss_ds"] = float(l1_loss_ds)
If these two losses are the same, do we need to report them twice?
self.normalizer = normalizer
self.acoustic_model = model

def forward(self, text, note, note_dur, is_slur, get_mel_fs2: bool=False):
Please add type hints for these parameters.
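One way to annotate that signature (a `Tensor` stand-in is defined here so the sketch runs without paddle installed; the real code would annotate with `paddle.Tensor`):

```python
class Tensor:
    """Stand-in for paddle.Tensor so this sketch is self-contained."""


def forward(text: Tensor, note: Tensor, note_dur: Tensor,
            is_slur: Tensor, get_mel_fs2: bool = False) -> Tensor:
    """Same parameters as the PR's forward(), now fully annotated."""
    # Placeholder body: the annotations are the point of this sketch.
    return Tensor()
```

The annotations then show up in `forward.__annotations__` and in IDE tooltips, which is most of the benefit the review is asking for.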
from paddlespeech.t2s.training.trainer import Trainer
from paddlespeech.t2s.utils import str2bool

# from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Loss
This commented-out import can be deleted.
dataset (str): dataset name
Returns:
    Dict: the information of sentence, include [phone id (int)], [the frame of phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text(str), speaker name (str)
    tunple: speaker name
`tunple` is a typo — should be `tuple`.
print("========Config========")
print(config)
print(
    f"master see the word size: {dist.get_world_size()}, from pid: {os.getpid()}"
`word size` should be `world size`.
mel_fs2 = mel_fs2.unsqueeze(0).transpose((0, 2, 1))
cond_fs2 = self.fs2.encoder_infer(text, note, note_dur, is_slur)
cond_fs2 = cond_fs2.transpose((0, 2, 1))
mel, _ = self.diffusion(mel_fs2, cond_fs2)
Should this call self.diffusion.inference()? If so, it should take a num_inference_steps parameter to control the number of steps — the default of 1000 is too large.
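Why the step count matters, as a toy reverse loop (the update rule below is a placeholder, not DiffSinger's actual sampler; the point is only that runtime scales linearly with `num_inference_steps`):

```python
def inference(cond, num_inference_steps: int = 1000):
    """Toy denoising loop: cost is linear in the step count."""
    x = 0.0
    for _ in range(num_inference_steps):
        x = 0.5 * x + 0.5 * cond  # placeholder denoise update
    return x


# 50 steps is 20x cheaper than the 1000-step default and, in this toy,
# already converges to the conditioning value.
mel = inference(1.0, num_inference_steps=50)
```

Exposing the parameter lets callers trade a little quality for a large speedup at synthesis time instead of always paying for the full schedule.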
This pull request is now in conflict :(
PR types
New feature
PR changes
Add Diffsinger opencpop baseline (fft training)
Describe
Add Diffsinger opencpop baseline (fft training)
Fix #2821