[TTS]Diffsinger opencpop baseline #2834

Closed
wants to merge 38 commits into from

@lym0302 lym0302 commented Jan 16, 2023

PR types
new feature

PR changes
add Diffsinger opencpop baseline (fft training)

Describe
add Diffsinger opencpop baseline (fft training)

fix #2821

lym0302 and others added 30 commits August 26, 2022 06:58
@lym0302 lym0302 requested a review from yt605155624 January 16, 2023 02:43
@lym0302 lym0302 marked this pull request as draft January 16, 2023 02:44
@yt605155624 yt605155624 added this to the r1.4.0 milestone Jan 19, 2023
@yt605155624 yt605155624 changed the title Diffsinger opencpop baseline [TTS]Diffsinger opencpop baseline Jan 19, 2023
@lym0302 lym0302 force-pushed the diffsinger branch 2 times, most recently from 9888edc to 4d5a6b4 Compare February 1, 2023 08:30
@yt605155624 yt605155624 marked this pull request as ready for review February 1, 2023 09:25
examples/opencpop/svs1/run.sh (outdated, resolved)
paddlespeech/t2s/datasets/get_feats.py (resolved)
paddlespeech/t2s/datasets/get_feats.py (resolved)
paddlespeech/t2s/datasets/preprocess_utils.py (outdated, resolved)
paddlespeech/t2s/datasets/preprocess_utils.py (outdated, resolved)
paddlespeech/t2s/exps/diffsinger/preprocess.py (outdated, resolved)
paddlespeech/t2s/exps/syn_utils.py (outdated, resolved)
paddlespeech/t2s/models/diffsinger/diffsinger.py (outdated, resolved)
paddlespeech/t2s/models/diffsinger/diffsinger.py (outdated, resolved)
paddlespeech/t2s/modules/transformer/encoder.py (outdated, resolved)
@@ -0,0 +1,12 @@
#!/bin/bash

This could simply be a symlink to csmsc/tts3/local/train.sh (only ngpu differs); it can probably be changed later.

return outs[0], d_outs[0], p_outs[0], e_outs[0]


class FastSpeech2MIDILoss(nn.Layer):

This could inherit from FastSpeech2Loss and use the parent class's __init__ directly.
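A minimal sketch of the inheritance suggestion, using plain-Python stand-ins rather than the real Paddle classes. `BaseLoss` stands in for `FastSpeech2Loss` and `MIDILoss` for `FastSpeech2MIDILoss`; the constructor arguments and the `extra_penalty` term are assumptions made for illustration, not taken from this diff. The only point is that a subclass that defines no `__init__` reuses the parent's.

```python
class BaseLoss:
    """Hypothetical stand-in for FastSpeech2Loss."""

    def __init__(self, use_masking: bool = True,
                 use_weighted_masking: bool = False):
        self.use_masking = use_masking
        self.use_weighted_masking = use_weighted_masking

    def forward(self, prediction: float, target: float) -> float:
        # Toy L1 loss on scalars, just to have something to override.
        return abs(prediction - target)


class MIDILoss(BaseLoss):
    """Hypothetical stand-in for FastSpeech2MIDILoss.

    No __init__ is defined here, so BaseLoss.__init__ is used as-is;
    only forward() is overridden.
    """

    def forward(self, prediction: float, target: float,
                extra_penalty: float = 0.0) -> float:
        return super().forward(prediction, target) + extra_penalty


# The parent's constructor arguments are accepted unchanged.
loss_fn = MIDILoss(use_masking=False)
```

If the subclass later needs extra state, it can add its own `__init__` that calls `super().__init__(...)` first.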

self.fs2 = FastSpeech2MIDI(
idim=idim,
odim=odim,
fastspeech2_config=fastspeech2_params,

One name is fastspeech2_config and the other is fastspeech2_params; should these be made consistent?

optimizers: Dict[str, Optimizer],
criterions: Dict[str, Layer],
dataloader: DataLoader,
fs2_train_start_steps: int=0,

This parameter seems unnecessary. Is there any case where fs2_train_start_steps is not 0?

spk_id = paddle.cast(spk_id, 'int64')
# forward propagation
before_outs, after_outs, d_outs, p_outs, e_outs, spk_logits = self._forward(
xs,

Suggest passing the arguments by keyword in this call; there are so many parameters that it is easy to mix them up.
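A small sketch of why keyword arguments help here. The `_forward` below is a hypothetical stand-in with a shortened, assumed signature (the real method takes more arguments); it only echoes its inputs so the effect of argument order is visible.

```python
def _forward(xs, ilens, ds=None, ps=None, es=None, is_inference=False):
    """Hypothetical stand-in; echoes its arguments back as a dict."""
    return {"xs": xs, "ilens": ilens, "ds": ds, "ps": ps, "es": es,
            "is_inference": is_inference}


# Positional call: energy and pitch are passed in the wrong order,
# and nothing complains.
positional = _forward("text", 4, "dur", "energy", "pitch")

# Keyword call: the same values in the same order, but each one
# lands in the parameter it was meant for.
keyword = _forward(xs="text", ilens=4, ds="dur", es="energy", ps="pitch")
```

In the positional call, `ps` silently receives the energy value; the keyword call is immune to that kind of reordering.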

es = e.unsqueeze(0) if e is not None else None

# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(

Suggest passing the arguments by keyword in this call.

is_inference=True)
else:
# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(

Suggest passing the arguments by keyword in this call.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, h_masks = self._forward(

Suggest passing the arguments by keyword in this call.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, _ = self._forward(

Suggest passing the arguments by keyword in this call.


report("train/loss_ds", float(loss_ds))
report("train/l1_loss_ds", float(l1_loss_ds))
losses_dict["l1_loss_ds"] = float(l1_loss_ds)

If these two losses are the same, do they need to be reported twice?

self.normalizer = normalizer
self.acoustic_model = model

def forward(self, text, note, note_dur, is_slur, get_mel_fs2: bool=False):

Please add type hints to this signature.

from paddlespeech.t2s.training.trainer import Trainer
from paddlespeech.t2s.utils import str2bool

# from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Loss

This can be deleted.

dataset (str): dataset name
Returns:
Dict: the information of sentence, include [phone id (int)], [the frame of phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text(str), speaker name (str)
tunple: speaker name

tunple is a spelling error (should be tuple).

print("========Config========")
print(config)
print(
f"master see the word size: {dist.get_world_size()}, from pid: {os.getpid()}"

Typo: "word size" should be "world size".

@yt605155624 yt605155624 mentioned this pull request Feb 9, 2023
mel_fs2 = mel_fs2.unsqueeze(0).transpose((0, 2, 1))
cond_fs2 = self.fs2.encoder_infer(text, note, note_dur, is_slur)
cond_fs2 = cond_fs2.transpose((0, 2, 1))
mel, _ = self.diffusion(mel_fs2, cond_fs2)

@yt605155624 yt605155624 Feb 9, 2023

Should this call self.diffusion.inference() instead? If so, a num_inference_steps parameter should be added to control the number of steps; the default of 1000 is too large.
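One common way to honor a `num_inference_steps` argument is to denoise on an evenly strided subset of the training timesteps. The following is a minimal sketch under that assumption, taking the 1000 training steps mentioned above as the default. `select_timesteps` is a hypothetical helper, not part of `self.diffusion` or its `inference()` method, and whether strided sampling is appropriate depends on the scheduler actually used.

```python
from typing import List


def select_timesteps(num_train_timesteps: int = 1000,
                     num_inference_steps: int = 100) -> List[int]:
    """Hypothetical helper: pick an evenly strided subset of timesteps."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError(
            "num_inference_steps must be in [1, num_train_timesteps]")
    stride = num_train_timesteps // num_inference_steps
    # Denoising runs from the noisiest selected step down to step 0.
    return [i * stride for i in range(num_inference_steps)][::-1]


# 100 denoising iterations instead of 1000.
timesteps = select_timesteps(num_train_timesteps=1000,
                             num_inference_steps=100)
```

Exposing the step count as a parameter lets callers trade synthesis speed against quality without retraining.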

@mergify
mergify bot commented Feb 16, 2023

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Feb 16, 2023
@mergify mergify bot removed the conflicts label Mar 13, 2023
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[TTS] DiffSinger
2 participants