Finetune Only? #2

Open
colstone opened this issue Jun 27, 2024 · 3 comments

Comments

@colstone

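Hi, I ran into a problem while trying to train FreeV. When I try to train a model with a 44.1 kHz sampling rate without using a pretrained model, the code prints the structure of g and then does not start training. Does the code currently support fine-tuning only? If training from scratch is possible, which parts should I modify so that it trains rather than fine-tunes?
The config file is as follows: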

{
    "input_training_wav_list": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/train",
    "input_validation_wav_list": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/val",
    "test_input_wavs_dir": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/test",
    "test_input_mels_dir":"./",
    "test_mel_load": 0,
    "test_output_dir": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/test_out",

    "batch_size": 16,
    "learning_rate": 0.0002,
    "adam_b1": 0.8,
    "adam_b2": 0.99,
    "lr_decay": 0.999,
    "seed": 114514,
    "training_epochs": -1,
    "stdout_interval": 20,
    "checkpoint_interval": 1000,
    "summary_interval": 100,
    "validation_interval": 1000,
    "checkpoint_path": "./ckpt/20240627-freev-44100",
    "checkpoint_file_load": "",

    "ASP_channel": 513,
    "ASP_resblock_kernel_sizes": [3,7,11],
    "ASP_resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "ASP_input_conv_kernel_size": 7,
    "ASP_output_conv_kernel_size": 7,

    "PSP_channel": 512,
    "PSP_resblock_kernel_sizes": [3,7,11],
    "PSP_resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]], 
    "PSP_input_conv_kernel_size": 7,
    "PSP_output_R_conv_kernel_size": 7,
    "PSP_output_I_conv_kernel_size": 7,

    "segment_size": 16384,
    "num_mels": 128,
    "n_fft": 2048,
    "hop_size": 512,
    "win_size": 2048,

    "sampling_rate": 44100,

    "fmin": 40,
    "fmax": 16000,
    "meloss": null,
    "num_workers": 4
}

The JSON file surely has some mistakes in it, so please bear with me.
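
As a rough way to look for mistakes in the audio-related fields, the sketch below runs a few generic STFT and mel sanity checks on the values posted above. The helper `check_audio_config` is hypothetical and not part of FreeV, and the constraints are common conventions for this kind of vocoder rather than requirements confirmed by the repository.

```python
import json

# Audio-related fields copied from the config above; nothing else is needed here.
cfg = json.loads("""
{
    "segment_size": 16384,
    "num_mels": 128,
    "n_fft": 2048,
    "hop_size": 512,
    "win_size": 2048,
    "sampling_rate": 44100,
    "fmin": 40,
    "fmax": 16000
}
""")


def check_audio_config(cfg):
    """Hypothetical helper: list violations of generic STFT/mel constraints."""
    problems = []
    if cfg["win_size"] > cfg["n_fft"]:
        problems.append("win_size is larger than n_fft")
    if cfg["segment_size"] % cfg["hop_size"] != 0:
        problems.append("segment_size is not a multiple of hop_size")
    if cfg["fmax"] > cfg["sampling_rate"] / 2:
        problems.append("fmax is above the Nyquist frequency")
    if not 0 <= cfg["fmin"] < cfg["fmax"]:
        problems.append("fmin must satisfy 0 <= fmin < fmax")
    if cfg["num_mels"] > cfg["n_fft"] // 2 + 1:
        problems.append("num_mels exceeds the number of STFT bins")
    return problems


problems = check_audio_config(cfg)
frames_per_segment = cfg["segment_size"] // cfg["hop_size"]
nyquist_hz = cfg["sampling_rate"] / 2
print(problems, frames_per_segment, nyquist_hz)
```

For the values posted, these particular checks pass: 16384 / 512 gives 32 frames per segment, and an fmax of 16000 Hz sits below the 22050 Hz Nyquist limit.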

@BakerBunker
Owner

It is probably the training_epochs setting: if it is -1, the loop ends immediately.
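
A minimal sketch of why this happens, assuming the epoch loop is built on `range()` in the style common to HiFi-GAN-derived training scripts (the function `run_training` and its `last_epoch` parameter are hypothetical stand-ins, not code taken from the repository):

```python
def run_training(training_epochs, last_epoch=-1):
    """Hypothetical stand-in for the epoch loop; returns the epochs it would visit."""
    visited = []
    # With training_epochs = -1 this is range(0, -1), which is empty,
    # so the loop body never runs and the script exits right away.
    for epoch in range(max(0, last_epoch), training_epochs):
        visited.append(epoch)  # a real loop would train for one epoch here
    return visited


epochs_with_minus_one = run_training(-1)
epochs_with_three = run_training(3)
print(epochs_with_minus_one)  # []
print(epochs_with_three)      # [0, 1, 2]
```

Under that assumption, setting training_epochs to a positive number in the config should be enough to let training from scratch proceed.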

@BakerBunker
Owner

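If the training produces results, could you share them? 😂 I'm also curious whether this method brings a bigger improvement on singing voice than on speech. My personal feeling is that, if there is no need to modify f0, the pseudo-inverse magnitude spectrum is a stronger condition than f0.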

@colstone
Author

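> If the training produces results, could you share them? 😂 I'm also curious whether this method brings a bigger improvement on singing voice than on speech. My personal feeling is that, if there is no need to modify f0, the pseudo-inverse magnitude spectrum is a stronger condition than f0.

Sure. That said, applying it to current singing voice synthesis would indeed still require f0_emb. Once training is done I'll release the weights ))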
