
One click installer + usage question #2

Open · rsxdalv opened this issue Jun 11, 2023 · 5 comments

Comments

@rsxdalv commented Jun 11, 2023

Hi, thank you for the great project you have made available!

I added it to my one-click installer package of AI-based audio generators. Link

Here's the notebook I quickly created:
https://github.com/rsxdalv/tts-generation-webui/blob/main/notebooks/vocos.ipynb

I wonder whether using this in a pipeline with SunoAI/Bark has a different impact than with something else. I couldn't manage to hook up the raw EnCodec codes, so I used the final WAV files instead.
I saw the best results using 12 kbps bandwidth, although, if I remember correctly, the Bark model runs at 6 kbps.
With my small sample size I didn't see a clear improvement, although I found one example where it adds more "quality" to a sound sample (I included it next to the notebook).

I would love to see how it would go if I could hook it up to the encodec tokens from Bark, and how best to go about using it.
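For reference, decoding Bark's EnCodec tokens directly with Vocos could look roughly like the sketch below, assuming the `charactr/vocos-encodec-24khz` checkpoint and Bark's "fine" tokens as an integer tensor of shape `[n_codebooks, T]`. The helper names are mine; only `codes_to_features` and `decode` come from the Vocos API.

```python
# Supported bandwidths of the 24 kHz EnCodec-based Vocos checkpoint, in kbps.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]

def bandwidth_id_for(kbps: float) -> int:
    """Map a target EnCodec bandwidth to the index Vocos expects."""
    return BANDWIDTHS_KBPS.index(kbps)

def decode_bark_codes(codes, kbps: float = 6.0):
    """Sketch: decode Bark's EnCodec codes to a waveform with Vocos."""
    import torch
    from vocos import Vocos

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")
    features = vocos.codes_to_features(codes)
    bandwidth_id = torch.tensor([bandwidth_id_for(kbps)])
    return vocos.decode(features, bandwidth_id=bandwidth_id)

print(bandwidth_id_for(6.0))  # Bark's native bandwidth -> 2
```

Since Bark generates at 6 kbps, `bandwidth_id` 2 is the matching setting; passing 12 kbps would tell Vocos to treat the codes as a higher-bandwidth stream than Bark actually produced.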

@hubertsiuzdak
Copy link
Collaborator

Hey, I've updated the Vocos API (#4) to make it easier to integrate with Bark. Take a look at the example notebook.

Hope it helps!

@rsxdalv (Author) commented Jun 12, 2023

Thank you! For now this is the initial UI, but it will grow from here.
rsxdalv/tts-generation-webui#35
[Screenshot: localhost_7860_ (2)]

@gitihobo commented

Hey @rsxdalv, could you add a training section that lets us train our own Vocos model at a higher sample rate?

@rsxdalv (Author) commented Jun 25, 2023

It's possible; do you have a sample of the command/dataset/config?

@gitihobo commented

For the dataset, I'm imagining multiple 10-second audio files. The config I was making for 48 kHz is:

```yaml
# pytorch_lightning==1.8.6

seed_everything: 4444

data:
  class_path: vocos.dataset.VocosDataModule
  init_args:
    train_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.train
      sampling_rate: 48000
      num_samples: 16384
      batch_size: 16
      num_workers: 8

    val_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.val
      sampling_rate: 48000
      num_samples: 48384
      batch_size: 16
      num_workers: 8

model:
  class_path: vocos.experiment.VocosExp
  init_args:
    sample_rate: 48000
    initial_learning_rate: 2e-4
    mel_loss_coeff: 45
    mrd_loss_coeff: 0.1
    num_warmup_steps: 0  # Optimizers warmup steps
    pretrain_mel_steps: 0  # 0 means GAN objective from the first iteration

    # automatic evaluation
    evaluate_utmos: true
    evaluate_pesq: true
    evaluate_periodicty: true

    feature_extractor:
      class_path: vocos.feature_extractors.MelSpectrogramFeatures
      init_args:
        sample_rate: 48000
        n_fft: 1024
        hop_length: 256
        n_mels: 100
        padding: center

    backbone:
      class_path: vocos.models.VocosBackbone
      init_args:
        input_channels: 100
        dim: 512
        intermediate_dim: 1536
        num_layers: 8

    head:
      class_path: vocos.heads.ISTFTHead
      init_args:
        dim: 512
        n_fft: 1024
        hop_length: 256
        padding: center

trainer:
  logger:
    class_path: pytorch_lightning.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/
  callbacks:
    - class_path: pytorch_lightning.callbacks.LearningRateMonitor
    - class_path: pytorch_lightning.callbacks.ModelSummary
      init_args:
        max_depth: 2
    - class_path: pytorch_lightning.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        filename: vocos_checkpoint_{epoch}{step}{val_loss:.4f}
        save_top_k: 3
        save_last: true
    - class_path: vocos.helpers.GradNormCallback

  # Lightning calculates max_steps across all optimizer steps (rather than number of batches)
  # This equals 1M steps per generator and 1M per discriminator
  max_steps: 2000000

  # You might want to limit val batches when evaluating all the metrics, as they are time-consuming
  limit_val_batches: 100
  accelerator: gpu
  strategy: ddp
  devices: [0]
  log_every_n_steps: 100
```
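The `filelist_path` entries above are plain text files with one audio file path per line. As a minimal sketch (the function name, paths, and split fraction are my own choices, not part of Vocos), building them from a folder of WAVs could look like:

```python
import pathlib
import random

def write_filelists(wav_dir: str, train_path: str, val_path: str,
                    val_fraction: float = 0.02, seed: int = 4444) -> None:
    """Split a folder of wav files into train/val filelists (one path per line)."""
    files = sorted(str(p) for p in pathlib.Path(wav_dir).rglob("*.wav"))
    random.Random(seed).shuffle(files)  # deterministic shuffle before splitting
    n_val = max(1, int(len(files) * val_fraction))
    pathlib.Path(val_path).write_text("\n".join(files[:n_val]) + "\n")
    pathlib.Path(train_path).write_text("\n".join(files[n_val:]) + "\n")
```

With the filelists in place, training is presumably launched through the repo's Lightning CLI entry point, along the lines of `python train.py -c configs/vocos-48k.yaml` (the config filename here is hypothetical).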

3 participants