
One click installer + usage question #2

Open · rsxdalv opened this issue Jun 11, 2023 · 5 comments

Comments

@rsxdalv commented Jun 11, 2023

Hi, thank you for the great project you have made available!

I added it to my one-click installer package of AI-based audio generators. Link

Here's the notebook I quickly created:
https://github.com/rsxdalv/tts-generation-webui/blob/main/notebooks/vocos.ipynb

I wonder whether using this in a pipeline with SunoAI/Bark has a different impact than with something else. I couldn't manage to hook up the raw EnCodec codes, so I used the final WAV files instead.
I saw the best results using 12 kbps bandwidth, although, if I remember correctly, the Bark model runs at 6 kbps.
With my small sample size I didn't see a clear improvement, although I found one example where it adds more "quality" to a sound sample (I included it next to the notebook).

I would love to see how it would go if I could hook it up to the encodec tokens from Bark, and how best to go about using it.
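For reference, decoding Bark's EnCodec tokens directly with Vocos could look roughly like the sketch below, assuming the `charactr/vocos-encodec-24khz` checkpoint and Bark's "fine" tokens as an integer tensor of shape `[n_codebooks, T]`. The helper names are mine; only `codes_to_features` and `decode` come from the Vocos API.

```python
# Supported bandwidths of the 24 kHz EnCodec-based Vocos checkpoint, in kbps.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]

def bandwidth_id_for(kbps: float) -> int:
    """Map a target EnCodec bandwidth to the index Vocos expects."""
    return BANDWIDTHS_KBPS.index(kbps)

def decode_bark_codes(codes, kbps: float = 6.0):
    """Sketch: decode Bark's EnCodec codes to a waveform with Vocos."""
    import torch
    from vocos import Vocos

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")
    features = vocos.codes_to_features(codes)
    bandwidth_id = torch.tensor([bandwidth_id_for(kbps)])
    return vocos.decode(features, bandwidth_id=bandwidth_id)

print(bandwidth_id_for(6.0))  # Bark's native bandwidth -> 2
```

Since Bark generates at 6 kbps, `bandwidth_id` 2 is the matching setting; passing 12 kbps would tell Vocos to treat the codes as a higher-bandwidth stream than Bark actually produced.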

@hubertsiuzdak
Copy link
Collaborator

Hey, I've updated the Vocos API (#4) to make it easier to integrate with Bark. Take a look at the example notebook.

Hope it helps!

@rsxdalv (Author) commented Jun 12, 2023

Thank you! For now this is the initial UI, but it will grow from here.
rsxdalv/tts-generation-webui#35
[Screenshot: localhost_7860_ (2)]

@gitihobo commented

Hey @rsxdalv, could you add a training section that lets us train our own Vocos model at a higher sample rate?

@rsxdalv (Author) commented Jun 25, 2023

It's possible; do you have a sample of the command/dataset/config?

@gitihobo commented

For the dataset, I'm imagining multiple 10-second audio files. The config I was making for 48 kHz is:

```yaml
# pytorch_lightning==1.8.6

seed_everything: 4444

data:
  class_path: vocos.dataset.VocosDataModule
  init_args:
    train_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.train
      sampling_rate: 48000
      num_samples: 16384
      batch_size: 16
      num_workers: 8

    val_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.val
      sampling_rate: 48000
      num_samples: 48384
      batch_size: 16
      num_workers: 8

model:
  class_path: vocos.experiment.VocosExp
  init_args:
    sample_rate: 48000
    initial_learning_rate: 2e-4
    mel_loss_coeff: 45
    mrd_loss_coeff: 0.1
    num_warmup_steps: 0  # Optimizers warmup steps
    pretrain_mel_steps: 0  # 0 means GAN objective from the first iteration

    # automatic evaluation
    evaluate_utmos: true
    evaluate_pesq: true
    evaluate_periodicty: true

    feature_extractor:
      class_path: vocos.feature_extractors.MelSpectrogramFeatures
      init_args:
        sample_rate: 48000
        n_fft: 1024
        hop_length: 256
        n_mels: 100
        padding: center

    backbone:
      class_path: vocos.models.VocosBackbone
      init_args:
        input_channels: 100
        dim: 512
        intermediate_dim: 1536
        num_layers: 8

    head:
      class_path: vocos.heads.ISTFTHead
      init_args:
        dim: 512
        n_fft: 1024
        hop_length: 256
        padding: center

trainer:
  logger:
    class_path: pytorch_lightning.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/
  callbacks:
    - class_path: pytorch_lightning.callbacks.LearningRateMonitor
    - class_path: pytorch_lightning.callbacks.ModelSummary
      init_args:
        max_depth: 2
    - class_path: pytorch_lightning.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        filename: vocos_checkpoint_{epoch}{step}{val_loss:.4f}
        save_top_k: 3
        save_last: true
    - class_path: vocos.helpers.GradNormCallback

  # Lightning calculates max_steps across all optimizer steps (rather than number of batches)
  # This equals 1M steps per generator and 1M per discriminator
  max_steps: 2000000

  # You might want to limit val batches when evaluating all the metrics, as they are time-consuming
  limit_val_batches: 100
  accelerator: gpu
  strategy: ddp
  devices: [0]
  log_every_n_steps: 100
```
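The `filelist_path` entries above are plain text files with one audio file path per line. As a minimal sketch (the function name, paths, and split fraction are my own choices, not part of Vocos), building them from a folder of WAVs could look like:

```python
import pathlib
import random

def write_filelists(wav_dir: str, train_path: str, val_path: str,
                    val_fraction: float = 0.02, seed: int = 4444) -> None:
    """Split a folder of wav files into train/val filelists (one path per line)."""
    files = sorted(str(p) for p in pathlib.Path(wav_dir).rglob("*.wav"))
    random.Random(seed).shuffle(files)  # deterministic shuffle before splitting
    n_val = max(1, int(len(files) * val_fraction))
    pathlib.Path(val_path).write_text("\n".join(files[:n_val]) + "\n")
    pathlib.Path(train_path).write_text("\n".join(files[n_val:]) + "\n")
```

With the filelists in place, training is presumably launched through the repo's Lightning CLI entry point, along the lines of `python train.py -c configs/vocos-48k.yaml` (the config filename here is hypothetical).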

3 participants