@0nutation In the paper, you mention that you trained a multi-speaker vocoder. Could you please share the checkpoint?
Unit Vocoder. Due to the limitation of the single-speaker unit vocoder in (Polyak et al., 2021), we train a
multi-speaker unit HiFi-GAN to decode the speech signal from the discrete representation. The
HiFi-GAN architecture consists of a generator G and multiple discriminators D. The generator uses
look-up tables (LUT) to embed discrete representations and the embedding sequences are up-sampled
by a series of blocks composed of transposed convolution and a residual block with dilated layers.
The speaker embedding is concatenated to each frame in the up-sampled sequence. The discriminator
features a Multi-Period Discriminator (MPD) and a Multi-Scale Discriminator (MSD), which have
the same architecture as (Polyak et al., 2021).
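
To make the quoted description concrete, here is a minimal, dependency-free sketch of the generator's front end as I read it: a look-up table embeds the units, a transposed convolution up-samples the sequence, and the speaker embedding is concatenated to every up-sampled frame. This is not the authors' code. All function names, the dimensions, the kernel, and the stride are hypothetical, the 1000-entry table is only a guess based on the "km1000" in the checkpoint name, and the residual blocks and discriminators are left out.

```python
import random

def embed_units(units, lut):
    """Look up one embedding vector per discrete unit (the LUT step)."""
    return [list(lut[u]) for u in units]

def transposed_conv1d(frames, kernel, stride):
    """Depthwise 1-D transposed convolution: every input frame scatters a
    scaled copy of `kernel` into the output, `stride` steps apart."""
    n_in, dim = len(frames), len(frames[0])
    n_out = (n_in - 1) * stride + len(kernel)
    out = [[0.0] * dim for _ in range(n_out)]
    for t, frame in enumerate(frames):
        for k, w in enumerate(kernel):
            for c in range(dim):
                out[t * stride + k][c] += w * frame[c]
    return out

def concat_speaker(frames, speaker_emb):
    """Append the same speaker embedding to each up-sampled frame."""
    return [frame + list(speaker_emb) for frame in frames]

random.seed(0)
# Assumed sizes: 1000 units (guessed from "km1000"); tiny dims for readability.
NUM_UNITS, EMB_DIM = 1000, 4
lut = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(NUM_UNITS)]
speaker_emb = [0.5, -0.5]  # hypothetical 2-dim speaker vector

units = [12, 12, 907, 3]   # hypothetical unit sequence
x = embed_units(units, lut)
x = transposed_conv1d(x, kernel=[1.0, 1.0], stride=2)  # 2x up-sampling
x = concat_speaker(x, speaker_emb)
print(len(x), len(x[0]))   # number of frames, channels per frame
```

If this reading is right, a single-speaker checkpoint has no speaker vector to append, which is why it cannot simply be reused for multiple speakers.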
Hello,
The provided vocoder checkpoint, which uses mHubert units, does not support multiple speakers. Do you have a multi-speaker checkpoint?
mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj