Montreal Forced Aligner (MFA) Version Inquiry #39

Open
zeynabyousefi opened this issue Nov 2, 2024 · 3 comments

Comments

@zeynabyousefi

Hello, I would like to know the exact version of the Montreal Forced Aligner (MFA) used in this project. I need to confirm the version to ensure compatibility with other project components.

@ytyeung
@wenyong-h
@ivanvovk
@huawei-noah-admin

@li1jkdaw

li1jkdaw commented Nov 4, 2024

Hi, @zeynabyousefi! We used MFA v1.0. As for the English acoustic model, its meta.yaml file states that it is version 0.9.0, with architecture gmm+hmm and features mfcc+deltas.
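
In case it helps, here is a minimal sketch of checking those values yourself by reading the acoustic model's meta.yaml with PyYAML. The path and the exact field names are assumptions inferred from the values quoted above, so adjust them to however the model is unpacked on your machine.

```python
import yaml

# Hypothetical path to the unpacked English acoustic model directory;
# point this at wherever your MFA model archive was extracted.
MODEL_META = "english_acoustic_model/meta.yaml"

with open(MODEL_META) as f:
    meta = yaml.safe_load(f)

# Field names are guessed from the values mentioned in the comment above
# and may differ slightly between MFA model releases.
print(meta.get("version"))       # expected: 0.9.0
print(meta.get("architecture"))  # expected: gmm+hmm
print(meta.get("features"))      # expected: mfcc+deltas
```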

@zeynabyousefi

Thanks.
I am training the Diff-VC Encoder model on the LJSpeech dataset. Currently, I am facing some issues with data preprocessing and setting up input parameters, and I would appreciate any guidance on the appropriate configuration for both.

Additionally, I've encountered errors while running the get_avg_mels.ipynb notebook, which seem to be due to mismatches in sample rates, audio features (such as MFCC or mel-spectrogram), or other processing parameters.

If specific settings are required for data preprocessing and input parameters, please provide detailed instructions.

Thank you in advance for your assistance!

@ytyeung
@wenyong-h
@ivanvovk
@huawei-noah-admin

@li1jkdaw

@zeynabyousefi
I can think of two reasons why you might be having problems with data preprocessing. The first is related to the mel feature parameters, and the second to the TextGrid files.

  1. The Diff-VC model operates on mel-spectrograms whose parameters are consistent with the universal HiFi-GAN vocoder (https://github.com/jik876/hifi-gan). You can find the exact parameters in inference.ipynb in the function get_mel. If you want to use the same universal HiFi-GAN vocoder as we do, please extract mel features from your audio with that get_mel function. In particular, the sample rate is 22050 Hz and the hop size is 256, and these values are hard-coded in the notebook get_avg_mels.ipynb, so this is probably why you get errors while running it (see the first sketch after this list).
  2. MFA must be run separately, and the features it uses to extract the alignment are different from the ones described above. But if I remember correctly, you do not have to extract those features manually to run alignment with MFA; you only have to prepare your audio files in the correct format and put them into folders with the specific structure (e.g. spk1/book1/wav1, wav2, ...). Please refer to the MFA documentation for more details. You may need to resample the audio to 16 kHz, but I'm not sure, so check that with the MFA documentation as well (see the second sketch after this list). After you perform the alignment, check the resulting .TextGrid files manually: make sure the phonemes and timestamps in those files are consistent with the corresponding audio files.
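
For the first point, here is a rough sketch of what HiFi-GAN-compatible mel extraction typically looks like. Only the 22050 Hz sample rate and the 256-sample hop are confirmed above; the FFT size, window length, mel-band count and frequency range below are common HiFi-GAN defaults that I am assuming, so please copy the exact values from get_mel in inference.ipynb rather than trusting this snippet.

```python
import librosa
import numpy as np

SAMPLE_RATE = 22050   # confirmed above
HOP_SIZE = 256        # confirmed above
N_FFT = 1024          # assumed common HiFi-GAN default
WIN_SIZE = 1024       # assumed
N_MELS = 80           # assumed
FMIN, FMAX = 0, 8000  # assumed

def extract_mel(path):
    # Load and resample the audio to the vocoder's sample rate.
    wav, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SAMPLE_RATE, n_fft=N_FFT, hop_length=HOP_SIZE,
        win_length=WIN_SIZE, n_mels=N_MELS, fmin=FMIN, fmax=FMAX, power=1.0)
    # Log-compress with a small floor, as HiFi-GAN-style pipelines usually do.
    return np.log(np.clip(mel, a_min=1e-5, a_max=None))

mel = extract_mel("LJ001-0001.wav")  # shape: (N_MELS, n_frames)
```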
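
For the second point, here is a rough sketch of laying out an LJSpeech-style corpus for MFA under the spk/book/utterance folder structure mentioned above. The paths, the 16 kHz resampling, and the .lab transcript convention are assumptions on my part, so please verify them against the MFA documentation for the version you use.

```python
import os

import librosa
import soundfile as sf

SRC_WAVS = "LJSpeech-1.1/wavs"    # hypothetical location of the original wavs
DST_DIR = "mfa_corpus/LJ/book1"   # spk/book/utterance layout mentioned above
MFA_SR = 16000                    # assumed target rate; please check the MFA docs

os.makedirs(DST_DIR, exist_ok=True)
for name in sorted(os.listdir(SRC_WAVS)):
    if not name.endswith(".wav"):
        continue
    # Resample each utterance and write it into the speaker/book folder.
    wav, _ = librosa.load(os.path.join(SRC_WAVS, name), sr=MFA_SR)
    sf.write(os.path.join(DST_DIR, name), wav, MFA_SR)
    # MFA also expects a transcript next to each wav (e.g. LJ001-0001.lab with
    # the utterance text from LJSpeech's metadata.csv); the .lab convention is
    # an assumption here, so confirm the expected format in the MFA docs.
```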

I would also like to mention that training the Diff-VC Average Voice Encoder on LJSpeech alone is not a good idea unless you want to perform one-to-any voice conversion where the source voice is always LJ. The main idea behind this Encoder is that it should convert any voice into a speaker-independent "average" voice while preserving the linguistic content of the source speech. It is supposed to be used in any-to-any voice conversion to transform any source voice into the "average" voice, thus helping to disentangle content and timbre. But if you train this Encoder on only one specific voice, it will not perform properly on arbitrary voices; it will only work as expected on that particular voice. So, if you want to achieve any-to-any voice conversion, you should train the Encoder on as many different voices as possible.
