NS2VC_v2

Unofficial implementation of NaturalSpeech2 for Voice Conversion

Unlike NS2, I use vocos instead of encodec as the vocoder for better quality, and use contentvec in place of the text embedding and duration span process. I also adopted the unet1d conditional model from the diffusers library; thanks to its authors for their hard work.

About Zero shot generalization

I made many attempts to improve the model's generalization, and found that it behaves much like Stable Diffusion: if a tag is not in your training set, you can't get a promising result. A larger dataset with more speakers gives better generalization and better results. The model does give good results for speakers that are in the training set.

Demo

refer	input	output
refer0.webm	gt0.webm	gen0.webm
refer1.webm	gt1.webm	gen1.webm
refer2.webm	gt2.webm	gen2.webm
refer3.webm	gt3.webm	gen3.webm
refer4.webm	gt4.webm	gen4.webm

Data preprocessing

First, download the contentvec model and put it under the hubert folder. The model can be downloaded from here.

The dataset structure can be like this:

dataset
├── spk1
│   ├── 1.wav
│   ├── 2.wav
│   ├── ...
│   └── spk11
│       ├── 11.wav
├── 3.wav
├── 4.wav

In short, you can arrange the data any way you like.

Put your .wav files under the dataset folder, then run the following command to preprocess them.

python preprocess.py

The preprocessed data will be saved under the processed_dataset folder.
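
Because the layout is free-form, one plausible way to collect the input files is a recursive search for .wav files. The sketch below only illustrates that idea; it is an assumption, not the actual logic of preprocess.py, and the helper name find_wavs is hypothetical.

```python
from pathlib import Path


def find_wavs(root="dataset"):
    """Return every .wav file under root, at any depth, in sorted order.

    Hypothetical helper: preprocess.py may discover files differently.
    """
    return sorted(p for p in Path(root).rglob("*.wav") if p.is_file())


if __name__ == "__main__":
    # List what a recursive scan of the dataset folder would pick up.
    for wav in find_wavs():
        print(wav)
```

Running this before preprocessing is a quick way to confirm that nested speaker folders and loose files at the top level are both found.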

Requirements

You can install the requirements by running the following command.

pip install vocos accelerate matplotlib librosa unidecode inflect ema_pytorch tensorboard fairseq praat-parselmouth pyworld
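
After installing, a small check like the following can confirm that each package is importable. This is a sketch, not part of the repository: the function name missing_modules is hypothetical, and the import names are my understanding of what each pip package exposes (for example, praat-parselmouth is imported as parselmouth).

```python
import importlib.util

# Import names assumed for the packages in the pip command above.
# Note that praat-parselmouth is imported as "parselmouth".
REQUIRED_MODULES = [
    "vocos", "accelerate", "matplotlib", "librosa", "unidecode", "inflect",
    "ema_pytorch", "tensorboard", "fairseq", "parselmouth", "pyworld",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```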

Training

Install accelerate first, run accelerate config to configure the environment, and then run the following command to train the model.

accelerate launch train.py

Inference

Change device, model_path, clean_names and refer_names in infer.py, and then run the following command to run inference.

python infer.py
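
As a rough illustration of the four settings named above, the sketch below shows placeholder values and a simple sanity check. Everything here is an assumption: the real types and defaults in the script may differ, the values are placeholders, and check_settings is a hypothetical helper. I am assuming clean_names lists the source utterances to convert and refer_names lists the reference utterances for the target voice, by analogy with the Demo table.

```python
def check_settings(device, model_path, clean_names, refer_names):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not (device == "cpu" or device.startswith("cuda")):
        problems.append(f"unexpected device: {device!r}")
    if not model_path:
        problems.append("model_path is empty")
    if not clean_names:
        problems.append("clean_names is empty")
    if not refer_names:
        problems.append("refer_names is empty")
    return problems


if __name__ == "__main__":
    # Placeholder values only; replace them with your own.
    device = "cuda"                   # or "cpu" on a machine without a GPU
    model_path = "your_model_path"    # placeholder, as in the README
    clean_names = ["gt0"]             # assumed: inputs to convert
    refer_names = ["refer0"]          # assumed: reference voices
    print(check_settings(device, model_path, clean_names, refer_names))
```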

Continue training

To fine-tune or continue training a model, add

trainer.load('your_model_path')

to train.py.
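
If you do not want to hard-code the path, one option is to pick the most recently modified checkpoint file. The sketch below is only an illustration under stated assumptions: the logs folder and the "*.pt" pattern are guesses about where and how checkpoints are saved, and latest_checkpoint is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def latest_checkpoint(log_dir="logs", pattern="*.pt"):
    """Return the most recently modified file matching pattern, or None.

    Both the "logs" folder and the "*.pt" pattern are assumptions; adjust
    them to wherever and however your checkpoints are actually saved.
    """
    candidates = [p for p in Path(log_dir).rglob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    path = latest_checkpoint()
    if path is not None:
        # In train.py this would become: trainer.load(str(path))
        print("resume from:", path)
    else:
        print("no checkpoint found")
```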

Pretrained model

Possibly coming soon, if I can gather enough data to train a good model.

TTS

If you want to use the TTS model, please check the TTS branch.

Q&A

QQ group: 801645314. You can join the QQ group to discuss the project.

Thanks to sovits4, naturalspeech2, imagen, and diffusers for their great work.
