How to train multi speaker datasets? #113

tuna2134 · 2024-11-11T03:05:24Z

No description provided.

shivammehta25 · 2024-11-11T03:53:20Z

Hello,

Check the dataloading:

Matcha-TTS/matcha/data/text_mel_datamodule.py

Lines 166 to 170 in 7780426

    filepath, spk, text = (
        filepath_and_text[0],
        int(filepath_and_text[1]),
        filepath_and_text[2],
    )
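To make the column order concrete, here is a minimal sketch of how one filelist row maps onto the three variables above. `parse_row` and `FilelistEntry` are hypothetical names for illustration, not Matcha-TTS code. The sketch assumes `|` as the separator, as in the filelists in this thread. The `maxsplit=2` is a defensive choice of this sketch: it keeps any stray `|` inside a transcript attached to the text field.

```python
from typing import NamedTuple


class FilelistEntry(NamedTuple):
    # Hypothetical container mirroring (filepath, spk, text) in the loader above
    filepath: str
    spk: int
    text: str


def parse_row(line: str, split_char: str = "|") -> FilelistEntry:
    """Split one filelist row into (audio path, integer speaker id, transcript).

    Field 0 is the audio path and field 1 is the speaker id, matching the
    indexing in text_mel_datamodule.py; everything after the second
    separator is treated as text.
    """
    filepath, spk, text = line.rstrip("\n").split(split_char, maxsplit=2)
    return FilelistEntry(filepath, int(spk), text)


row = parse_row("wavs/batman_speech.wav|2|Hello, I am Batman\n")
print(row)
```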

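The dataset's file list should have one row per utterance, with the fields in the order the loader above reads them (path first, then speaker ID, then text):

audio_location: str|spk_id: int|text: str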

For example:

shivam_speech.wav|0|Hello, I am Shivam
not_shivam_speech.wav|1|Hello, I am not Shivam
batman_speech.wav|2|Hello, I am Batman
batman_speech2.wav|2|I am not scared of bats now
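If your corpus labels speakers by name, a small script can assign the integer IDs for you. The sketch below is hypothetical (`build_filelist` and the speaker names `shivam`, `other`, `batman` are made up for illustration). It writes rows in the path|speaker id|text order the loader reads, and numbers speakers 0, 1, 2, ... in order of first appearance.

```python
from typing import Dict, Iterable, List, Tuple


def build_filelist(
    utterances: Iterable[Tuple[str, str, str]],
) -> Tuple[List[str], Dict[str, int]]:
    """Turn (audio_path, speaker_name, text) triples into filelist rows.

    Speaker names get contiguous integer IDs starting at 0, in order of
    first appearance. Transcripts containing '|' are rejected because
    they would add an extra column to the row.
    """
    spk_to_id: Dict[str, int] = {}
    rows: List[str] = []
    for audio_path, speaker, text in utterances:
        # len(spk_to_id) is evaluated before insertion, so a new speaker gets the next free ID
        spk_id = spk_to_id.setdefault(speaker, len(spk_to_id))
        if "|" in text:
            raise ValueError(f"'|' in transcript would break the row: {text!r}")
        rows.append(f"{audio_path}|{spk_id}|{text}")
    return rows, spk_to_id


rows, spk_to_id = build_filelist([
    ("shivam_speech.wav", "shivam", "Hello, I am Shivam"),
    ("not_shivam_speech.wav", "other", "Hello, I am not Shivam"),
    ("batman_speech.wav", "batman", "Hello, I am Batman"),
    ("batman_speech2.wav", "batman", "I am not scared of bats now"),
])
print("\n".join(rows))
print(spk_to_id)
```

Keeping the name-to-ID mapping around is useful later, since at inference time you pick a voice by its integer ID.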

Update the filelist paths here:

Matcha-TTS/configs/data/vctk.yaml

Lines 7 to 8 in 7780426

    train_filelist_path: data/filelists/vctk_audio_sid_text_train_filelist.txt
    valid_filelist_path: data/filelists/vctk_audio_sid_text_val_filelist.txt
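The two paths expect separate training and validation files. One way to produce them from a single list of rows is a seeded random hold-out, sketched below. `split_and_write`, the 5% validation fraction, the seed, and the `my_*_filelist.txt` names are all assumptions for illustration. The demo writes to a temporary directory; in practice you would write under `data/filelists/` and point the two config keys at the resulting files.

```python
import random
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def split_and_write(
    rows: Sequence[str],
    train_path: str,
    valid_path: str,
    valid_fraction: float = 0.05,
    seed: int = 1234,
) -> Tuple[List[str], List[str]]:
    """Shuffle filelist rows reproducibly, hold out a validation slice, write both files."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to make a train/valid split")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    for path, subset in ((train_path, train), (valid_path, valid)):
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        Path(path).write_text("\n".join(subset) + "\n", encoding="utf-8")
    return train, valid


with tempfile.TemporaryDirectory() as tmp:
    train, valid = split_and_write(
        [f"utt{i}.wav|{i % 3}|sentence {i}" for i in range(40)],
        f"{tmp}/filelists/my_train_filelist.txt",
        f"{tmp}/filelists/my_val_filelist.txt",
    )
    print(len(train), len(valid))  # 38 2
```

A random split like this can leave a rare speaker with no validation rows; if that matters, split per speaker instead.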

Set the number of speakers accordingly. In the example above there are 3 speakers (IDs 0, 1, 2), so n_spks would be 3; the VCTK config uses 109:

Matcha-TTS/configs/data/vctk.yaml

Line 11 in 7780426

n_spks: 109
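Rather than counting speakers by hand, you can derive n_spks from the speaker-ID column. The sketch below is hypothetical (`infer_n_spks` is not a Matcha-TTS function). It assumes the IDs index a speaker-embedding table of size n_spks, so every ID must lie in 0..n_spks-1. Under that assumption the value to set is max(ID) + 1, and unused IDs are only warned about, since they waste embedding rows but do not break anything.

```python
import warnings
from typing import Iterable


def infer_n_spks(rows: Iterable[str], split_char: str = "|") -> int:
    """Return the n_spks implied by field 1 (speaker ID) of the filelist rows."""
    ids = {int(row.split(split_char)[1]) for row in rows if row.strip()}
    if not ids:
        raise ValueError("filelist has no rows")
    if min(ids) < 0:
        raise ValueError("speaker IDs must be non-negative")
    n_spks = max(ids) + 1  # IDs are assumed to index 0..n_spks-1
    unused = sorted(set(range(n_spks)) - ids)
    if unused:
        warnings.warn(f"speaker IDs {unused} never appear; their embeddings would go unused")
    return n_spks


example = [
    "shivam_speech.wav|0|Hello, I am Shivam",
    "not_shivam_speech.wav|1|Hello, I am not Shivam",
    "batman_speech.wav|2|Hello, I am Batman",
    "batman_speech2.wav|2|I am not scared of bats now",
]
print(infer_n_spks(example))  # 3
```

Run it over the training and validation rows together (for example `Path(p).read_text(encoding="utf-8").splitlines()` for each file) so a speaker that only appears in validation is still counted.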

Then follow the "Train with your own dataset" steps in the README. Hope this helps :)

tuna2134 · 2024-11-11T04:14:12Z


Thanks
