
Error when training model: AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown' #33

Closed
Subarasheese opened this issue Aug 29, 2023 · 14 comments

@Subarasheese
Contributor

Greetings,

As seen on #10 (comment), someone successfully trained models, so I decided to try it myself.

I used the following command:

python train.py -c configs/vits2_voice_training.json -m mydataset

However, the following happens:

INFO:mydataset:{'train': {'log_interval': 867, 'eval_interval': 867, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_voice_1_filelist_v4.txt', 'validation_files': 'filelists/val_voice_1_filelist_v4.txt', 'text_cleaners': ['basic_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': False, 'n_speakers': 0, 'cleaned_text': True, 'use_mel_spec_posterior': False}, 'model': {'use_mel_posterior_encoder': False, 'use_transformer_flows': True, 'transformer_flow_type': 'pre_conv', 'use_spk_conditioned_encoder': False, 'use_noise_scaled_mas': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'max_text_len': 500, 'model_dir': './logs/mydataset'}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Using lin posterior encoder for VITS1
Using transformer flows pre_conv for VITS2
Using normal encoder for VITS1
Using noise scaled MAS for VITS2
NOT using any duration discriminator like VITS1
Loading train data:   0%|                                                                                                                                 | 0/4 [00:00<?, ?it/s]
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f0f9a049240>
Traceback (most recent call last):
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1466, in __del__
    self._shutdown_workers()
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1397, in _shutdown_workers
    if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
  File "/vits2_pytorch/train.py", line 417, in <module>
    main()
  File "/vits2_pytorch/train.py", line 54, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/vits2_pytorch/train.py", line 196, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/vits2_pytorch/train.py", line 225, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(loader):
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
    return self._get_iterator()
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 988, in __init__
    super(_MultiProcessingDataLoaderIter, self).__init__(loader)
  File "/vits2_pytorch/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 598, in __init__
    self._sampler_iter = iter(self._index_sampler)
  File "/vits2_pytorch/data_utils.py", line 400, in __iter__
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero

What could be the problem here, and what could I try to fix it?

@p0p4k
Owner

p0p4k commented Aug 30, 2023

Is this your private dataset? I suspect the text lengths might be below the minimum value, or something like that. You can copy the dataloader part into a .ipynb file and try to debug your data loading. Load the hps for the dataloader from the config file using the function in utils.py.
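
A minimal sketch of that debugging flow (assuming utils.get_hparams_from_file, TextAudioLoader, and DistributedBucketSampler match the original VITS layout, and that the boundaries below are what train.py passes; adjust names and values if this repo differs):

# Hedged sketch: inspect the dataset and the bucket sampler outside of training.
import utils
from data_utils import TextAudioLoader, DistributedBucketSampler

hps = utils.get_hparams_from_file("configs/vits2_voice_training.json")
train_dataset = TextAudioLoader(hps.data.training_files, hps.data)
print(len(train_dataset), "training samples")

# Boundaries assumed to match the original VITS defaults used in train.py.
sampler = DistributedBucketSampler(
    train_dataset,
    hps.train.batch_size,
    [32, 300, 400, 500, 600, 700, 800, 900, 1000],
    num_replicas=1,
    rank=0,
    shuffle=True,
)
print([len(b) for b in sampler.buckets])  # a 0 here reproduces the ZeroDivisionError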

@hildazzz

Or you can decrease the values of boundaries in DistributedBucketSampler and see what happens.

@Subarasheese
Contributor Author

Subarasheese commented Aug 30, 2023

Is this your private dataset? I suspect the text lengths might be below the minimum value, or something like that. You can copy the dataloader part into a .ipynb file and try to debug your data loading. Load the hps for the dataloader from the config file using the function in utils.py.

Yes, it is a custom dataset. My dataset looks normal, and the texts are pretty long. The code errors out after this line, and I am not sure why, as the error logs are not clear. What do I need to inspect to find out what is wrong?

[screenshots attached]

Or you can decrease the values of boundaries in DistributedBucketSampler and see what happens.

Where can I do that?

By the way, this is the config file I am using for training:


{
    "train": {
      "log_interval": 867,
      "eval_interval": 867,
      "seed": 1234,
      "epochs": 20000,
      "learning_rate": 2e-4,
      "betas": [0.8, 0.99],
      "eps": 1e-9,
      "batch_size": 16,
      "fp16_run": false,
      "lr_decay": 0.999875,
      "segment_size": 8192,
      "init_lr_ratio": 1,
      "warmup_epochs": 0,
      "c_mel": 45,
      "c_kl": 1.0
    },
    "data": {
      "training_files":"filelists/train_voice_1_filelist_v4.txt",
      "validation_files":"filelists/val_voice_1_filelist_v4.txt",
      "text_cleaners":["basic_cleaners"],
      "max_wav_value": 32768.0,
      "sampling_rate": 22050,
      "filter_length": 1024,
      "hop_length": 256,
      "win_length": 1024,
      "n_mel_channels": 80,
      "mel_fmin": 0.0,
      "mel_fmax": null,
      "add_blank": false,
      "n_speakers": 0,
      "cleaned_text": true,
      "use_mel_spec_posterior": false
    },
    "model": {
      "use_mel_posterior_encoder": false,
      "use_transformer_flows": true,
      "transformer_flow_type": "pre_conv",
      "use_spk_conditioned_encoder": false,
      "use_noise_scaled_mas": true,
      "inter_channels": 192,
      "hidden_channels": 192,
      "filter_channels": 768,
      "n_heads": 2,
      "n_layers": 6,
      "kernel_size": 3,
      "p_dropout": 0.1,
      "resblock": "1",
      "resblock_kernel_sizes": [3,7,11],
      "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
      "upsample_rates": [8,8,2,2],
      "upsample_initial_channel": 512,
      "upsample_kernel_sizes": [16,16,4,4],
      "n_layers_q": 3,
      "use_spectral_norm": false
    },
    "max_text_len": 500
  }

@beqabeqa473

Hi all.

By the way, can I turn off the validation and test lists? I would be able to validate the model myself. My dataset is not large enough to spare sentences for validation and test sets.

@p0p4k
Owner

p0p4k commented Aug 31, 2023

Just pass the same list to both train and validation. For the validation loader, use torch.utils.data.Subset and pass only 4-5 samples so that you still get evaluation output while training. If you want to turn off evaluation completely, just comment out the evaluate() call in train.py.
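
A minimal sketch of that (class and loader names assumed to follow the original VITS layout; adjust if they differ in this repo):

from torch.utils.data import DataLoader, Subset
from data_utils import TextAudioLoader, TextAudioCollate

# Point validation_files at the same filelist as training, then evaluate on
# only a handful of samples via Subset.
eval_dataset = TextAudioLoader(hps.data.validation_files, hps.data)
eval_subset = Subset(eval_dataset, list(range(4)))  # first 4 samples only
eval_loader = DataLoader(
    eval_subset,
    num_workers=0,
    shuffle=False,
    batch_size=hps.train.batch_size,
    pin_memory=True,
    drop_last=False,
    collate_fn=TextAudioCollate(),
)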

@beqabeqa473

beqabeqa473 commented Aug 31, 2023 via email

@p0p4k
Owner

p0p4k commented Aug 31, 2023

The test split is the evaluation split in this repo.

@beqabeqa473

beqabeqa473 commented Aug 31, 2023 via email

@p0p4k
Owner

p0p4k commented Aug 31, 2023

Yes, we just pass two lists in the config: train and val.

@Subarasheese
Contributor Author

Any clue what the problem might be?

What do you suggest I do with the dataset?

@p0p4k
Owner

p0p4k commented Aug 31, 2023

Print out the "len_bucket" in data_utils and try to debug from there.
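Roughly, the print goes inside DistributedBucketSampler.__iter__ in data_utils.py, just above the line from the traceback (the loop below is paraphrased from the original VITS sampler and may not match this repo line for line):

for i in range(len(self.buckets)):
    bucket = self.buckets[i]
    len_bucket = len(bucket)
    print(f"bucket {i}: len_bucket = {len_bucket}")  # temporary debug print
    # the traceback line then divides by len_bucket:
    # ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]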

@Subarasheese
Contributor Author

Subarasheese commented Aug 31, 2023

Print out the "len_bucket" in data_utils and try to debug from there.

These were the outputs:

buckets from line 371:
0
8
34

[screenshot of the printed bucket lengths]

So the first bucket has length 0, the second has length 8, and the last has length 34.

Is there anything wrong with that?

@p0p4k
Owner

p0p4k commented Aug 31, 2023

So the bucket with length 0 is causing the issue: you cannot divide by 0. You can just disable this function in the dataloader entirely, as a temporary workaround.
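
One way to do that temporarily (a sketch, assuming train.py builds train_loader with batch_sampler=train_sampler as in the original VITS; names may differ here) is to drop the bucket sampler and let a plain DataLoader batch and shuffle:

from torch.utils.data import DataLoader

# Instead of batch_sampler=train_sampler, let the DataLoader handle batching.
train_loader = DataLoader(
    train_dataset,
    num_workers=8,
    shuffle=True,
    pin_memory=True,
    collate_fn=collate_fn,
    batch_size=hps.train.batch_size,
    drop_last=True,
)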

@Subarasheese
Contributor Author

@p0p4k I managed to get the training working by making a few changes, including:

1 - Skipping the length-0 bucket in that for-loop to avoid the exception (a sketch follows this list)
2 - Editing the symbols file (non-English language)
3 - Fixing a bug in the mel_processing file; I needed to replace the mel assignment with this:
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
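
For reference, a minimal sketch of the first workaround, inside DistributedBucketSampler.__iter__ in data_utils.py (the surrounding loop is paraphrased from the original VITS sampler and may differ slightly in this repo):

for i in range(len(self.buckets)):
    bucket = self.buckets[i]
    len_bucket = len(bucket)
    if len_bucket == 0:
        continue  # skip empty buckets so rem // len_bucket cannot divide by zero
    ids_bucket = indices[i]
    num_samples_bucket = self.num_samples_per_bucket[i]
    rem = num_samples_bucket - len_bucket
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]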

And now it has started training, but the bucket issue is strange and I believe it was not supposed to happen. Whether it will compromise the model remains to be seen.

But I am going ahead and closing the issue. Please, if you can, look into what could cause this, or at least improve the logging to make the problem clearer.
