Error when training model: AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown' #33
Comments
Is this your private dataset? I suspect some of the text lengths might be below the minimum value, or something like that. You can just copy the …
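One way to test the short-text hypothesis is to scan the filelist for transcripts whose length falls outside the range the loader accepts. This is a minimal sketch, assuming the common VITS-style filelist format of `path|text` (or `path|speaker_id|text`), with the transcript as the last `|`-separated field. The `min_len`/`max_len` defaults are hypothetical placeholders, so substitute the limits your data loader actually applies.

```python
def find_out_of_range_texts(lines, min_len=1, max_len=190):
    """Return (line_number, text_length) pairs for transcripts outside [min_len, max_len].

    Assumes each line looks like 'path|text' or 'path|speaker_id|text' and
    treats the last '|'-separated field as the transcript. The default limits
    are placeholders, not values taken from this repo.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        text = line.split("|")[-1]
        if not (min_len <= len(text) <= max_len):
            problems.append((lineno, len(text)))
    return problems


if __name__ == "__main__":
    sample = [
        "wavs/0001.wav|Hello there, this is a normal sentence.\n",
        "wavs/0002.wav|\n",                   # empty transcript
        "wavs/0003.wav|" + "a" * 300 + "\n",  # unusually long transcript
    ]
    for lineno, n in find_out_of_range_texts(sample):
        print(f"line {lineno}: text length {n}")
```

To check a real filelist, pass an open file object (opened with `encoding="utf-8"`) instead of the `sample` list.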
Or you can decrease the values of `boundaries` in `DistributedBucketSampler` and see what happens.
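To see how a given set of `boundaries` splits the data, the bucketing can be simulated outside the training loop. This sketch is modeled on the original VITS `DistributedBucketSampler`, where bucket `i` holds samples with length in `(boundaries[i], boundaries[i + 1]]` and anything outside `(boundaries[0], boundaries[-1]]` is dropped; this repo's sampler may differ. In the original VITS `data_utils.py` the lengths are estimated spectrogram frame counts derived from audio file size, not text lengths. The boundary values are the ones that appear in the original VITS `train.py`, and the `lengths` are made up.

```python
import bisect


def assign_buckets(lengths, boundaries):
    """Group sample indices into buckets covering (boundaries[i], boundaries[i + 1]].

    Samples with length <= boundaries[0] or > boundaries[-1] are returned
    separately as dropped, mirroring how the original VITS sampler ignores them.
    """
    buckets = [[] for _ in range(len(boundaries) - 1)]
    dropped = []
    for idx, length in enumerate(lengths):
        # bisect_left finds the first boundary >= length; the bucket is the one just before it.
        pos = bisect.bisect_left(boundaries, length)
        if 1 <= pos <= len(boundaries) - 1:
            buckets[pos - 1].append(idx)
        else:
            dropped.append(idx)
    return buckets, dropped


if __name__ == "__main__":
    boundaries = [32, 300, 400, 500, 600, 700, 800, 900, 1000]
    lengths = [350, 420, 480, 510, 950, 1200]  # made-up sample lengths
    buckets, dropped = assign_buckets(lengths, boundaries)
    print("bucket sizes:", [len(b) for b in buckets])
    print("dropped sample indices:", dropped)
```

With these made-up lengths nothing falls in the first bucket `(32, 300]`, and the sample of length 1200 is dropped. Rerunning it with different `boundaries` shows which choice leaves no bucket empty.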
Yes, it is a custom dataset. My dataset looks normal, and the texts are pretty long. The code errors out after this line, and I'm not sure why, because the error logs are not clear. What should I inspect to find out what is wrong?
Where can I do that? By the way, this is the config file I am using for training:
Hi all. By the way, can I turn off the validation and test lists? I can validate the model myself. My dataset is not large enough to set sentences aside for validation and testing.
Just pass in the same list to both train and validation. For the validation loader, use `torch.utils.data.Subset` and pass only 4-5 samples so that you still get evaluation while training. If you want to turn off evaluation completely, just comment out the `evaluate()` call in `train.py`.
And what about test.txt in filelists?
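A minimal sketch of the suggested validation setup. The `TensorDataset` of 20 numbers is a stand-in assumption for whatever dataset object the repo's `train.py` builds for validation; only `torch.utils.data.Subset` and `DataLoader` are the real PyTorch APIs being illustrated.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in for the repo's real validation dataset (assumption): 20 dummy samples.
full_dataset = TensorDataset(torch.arange(20))

# Reuse the training data for validation, but evaluate on only the first 5 samples.
val_subset = Subset(full_dataset, indices=list(range(5)))
val_loader = DataLoader(val_subset, batch_size=1, shuffle=False)

for (x,) in val_loader:
    print(x.item())
```

In practice you would also pass the same `collate_fn` the repo already uses for its validation loader; it is omitted here because the dummy dataset does not need one.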
Test is the evaluation in this repo.
So validation and test are the same?
Yes, we just pass two lists in the config: train and val.
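Sending the same list to both comes down to editing one config entry. This sketch assumes the key names used in the original VITS configs (`data.training_files` and `data.validation_files`); check the JSON file you pass with `-c` for the exact keys. The helper name and the filelist paths are hypothetical.

```python
import copy
import json


def use_training_list_for_validation(config: dict) -> dict:
    """Return a copy of a config whose validation filelist is the training filelist.

    Assumes the VITS-style layout where filelist paths live under
    config["data"]["training_files"] and config["data"]["validation_files"].
    """
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    new_config["data"]["validation_files"] = new_config["data"]["training_files"]
    return new_config


if __name__ == "__main__":
    cfg = {"data": {"training_files": "filelists/train.txt",
                    "validation_files": "filelists/val.txt"}}
    print(json.dumps(use_training_list_for_validation(cfg), indent=2))
```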
Any idea what the problem might be? What do you suggest I do with the dataset?
Print out `len_bucket` in `data_utils` and try to debug from there.
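A sketch of such a debug print, assuming the buckets are lists of sample indices paired with a `boundaries` list as in the original VITS sampler (where they are stored in `self.buckets` and `self.boundaries`; check the attribute names in this repo). `report_bucket_sizes` is a hypothetical helper, not part of the repo.

```python
def report_bucket_sizes(buckets, boundaries):
    """Print len_bucket for every bucket and return the indices of empty buckets.

    buckets[i] is a list of sample indices whose lengths fall in
    (boundaries[i], boundaries[i + 1]].
    """
    empty = []
    for i, bucket in enumerate(buckets):
        len_bucket = len(bucket)
        flag = "  <-- EMPTY" if len_bucket == 0 else ""
        print(f"bucket {i} ({boundaries[i]}, {boundaries[i + 1]}]: len_bucket={len_bucket}{flag}")
        if len_bucket == 0:
            empty.append(i)
    return empty


if __name__ == "__main__":
    # Made-up example in which bucket 0 is empty.
    report_bucket_sizes([[], [0, 3], [1, 2]], [32, 300, 400, 500])
```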
So a bucket of length 0 is causing the issue: the sampler ends up dividing by 0. You can temporarily disable this function in the dataloader entirely.
@p0p4k I managed to get training working by making a few changes, including:

1. Skipping the "0" bucket in that for-loop to avoid the exception.

Training has now started, but the bucket issue is strange, and I don't believe it was supposed to happen. Whether it compromises the model remains to be seen. I'm going ahead and closing the issue. If you can, please look into what causes this, or at least improve the log so the problem is clearer.
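The division by zero and the skip-the-empty-bucket workaround can be reproduced in isolation. This sketch is modeled on the padding step in the original VITS `DistributedBucketSampler.__iter__`, which repeats a bucket's indices using `rem // len_bucket` and `rem % len_bucket`; this repo's code may differ in detail, and `pad_bucket`/`pad_all_buckets` are hypothetical names.

```python
def pad_bucket(ids_bucket, num_samples_bucket):
    """Repeat a bucket's indices until it holds num_samples_bucket entries.

    Mirrors the VITS-style padding arithmetic: it divides by len(ids_bucket),
    so an empty bucket raises ZeroDivisionError.
    """
    len_bucket = len(ids_bucket)
    rem = num_samples_bucket - len_bucket
    return ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[: rem % len_bucket]


def pad_all_buckets(buckets, num_samples_per_bucket):
    """Pad every non-empty bucket, skipping empty ones instead of crashing."""
    padded = []
    for ids_bucket, target in zip(buckets, num_samples_per_bucket):
        if len(ids_bucket) == 0:
            continue  # skip empty buckets rather than divide by zero
        padded.append(pad_bucket(ids_bucket, target))
    return padded


if __name__ == "__main__":
    buckets = [[], [4, 7, 9]]  # made-up buckets; bucket 0 is empty
    targets = [0, 8]           # padded size per bucket
    try:
        pad_bucket(buckets[0], targets[0])
    except ZeroDivisionError as exc:
        print("empty bucket:", exc)
    print(pad_all_buckets(buckets, targets))
```

In the original VITS sampler, the loop that removes empty buckets runs over `range(len(buckets) - 1, 0, -1)` and therefore never removes bucket 0, which would be consistent with an empty "0" bucket reaching this division. Skipping it at iteration time avoids the crash but does not explain why no samples landed in that bucket, so checking the bucket sizes is still worthwhile.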
Greetings,
As seen on #10 (comment), someone successfully trained models, so I decided to try it myself.
I used the following command:
```
python train.py -c configs/vits2_voice_training.json -m mydataset
```
However, the following happens:
What could be the problem here, and what could I try doing to fix this?