EOFError #441

Closed
JuliusWang-7 opened this issue Dec 18, 2020 · 12 comments
Comments

@JuliusWang-7

Hi, Fabian.
I can't figure out how to fix this EOFError. What's more, when I resume training with the -c flag, it still continues from epoch 100 even though I have already trained for over 300 epochs. Could you give me some advice? Thanks a lot.

@FabianIsensee
Member

Please delete all the npy files in the nnUNet_preprocessed folder of your task and try again

@FabianIsensee
Member

(do not delete the npz files!)

And make sure your SSD is not full
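For anyone who wants to script that cleanup, here is a minimal sketch; the preprocessed path and task name below are placeholders, so adjust them to your own setup:

```python
from pathlib import Path

# Placeholder path: point this at the task folder inside your
# nnUNet_preprocessed directory.
preprocessed_dir = Path("/data/nnUNet_preprocessed/Task001_Example")

# Delete only the unpacked .npy files; the compressed .npz archives
# must be kept, since they are used to re-create the .npy files.
for npy_file in preprocessed_dir.rglob("*.npy"):
    npy_file.unlink()
    print(f"deleted {npy_file}")
```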

@JuliusWang-7
Author

Thanks. It does work.

@JuliusWang-7
Author

It still stops from time to time. Although resuming with -c works now, I have to delete the npy files again and again.

@FabianIsensee
Member

Are you training in a docker container?

@JuliusWang-7
Author

Are you training in a docker container?

No, I am training locally on Linux.

@FabianIsensee
Member

Hmm, it looks like something is wrong with the location where you store the data. Is it a local SSD?

@ProfessorHuang

I have a similar problem: my training stops from time to time, so I have to resume it manually with -c. (I am training in a docker container, though.)

@JuliusWang-7
Author

Hmm, it looks like something is wrong with the location where you store the data. Is it a local SSD?

yes

@FabianIsensee
Member

Hm, that is strange. Have you checked your RAM? Maybe it was full and the system killed some background workers.

@JuliusWang-7
Author

Hm, that is strange. Have you checked your RAM? Maybe it was full and the system killed some background workers.

Thanks. It may well be a RAM issue, because I run four folds at the same time.
What is strange is that this problem does not happen when I run 3d_fullres.
Happy New Year! I wish you a wonderful year, Fabian.

@FabianIsensee
Member

Hm, you should be able to train multiple folds simultaneously. I do it all the time. The only thing to keep in mind is that only one of the trainings can do the extraction of the files (npz -> npy) at a time, so if you train multiple folds you need to start just one fold first. Only once that fold is using the GPU can you start the others. If the files were already extracted by a previous training, you can start all folds simultaneously.
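For illustration, here is a minimal sketch of what such an extraction step could look like. The folder layout, the lock-file guard, and the "data" array key are assumptions made for this example, not nnU-Net's actual implementation:

```python
import numpy as np
from pathlib import Path

# Placeholder path: the task folder inside your nnUNet_preprocessed directory.
preprocessed_dir = Path("/data/nnUNet_preprocessed/Task001_Example")
lock_file = preprocessed_dir / ".unpacking.lock"

# Lock-file guard (an assumption for illustration): only the process that
# creates the lock unpacks the archives; other processes skip this step.
if not lock_file.exists():
    lock_file.touch()
    try:
        for npz_file in preprocessed_dir.rglob("*.npz"):
            npy_file = npz_file.with_suffix(".npy")
            if npy_file.exists():
                continue  # already unpacked by an earlier run
            with np.load(npz_file) as archive:
                # "data" is an assumed key inside the archive.
                np.save(npy_file, archive["data"])
    finally:
        lock_file.unlink()
```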
