EOFError #441

Closed
JuliusWang-7 opened this issue Dec 18, 2020 · 12 comments
Comments

@JuliusWang-7

Hi, Fabian.
I can't figure out how to fix this EOFError. What's more, when I resume training with the -c flag, it still continues from epoch 100 even though I have already trained for over 300 epochs. Could you give me some advice? Thanks a lot.

@FabianIsensee
Member

Please delete all the npy files in the nnUNet_preprocessed folder of your task and try again

@FabianIsensee
Member

(do not delete the npz files!)

And make sure your SSD is not full
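For anyone who wants to script that cleanup, here is a minimal sketch; the preprocessed path and task name below are placeholders, so adjust them to your own setup:

```python
from pathlib import Path

# Placeholder path: point this at the task folder inside your
# nnUNet_preprocessed directory.
preprocessed_dir = Path("/data/nnUNet_preprocessed/Task001_Example")

# Delete only the unpacked .npy files; the compressed .npz archives
# must be kept, since they are used to re-create the .npy files.
for npy_file in preprocessed_dir.rglob("*.npy"):
    npy_file.unlink()
    print(f"deleted {npy_file}")
```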

@JuliusWang-7
Author

Thanks. It does work.

@JuliusWang-7
Author

It still stops from time to time. Although resuming with -c works now, I have to delete the npy files again and again.

@FabianIsensee
Member

Are you training in a docker container?

@JuliusWang-7
Author

Are you training in a docker container?

No, I am training locally on Linux.

@FabianIsensee
Member

Hmm, it looks like something is wrong with the location where you store the data. Is it a local SSD?

@ProfessorHuang

I have a similar problem: my training stops from time to time, so I have to resume it manually with -c. (I am training in a docker container, though.)

@JuliusWang-7
Author

Hmm, it looks like something is wrong with the location where you store the data. Is it a local SSD?

yes

@FabianIsensee
Member

Hm, that is strange. Have you checked your RAM? Maybe it was full and the system killed some background workers.

@JuliusWang-7
Author

Hm, that is strange. Have you checked your RAM? Maybe it was full and the system killed some background workers.

Thanks. It may well be a RAM issue, because I run four folds at the same time.
What is strange is that this problem does not happen when I run 3d_fullres.
Happy New Year! I wish you a wonderful year, Fabian.

@FabianIsensee
Member

Hm, you should be able to train multiple folds simultaneously. I do it all the time. The only thing to keep in mind is that only one of the trainings can do the extraction of the files (npz -> npy) at a time, so if you train multiple folds you need to start just one fold first. Only once that fold is using the GPU can you start the others. If the files were already extracted by a previous training, you can start all folds simultaneously.
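For illustration, here is a minimal sketch of what such an extraction step could look like. The folder layout, the lock-file guard, and the "data" array key are assumptions made for this example, not nnU-Net's actual implementation:

```python
import numpy as np
from pathlib import Path

# Placeholder path: the task folder inside your nnUNet_preprocessed directory.
preprocessed_dir = Path("/data/nnUNet_preprocessed/Task001_Example")
lock_file = preprocessed_dir / ".unpacking.lock"

# Lock-file guard (an assumption for illustration): only the process that
# creates the lock unpacks the archives; other processes skip this step.
if not lock_file.exists():
    lock_file.touch()
    try:
        for npz_file in preprocessed_dir.rglob("*.npz"):
            npy_file = npz_file.with_suffix(".npy")
            if npy_file.exists():
                continue  # already unpacked by an earlier run
            with np.load(npz_file) as archive:
                # "data" is an assumed key inside the archive.
                np.save(npy_file, archive["data"])
    finally:
        lock_file.unlink()
```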
