
Memory #7

Open
WangZhouTao opened this issue Dec 17, 2017 · 10 comments

@WangZhouTao

Hello, I am a student. Recently I have been reading the HED paper. I found your code very interesting, but an error happens when I try to train the model with my own data. The error message shows a memory error. My computer has 32GB of built-in memory. Could you give me some suggestions? Thank you very much.

@CangHaiQingYue

My GPU only has 6GB of memory, which is enough to run the code. I don't know your setup.

@WangZhouTao
Author

Sorry, I forgot to mention: I use a 1080 Ti GPU and 32GB of memory. I can run your code with my own data, but a memory error happens when the iteration count reaches about 13000. Watching the top command in Ubuntu, I can see that memory is running out (not GPU memory, but system memory). I would really appreciate any suggestions you could give.

@CangHaiQingYue

Sorry, I have not had such an error. This code saves a checkpoint every 100 steps; maybe 13000 steps is too many for your dataset (I trained for 30000 steps without a memory error).

@WangZhouTao
Author

I already changed it to save a checkpoint every 1000 steps, but the error still happens. Did you run this code on Windows? Thank you.

@CangHaiQingYue

I run it on Ubuntu 16.04.

@sandhawalia
Member

Could you please give details of your dataset? It might be the case that your image sizes are quite large, and since the VGG base model is fully convolutional, the intermediate representations overflow the GPU memory.
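
If oversized inputs turn out to be the cause, one generic mitigation is to downscale large images before feeding them to the network so the convolutional activations stay bounded. This is only a sketch, not code from this repo; `cap_image_size` and `MAX_SIDE` are made-up names, and it assumes TF 1.x graph mode:

```python
import tensorflow as tf

MAX_SIDE = 512  # assumed limit; tune to your GPU memory

def cap_image_size(image):
    """Downscale an (H, W, C) float32 image so that max(H, W) <= MAX_SIDE."""
    shape = tf.shape(image)
    h = tf.cast(shape[0], tf.float32)
    w = tf.cast(shape[1], tf.float32)
    # Scale factor <= 1.0: never upscale, only shrink oversized images.
    scale = tf.minimum(1.0, MAX_SIDE / tf.maximum(h, w))
    new_h = tf.cast(tf.round(h * scale), tf.int32)
    new_w = tf.cast(tf.round(w * scale), tf.int32)
    return tf.image.resize_images(image, tf.stack([new_h, new_w]))
```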

@wutachiang

I also encountered the same problem. My server has 8 Tesla P20 GPUs and 512GB of memory. Even with this configuration, a memory error occurs after approximately 13,000 training iterations. Have you solved this problem? @CangHaiQingYue

@CangHaiQingYue

@Jasontachiangwu Well, when I removed the 'summary_write' operator, the problem was gone.
So I guess there may be some bug in 'summary'. You may need to rebuild your own code.
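
For anyone hitting this: one common way host memory grows steadily in graph-mode TF 1.x is building new summary ops (or any graph ops) inside the training loop, so the graph keeps growing every step. This may or may not be the bug here; the sketch below uses made-up names (`x`, `w`, `loss`, `train_op`, `merged`, `writer`) and just contrasts the leaky placement with the safe one:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(x * w - 1.0))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Leaky pattern (avoid): creating a summary op inside the loop adds a new
# node to the graph on every iteration, so host RAM grows until training dies.
#   for step in range(num_steps):
#       s = sess.run(tf.summary.scalar('loss', loss))  # new op each step

# Safer pattern: build the summary ops once, outside the loop, and only run them.
loss_summary = tf.summary.scalar('loss', loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./logs', sess.graph)
    for step in range(1000):
        _, s = sess.run([train_op, merged], feed_dict={x: [[1.0]]})
        writer.add_summary(s, step)
    writer.close()
```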

@wutachiang

@CangHaiQingYue After I updated TensorFlow to r1.8, no problem occurred during training. It may be that there was a bug in the summary code.

@yuezhilanyi

yuezhilanyi commented Jun 26, 2018

With the same code on Windows 10 and Ubuntu 16.04, TensorFlow v1.4.0, 20000 iterations:
a memory error occurs on Ubuntu 16.04,
while no error occurs on Windows 10.
