Learning rate #9

DQDH · 2018-10-21T14:17:03Z

I don't know how to set the parameters of the Ours-ResNet segmentation network. Can you explain the parameters? Thanks.

LeiyuanMa · 2018-10-22T00:16:20Z

I tried changing the learning rate to 0.01 and the batch size to 4. The loss decreased to 0.0403 within one epoch (Iter: 37000/39675; the epoch was almost finished but then failed). However, the program often throws an error like this:
validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

So, are there any parameters I need to change? Or do you have any advice on this error?

LeiyuanMa · 2018-10-22T00:36:15Z

I tried changing the learning rate to 0.01 and the batch size to 4, and because of limited GPU resources I set
model = torch.nn.DataParallel(model, device_ids=[0]). After that, there is always an error like this:
Iter:36900/39675 Loss:0.0413 imps:3.5 Fin:Mon Oct 22 03:56:00 2018 lr: 0.0009
Iter:36950/39675 Loss:0.0363 imps:3.5 Fin:Mon Oct 22 03:55:56 2018 lr: 0.0009
Iter:37000/39675 Loss:0.0403 imps:3.5 Fin:Mon Oct 22 03:55:53 2018 lr: 0.0009

validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

The program can't finish an epoch, but the loss has decreased to 0.0403. Is that acceptable, or does it need more epochs?
Do you have any advice on this error?

jiwoon-ahn · 2018-10-23T09:06:27Z

@hardBird123, I trained with Adam, setting the initial learning rate to 0.001, but I didn't try to find the optimal learning rate. You can get better results than mine by adopting SGD or just following the method described in https://arxiv.org/pdf/1611.10080.pdf.

jiwoon-ahn · 2018-10-23T09:12:08Z

@LeiyuanMa, Sorry, I can't help you with that error. It is probably related to a memory leak. In my case, the number of training epochs does not change the performance much, and I haven't tested whether training for 15 epochs is best for the network.

DQDH · 2018-10-23T09:13:33Z

OK, thanks. I want to confirm: is [ilsvrc-cls_rna-a1_cls1000_ep-0001.params] the pretrained weights file for training the ResNet38 segmentation network?

jiwoon-ahn · 2018-10-23T09:15:20Z

@hardBird123, Yes, that is the right file for the segmentation network.

LeiyuanMa · 2018-10-23T09:20:15Z

Thanks. So is a loss of 0.0403 acceptable?

DQDH · 2018-10-30T13:02:26Z

Which lr_type (fixed (default) / step / linear) should I choose when training the ResNet38 segmentation network?
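
For readers comparing the three options named above, here is a minimal sketch of what schedules called "fixed", "step", and "linear" commonly compute. It is not taken from this repository: the function name `learning_rate` and the parameters `step_size` and `gamma` are hypothetical, and the repository's own lr_type implementation may differ.

```python
def learning_rate(lr_type, base_lr, iteration, max_iter, step_size=10000, gamma=0.1):
    """Return an illustrative learning rate for the given iteration.

    All names and defaults here are assumptions made for illustration;
    they are not the repository's actual options.
    """
    if lr_type == "fixed":
        # The rate never changes.
        return base_lr
    if lr_type == "step":
        # Multiply by gamma once for every completed block of step_size iterations.
        return base_lr * gamma ** (iteration // step_size)
    if lr_type == "linear":
        # Decay in a straight line from base_lr at iteration 0 to 0 at max_iter.
        return base_lr * (1.0 - iteration / max_iter)
    raise ValueError(f"unknown lr_type: {lr_type!r}")


if __name__ == "__main__":
    # Print the three schedules at a few points of a 39675-iteration run.
    for it in (0, 10000, 20000, 37000):
        rates = {t: learning_rate(t, 0.001, it, 39675) for t in ("fixed", "step", "linear")}
        print(it, rates)
```

Under these assumptions, "fixed" keeps the initial rate for the whole run, while "step" and "linear" both shrink it as training proceeds, in discrete drops and continuously, respectively.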

suoranxiu · 2019-02-28T08:56:16Z

Hello, I'm a student running this code, and I hit a runtime error. Can you give me some tips about this issue?

RuntimeError: size mismatch, m1: [1 x 20], m2: [1 x 20]
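
The message reports two operands that are both [1 x 20]. A matrix product requires the column count of the first operand to equal the row count of the second, so 1 x 20 times 1 x 20 cannot be multiplied. The thread does not say which tensors are involved, so the sketch below only illustrates the shape rule in plain Python; `matmul_shape` and `transpose_shape` are hypothetical helpers, not PyTorch functions.

```python
def matmul_shape(m1_shape, m2_shape):
    """Return the result shape of a matrix product, or raise on a mismatch."""
    rows1, cols1 = m1_shape
    rows2, cols2 = m2_shape
    if cols1 != rows2:
        # Inner dimensions must agree: (rows1 x cols1) @ (rows2 x cols2).
        raise ValueError(
            f"size mismatch, m1: [{rows1} x {cols1}], m2: [{rows2} x {cols2}]"
        )
    return (rows1, cols2)


def transpose_shape(shape):
    """Return the shape of the transposed matrix."""
    rows, cols = shape
    return (cols, rows)


if __name__ == "__main__":
    a = (1, 20)
    b = (1, 20)
    try:
        matmul_shape(a, b)
    except ValueError as err:
        print("fails:", err)
    # Transposing the second operand makes the inner dimensions agree.
    print("works:", matmul_shape(a, transpose_shape(b)))
```

Whether a transpose, a reshape, or a differently sized layer is the right fix depends on the code that produced the two operands, which is not shown in this thread.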
