Learning rate #9

Open
DQDH opened this issue Oct 21, 2018 · 9 comments

@DQDH

DQDH commented Oct 21, 2018

I don't know how to obtain the parameters of the Ours-ResNet segmentation network. Could you explain the parameters? Thanks.

@LeiyuanMa

I tried changing the learning rate to 0.01 and the batch size to 4. The loss decreased to 0.0403 within a single epoch (Iter: 37000/39675, so the epoch was almost finished when it failed), but the program often crashes with an error like this:
validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

So, are there any parameters I need to change? Or do you have any advice on this error?

@LeiyuanMa

I tried changing the learning rate to 0.01 and the batch size to 4, and because of limited GPU resources I set
model = torch.nn.DataParallel(model, device_ids=[0]), but afterwards there is always an error like this:
Iter:36900/39675 Loss:0.0413 imps:3.5 Fin:Mon Oct 22 03:56:00 2018 lr: 0.0009
Iter:36950/39675 Loss:0.0363 imps:3.5 Fin:Mon Oct 22 03:55:56 2018 lr: 0.0009
Iter:37000/39675 Loss:0.0403 imps:3.5 Fin:Mon Oct 22 03:55:53 2018 lr: 0.0009

validating ... terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

The program can't finish an epoch, but the loss has decreased to 0.0403. Is that acceptable, or does it need more epochs?
Do you have any advice on this error?
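
For reference, the iteration counter in the log can be turned into a rough progress estimate. The sketch below is only arithmetic on the logged numbers; the function name is hypothetical, and treating 39675 as the total for a 15-epoch run is an assumption (15 is the epoch count mentioned later in this thread), not something the log confirms.

```python
def progress(step, max_step, assumed_epochs=None):
    """Summarize how far a run got, given the 'Iter: step/max_step' log field."""
    info = {"fraction_done": step / max_step}
    if assumed_epochs:
        # Assumption: max_step covers `assumed_epochs` equally sized epochs.
        iters_per_epoch = max_step / assumed_epochs
        info["iters_per_epoch"] = iters_per_epoch
        info["epochs_done"] = step / iters_per_epoch
    return info


# Numbers taken from the last log line before the crash.
report = progress(37000, 39675, assumed_epochs=15)
print(f"{report['fraction_done']:.1%} of the scheduled iterations")
print(f"{report['iters_per_epoch']:.0f} iterations per epoch (if 15 epochs)")
print(f"{report['epochs_done']:.2f} epochs completed (if 15 epochs)")
```

Under that assumption, the crash would fall near the end of the 14th epoch of a multi-epoch run rather than inside a single epoch, but this cannot be verified from the thread alone.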

@jiwoon-ahn
Owner

@hardBird123, I trained with Adam, setting the initial learning rate to 0.001, but I didn't try to find the optimal learning rate. You can get better results than mine by adopting SGD or just following the method described in https://arxiv.org/pdf/1611.10080.pdf.

@jiwoon-ahn
Owner

@LeiyuanMa, Sorry, I can't help you with that error. It is probably related to a memory leak. In my case, the number of training epochs does not change the performance much, and I haven't tested whether training for 15 epochs is best for the network.

@DQDH
Author

DQDH commented Oct 23, 2018

OK, thanks. I want to confirm: is [ilsvrc-cls_rna-a1_cls1000_ep-0001.params] the pretrained weights file for training the ResNet38 segmentation network?

@jiwoon-ahn
Owner

@hardBird123, Yes, that is the right file for the segmentation network.

@LeiyuanMa

Thanks. So is a loss of 0.0403 acceptable?

@DQDH
Author

DQDH commented Oct 30, 2018

Which lr_type (fixed (default), step, or linear) should I choose when training the ResNet38 segmentation network?
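
For anyone comparing the three options, here is a minimal sketch of what schedules with these names commonly mean. It is not taken from the repository: the function name, `step_size`, and `gamma` are hypothetical, and the project's own definitions of `fixed`, `step`, and `linear` may differ.

```python
def learning_rate(lr_type, base_lr, step, max_step, step_size=10000, gamma=0.1):
    """Return the learning rate at iteration `step` under a common reading
    of each schedule name (an assumption, not the repository's code)."""
    if lr_type == "fixed":
        # Constant learning rate for the whole run.
        return base_lr
    if lr_type == "step":
        # Multiply by `gamma` once every `step_size` iterations.
        return base_lr * gamma ** (step // step_size)
    if lr_type == "linear":
        # Decay linearly from base_lr at step 0 to zero at max_step.
        return base_lr * (1.0 - step / max_step)
    raise ValueError(f"unknown lr_type: {lr_type!r}")


# Compare the three schedules halfway through a hypothetical 40000-iteration run.
for name in ("fixed", "step", "linear"):
    print(name, learning_rate(name, base_lr=0.001, step=20000, max_step=40000))
```

A fixed rate is the simplest to reason about; the decaying variants trade that simplicity for smaller updates late in training.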

@suoranxiu

Hello, I'm a student running this code, and I've hit a runtime error. Could you give me some tips on this issue?
(Screenshot attached: 2019-02-24 204758)

RuntimeError: size mismatch, m1: [1 x 20], m2: [1 x 20]
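
A note on reading this message: an error of this form usually points at a 2-D matrix multiplication whose inner dimensions disagree. Multiplying a [1 x 20] matrix by another [1 x 20] matrix fails because the 20 columns of the first do not match the 1 row of the second. The sketch below only checks shapes in plain Python; the helper names are hypothetical, and which operand should be transposed or reshaped in the actual code cannot be determined from this thread.

```python
def matmul_shape(m1, m2):
    """Return the result shape of m1 @ m2, or raise if the shapes are incompatible."""
    rows1, cols1 = m1
    rows2, cols2 = m2
    if cols1 != rows2:
        # Inner dimensions must agree: columns of m1 == rows of m2.
        raise ValueError(
            f"size mismatch, m1: [{rows1} x {cols1}], m2: [{rows2} x {cols2}]"
        )
    return (rows1, cols2)


def transpose(shape):
    """Swap the two dimensions of a 2-D shape."""
    return (shape[1], shape[0])


a = (1, 20)
b = (1, 20)

try:
    matmul_shape(a, b)
except ValueError as err:
    print("fails:", err)

# Transposing one operand makes the inner dimensions agree.
print(matmul_shape(a, transpose(b)))  # (1, 20) @ (20, 1)
print(matmul_shape(transpose(a), b))  # (20, 1) @ (1, 20)
```

The two transposed variants give different result shapes, so checking the shapes of both operands just before the failing call is the first step in deciding which one is intended.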
