
compact bilinear with googlenet and resnet #9

Open
yangjingyi opened this issue May 10, 2017 · 15 comments

@yangjingyi

Hello, thanks for open-sourcing your code!
I am considering replacing the VGG net with GoogLeNet or ResNet because they have better performance, but the combined net doesn't converge. Have you tried this? What is the performance?

@gy20073
Owner

gy20073 commented May 10, 2017

Have you followed these steps: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312

In particular, see the sample prototxt here: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312/examples/compact_bilinear

Note that the L2 normalization layer and pre-training the last layer are two necessary steps.
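Roughly, the signed sqrt and L2 normalization applied after the CompactBilinear layer do the following to the pooled feature before the classifier. A minimal numpy sketch, not the actual Caffe layers; the 8192-d feature size, the 50x scale, and the eps value are just illustrative assumptions:

import numpy as np

def signed_sqrt(x):
    # element-wise signed square root: sign(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

def l2_normalize(x, eps=1e-10):
    # scale each feature vector to unit L2 norm
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

# pooled compact bilinear features: batch of 2, 8192-d output
pooled = 50.0 * np.random.randn(2, 8192)

normalized = l2_normalize(signed_sqrt(pooled))
print(np.linalg.norm(normalized, axis=1))  # ~[1. 1.]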

@yangjingyi
Author

Yes, I have tried compact bilinear pooling with VGG 16 and it works really well. But when I switch the base model from VGG 16 to GoogLeNet, it cannot converge.
I use inception_5b/output as the feature map fed to the "CompactBilinear" layer. The top shape of inception_5b/output is batch_size * 1024 * 7 * 7. Is that too much for the Tensor Sketch projection to work? I also tried a larger output dimension such as 16384, but it still cannot converge.
I just cannot figure out what the problem is.
Have you ever tried that?
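For reference, my understanding is that the Tensor Sketch projection sketches the 1024-d feature at each of the 7 x 7 locations and then sum-pools over locations, roughly like the numpy sketch below (the output dimension d and the random seeds are just illustrative, and this is not the actual Caffe implementation):

import numpy as np

def count_sketch(x, h, s, d):
    # project a C-dim vector to d dims: y[h[c]] += s[c] * x[c]
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

C, H, W, d = 1024, 7, 7, 8192                 # channels, spatial size, output dim
rng = np.random.RandomState(0)
h1, h2 = rng.randint(d, size=C), rng.randint(d, size=C)   # random hash indices
s1, s2 = rng.choice([-1, 1], C), rng.choice([-1, 1], C)   # random signs

feat = rng.randn(C, H, W)                     # e.g. inception_5b/output for one image
pooled = np.zeros(d)
for i in range(H):
    for j in range(W):
        x = feat[:, i, j]
        # circular convolution of the two count sketches via FFT
        pooled += np.real(np.fft.ifft(np.fft.fft(count_sketch(x, h1, s1, d)) *
                                      np.fft.fft(count_sketch(x, h2, s2, d))))
print(pooled.shape)  # (8192,)

Since the projection is applied per location and then summed, the spatial size mainly affects the scale of the pooled feature rather than its dimensionality.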

@gy20073
Owner

gy20073 commented May 11, 2017 via email

@yangjingyi
Author

OK, I will try some tricks for that. Thank you very much!

@zhujiagang

@yangjingyi Maybe it is because the L2 normalization layer makes the layer output very small, so the gradients computed through it are all compressed. I tried this on UCF101 (101 classes): after removing the L2 normalization layer from BN-Inception, it can reach 70% accuracy by fine-tuning only the last fc layer. I set clip_gradients: 40 in the solver prototxt. I've also noticed that at the start of training the sum of the L2 norms of the gradients becomes much larger (more than 10000) than when L2 normalization is applied (about 0.5), which confirms my guess. Removing L2 normalization can improve convergence speed and accuracy by avoiding vanishing gradients, am I right @gy20073? At least the experiments support it so far. I also used weight_decay: 0.0005.
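To illustrate the compression: the backward pass of y = x / ||x|| divides the incoming gradient by ||x||, so when the pooled bilinear feature has a large norm only a tiny gradient reaches the layers below. A minimal numpy sketch (the 8192 dimension and the 100x feature scale are made-up numbers just for illustration):

import numpy as np

def l2norm_backward(x, grad_y):
    # backward of y = x / ||x||_2:  dL/dx = (grad_y - y * (y . grad_y)) / ||x||
    r = np.linalg.norm(x)
    y = x / r
    return (grad_y - y * np.dot(y, grad_y)) / r

rng = np.random.RandomState(0)
x = 100.0 * rng.randn(8192)   # pooled bilinear features can have a large norm
g = rng.randn(8192)           # gradient arriving from the fc / loss layers

print(np.linalg.norm(x))                      # ~9e3  (the 1/||x|| scaling factor)
print(np.linalg.norm(g))                      # ~9e1
print(np.linalg.norm(l2norm_backward(x, g)))  # ~1e-2, shrunk by roughly ||x||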

@Jiangfeng-Xiong

@zhujiagang I use ResNet on another dataset. After removing L2Normalize it also converges faster! I think you are right.

@zhujiagang

@Jiangfeng-Xiong Though it behaves well without it, there must be some reason to use L2Normalize. If L2Normalize shrinks the gradients, I think we could also remove the signed sqrt layer. I plan to do more experiments.
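For what it's worth, the signed sqrt layer also rescales gradients: its derivative is 1 / (2 * sqrt(|x|)), so it shrinks gradients where activations are large and amplifies them near zero. A tiny numpy sketch (the eps and the sample values are just illustrative):

import numpy as np

def signed_sqrt_backward(x, grad_y, eps=1e-8):
    # y = sign(x) * sqrt(|x|)  =>  dy/dx = 1 / (2 * sqrt(|x|))
    return grad_y / (2.0 * np.sqrt(np.abs(x)) + eps)

x = np.array([0.01, 1.0, 100.0])
g = np.ones(3)
print(signed_sqrt_backward(x, g))  # ~[5.  0.5  0.05]: shrinks gradients where |x| is large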

@billhhh

billhhh commented Oct 9, 2017

Thank you for sharing your code. Would you mind taking a look at why my net cannot converge? The address is here

@gy20073
Owner

gy20073 commented Oct 9, 2017

@billhhh maybe try removing the L2norm layer as @zhujiagang suggested?

@billhhh

billhhh commented Oct 10, 2017

@gy20073 Actually I have tried @zhujiagang's method. I removed L2 and it does not converge either.

@billhhh

billhhh commented Oct 10, 2017

My accuracy runs like this:

I1010 12:05:57.231297 23371 solver.cpp:330] Iteration 23400, Testing net (#0)
I1010 12:06:08.602016 23410 data_layer.cpp:73] Restarting data prefetching from start.
I1010 12:06:23.678480 23371 solver.cpp:397] Test net output #0: loss1/loss1 = 1.62693 (* 0.3 = 0.48808 loss)
I1010 12:06:23.678530 23371 solver.cpp:397] Test net output #1: loss1/top-1 = 0.588281
I1010 12:06:23.678544 23371 solver.cpp:397] Test net output #2: loss1/top-5 = 0.846719
I1010 12:06:23.678558 23371 solver.cpp:397] Test net output #3: loss2/loss1 = 1.38752 (* 0.3 = 0.416255 loss)
I1010 12:06:23.678567 23371 solver.cpp:397] Test net output #4: loss2/top-1 = 0.63875
I1010 12:06:23.678575 23371 solver.cpp:397] Test net output #5: loss2/top-5 = 0.883281
I1010 12:06:23.678586 23371 solver.cpp:397] Test net output #6: loss3/loss3 = 6.33943 (* 1 = 6.33943 loss)
I1010 12:06:23.678596 23371 solver.cpp:397] Test net output #7: loss3/top-1 = 0.048125
I1010 12:06:23.678604 23371 solver.cpp:397] Test net output #8: loss3/top-5 = 0.115938

Can this be called converging, just slowly?
If so, should I train it for longer?

@gy20073
Owner

gy20073 commented Oct 10, 2017

@billhhh I'm not sure whether your specific architecture will converge or not. It seems that you are using multiple groups of losses, which I haven't tried before.

@billhhh

billhhh commented Oct 10, 2017

Thank you again. The original version of GoogLeNet-v1 has 3 losses; I didn't modify it.

@qwenqw

qwenqw commented Jan 5, 2018

@gy20073 @zhujiagang Hello, thanks for sharing. I am now using VGG on another dataset. When I train with L2 normalization, convergence is very slow, and even when I decrease the learning rate the loss barely changes; after 9k iterations it still has not converged. I then removed the L2norm layer as you suggested, but the loss becomes NaN as soon as training starts. How can I solve this? Do you have any suggestions? Thank you very much.

@zimenglan-sysu-512

Hi @Jiangfeng-Xiong,
can you share the performance with the ResNet-50 backbone?
Thanks
