compact bilinear with googlenet and resnet #9
Have you followed these steps: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312 In particular, see the sample prototxt here: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312/examples/compact_bilinear Note that the L2 normalization layer and pre-training the last layer are two necessary steps. |
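For reference, the linked example stacks the compact bilinear layers roughly as in the sketch below. This is only a sketch: the bottom blob name "conv5_3", the 8192-dimensional output, and the comments are assumptions to be checked against the example prototxt in the repo.

```
# Sketch of the compact bilinear head from the example (values are assumptions).
layer {
  name: "bilinear_layer"
  type: "CompactBilinear"
  bottom: "conv5_3"          # last conv feature map of VGG-16 (assumed)
  bottom: "conv5_3"          # same blob twice for a self-bilinear feature
  top: "bilinear"
  compact_bilinear_param {
    num_output: 8192         # projection dimension (assumed; see the example)
  }
}
layer {
  name: "signed_sqrt_layer"
  type: "SignedSqrt"
  bottom: "bilinear"
  top: "bilinear_sqrt"
}
layer {
  name: "l2_normalization_layer"
  type: "L2Normalize"
  bottom: "bilinear_sqrt"
  top: "bilinear_l2"
}
# The last fc layer on top of "bilinear_l2" is the one that should be
# pre-trained (with the rest of the net frozen) before full fine-tuning.
```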
Yes, I have tried compact bilinear pooling with VGG-16, and it works really well. But when I replace the base model from VGG-16 with GoogleNet, it cannot converge. I use inception_5b/output as the feature map given to the "CompactBilinear" layer. The top shape of inception_5b/output is batch_size * 1024 * 7 * 7. Is it too much for the Tensor Sketch projection to work? I also tried a larger output such as 16384, but it still cannot converge. I just cannot figure out what the problem is. Have you ever tried that? |
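For the GoogleNet setup described above, wiring inception_5b/output into the pooling layer would look roughly like this. A sketch only; 16384 is simply the larger projection dimension the comment mentions trying, not a recommended value.

```
layer {
  name: "bilinear_layer"
  type: "CompactBilinear"
  bottom: "inception_5b/output"   # batch_size x 1024 x 7 x 7
  bottom: "inception_5b/output"
  top: "bilinear"
  compact_bilinear_param {
    num_output: 16384             # larger projection dimension tried above
  }
}
```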
I haven't tried GoogleNet or ResNet yet.
But for ResNet, I have heard that fine-tuning it requires some tricks. For example, adding another one or two randomly initialized res-blocks after the last res-block might help (a rough sketch follows below). The intuition is that those models have fewer parameters and are thus harder to fine-tune. For GoogleNet I'm not aware of anyone using such tricks, but it is harder to tune than VGG; for example, the FCN paper reports much lower numbers on the semantic segmentation task with GoogleNet than with VGG.
|
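Regarding the suggestion above of appending one or two randomly initialized res-blocks after the last res-block, a rough prototxt sketch might look like the following. All blob names ("res5c", "res6a") and hyper-parameters are illustrative assumptions, not settings from this thread; batch norm and scale layers are omitted for brevity.

```
# Hypothetical extra res-block appended after the last block of ResNet.
layer {
  name: "res6a_branch2a"
  type: "Convolution"
  bottom: "res5c"                 # output of the last original res-block (assumed name)
  top: "res6a_branch2a"
  convolution_param {
    num_output: 2048
    kernel_size: 3
    pad: 1
    weight_filler { type: "msra" }   # random initialization
  }
}
layer { name: "res6a_branch2a_relu" type: "ReLU" bottom: "res6a_branch2a" top: "res6a_branch2a" }
layer {
  name: "res6a_branch2b"
  type: "Convolution"
  bottom: "res6a_branch2a"
  top: "res6a_branch2b"
  convolution_param {
    num_output: 2048
    kernel_size: 3
    pad: 1
    weight_filler { type: "msra" }
  }
}
layer {
  name: "res6a"
  type: "Eltwise"                  # identity shortcut: res5c + res6a_branch2b
  bottom: "res5c"
  bottom: "res6a_branch2b"
  top: "res6a"
}
layer { name: "res6a_relu" type: "ReLU" bottom: "res6a" top: "res6a" }
```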
OK, I will try some of those tricks. Thank you very much! |
@yangjingyi Maybe it is because the L2 normalization layer makes the layer output very small, so when the gradients are computed they are all compressed. I tried this on UCF101 (101 classes). After removing the L2 normalization layer in BN-Inception, it can achieve 70% accuracy by fine-tuning only the last fc layer. I set clip_gradients: 40 in the solver prototxt. I've also noticed that at the start of training the sum of the L2 norm of the gradients becomes much larger (more than 10000) than when L2 normalization is applied (about 0.5), which confirmed my guess. Removing L2 normalization can improve convergence speed and accuracy by avoiding vanishing gradients, am I right @gy20073? At least my experiments support it at this point. I also used weight_decay: 0.0005. |
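A minimal solver sketch reflecting the two settings mentioned in this comment, clip_gradients: 40 and weight_decay: 0.0005; every other value here is a placeholder, not something reported in the thread.

```
net: "train_val.prototxt"      # placeholder path
base_lr: 0.001                 # placeholder learning rate
lr_policy: "step"
stepsize: 10000
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005           # as used in the comment above
clip_gradients: 40             # as used in the comment above
max_iter: 50000
snapshot: 10000
snapshot_prefix: "snapshots/compact_bilinear"
solver_mode: GPU
```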
@zhujiagang I use ResNet on another dataset. After removing L2Normalize, it also converges faster! I think you are right. |
@Jiangfeng-Xiong Though it behaves well without it, there must be some reason for using L2Normalize. If L2Normalize shrinks the gradients, I think we could also remove the signed sqrt layer. I plan to do more experiments. |
Thank you for sharing your code. Would you mind taking a look at why my net cannot converge? The address is here. |
@billhhh maybe try removing the L2norm layer as @zhujiagang suggested? |
@gy20073 Actually I have tried @zhujiagang's method; I removed the L2 normalization, but it still does not converge. |
My accuracy runs like this. Can it be called converging, just slow? |
@billhhh I'm not sure whether your specific architecture will converge or not. It seems that you are using multiple groups of losses, which I haven't tried before. |
Thank you again. The original version of GoogleNet v1 has 3 losses; I didn't modify it. |
@gy20073 @zhujiagang Hello, thanks for sharing. Now I use VGG on another dataset. When I train with L2 normalization, convergence is very slow, and even when I decrease the learning rate the loss does not change much; after 9k iterations it still has not converged. I then removed the L2norm layer following your method, but the loss becomes NaN as soon as I begin training. How can I solve this? Do you have any suggestions? Thank you very much. |
hi @Jiangfeng-Xiong |
Hello, thanks for open-sourcing your code!
I am considering replacing the VGG net with GoogleNet or ResNet because they have better performance, but the combined net doesn't converge. Have you tried this? What is the performance?