
compact bilinear with googlenet and resnet #9

Open
yangjingyi opened this issue May 10, 2017 · 15 comments

@yangjingyi

Hello, thanks for open-sourcing your code!
I am considering replacing the VGG net with GoogLeNet or ResNet because they have better performance, but the combined net doesn't converge. Have you tried this? What is the performance?

@gy20073
Owner

gy20073 commented May 10, 2017

Have you followed these steps: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312

In particular, see the sample prototxt here: https://github.com/gy20073/compact_bilinear_pooling/tree/master/caffe-20160312/examples/compact_bilinear

Note that the L2 normalization layer and pre-training the last layer are two necessary steps.
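Roughly, the signed sqrt and L2 normalization applied after the CompactBilinear layer do the following to the pooled feature before the classifier. A minimal numpy sketch, not the actual Caffe layers; the 8192-d feature size, the 50x scale, and the eps value are just illustrative assumptions:

import numpy as np

def signed_sqrt(x):
    # element-wise signed square root: sign(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

def l2_normalize(x, eps=1e-10):
    # scale each feature vector to unit L2 norm
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

# pooled compact bilinear features: batch of 2, 8192-d output
pooled = 50.0 * np.random.randn(2, 8192)

normalized = l2_normalize(signed_sqrt(pooled))
print(np.linalg.norm(normalized, axis=1))  # ~[1. 1.]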

@yangjingyi
Author

Yes, I have tried compact bilinear pooling with VGG 16 and it works really well. But when I switch the base model from VGG 16 to GoogLeNet, it cannot converge.
I use inception_5b/output as the feature map fed to the "CompactBilinear" layer. The top shape of inception_5b/output is batch_size * 1024 * 7 * 7. Is that too much for the Tensor Sketch projection to work? I also tried a larger output dimension such as 16384, but it still cannot converge.
I just cannot figure out what the problem is.
Have you ever tried that?
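For reference, my understanding is that the Tensor Sketch projection sketches the 1024-d feature at each of the 7 x 7 locations and then sum-pools over locations, roughly like the numpy sketch below (the output dimension d and the random seeds are just illustrative, and this is not the actual Caffe implementation):

import numpy as np

def count_sketch(x, h, s, d):
    # project a C-dim vector to d dims: y[h[c]] += s[c] * x[c]
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

C, H, W, d = 1024, 7, 7, 8192                 # channels, spatial size, output dim
rng = np.random.RandomState(0)
h1, h2 = rng.randint(d, size=C), rng.randint(d, size=C)   # random hash indices
s1, s2 = rng.choice([-1, 1], C), rng.choice([-1, 1], C)   # random signs

feat = rng.randn(C, H, W)                     # e.g. inception_5b/output for one image
pooled = np.zeros(d)
for i in range(H):
    for j in range(W):
        x = feat[:, i, j]
        # circular convolution of the two count sketches via FFT
        pooled += np.real(np.fft.ifft(np.fft.fft(count_sketch(x, h1, s1, d)) *
                                      np.fft.fft(count_sketch(x, h2, s2, d))))
print(pooled.shape)  # (8192,)

Since the projection is applied per location and then summed, the spatial size mainly affects the scale of the pooled feature rather than its dimensionality.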

@gy20073
Owner

gy20073 commented May 11, 2017 via email

@yangjingyi
Author

OK, I will try some tricks for that. Thank you very much!

@zhujiagang

@yangjingyi Maybe it is because the L2 normalization layer makes the layer output very small, so the gradients computed through it are all compressed. I tried this on UCF101 (101 classes): after removing the L2 normalization layer from BN-Inception, it can reach 70% accuracy by fine-tuning only the last fc layer. I set clip_gradients: 40 in the solver prototxt. I've also noticed that at the start of training the sum of the L2 norms of the gradients becomes much larger (more than 10000) than when L2 normalization is applied (about 0.5), which confirms my guess. Removing L2 normalization can improve convergence speed and accuracy by avoiding vanishing gradients, am I right @gy20073? At least the experiments support it so far. I also used weight_decay: 0.0005.
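To illustrate the compression: the backward pass of y = x / ||x|| divides the incoming gradient by ||x||, so when the pooled bilinear feature has a large norm only a tiny gradient reaches the layers below. A minimal numpy sketch (the 8192 dimension and the 100x feature scale are made-up numbers just for illustration):

import numpy as np

def l2norm_backward(x, grad_y):
    # backward of y = x / ||x||_2:  dL/dx = (grad_y - y * (y . grad_y)) / ||x||
    r = np.linalg.norm(x)
    y = x / r
    return (grad_y - y * np.dot(y, grad_y)) / r

rng = np.random.RandomState(0)
x = 100.0 * rng.randn(8192)   # pooled bilinear features can have a large norm
g = rng.randn(8192)           # gradient arriving from the fc / loss layers

print(np.linalg.norm(x))                      # ~9e3  (the 1/||x|| scaling factor)
print(np.linalg.norm(g))                      # ~9e1
print(np.linalg.norm(l2norm_backward(x, g)))  # ~1e-2, shrunk by roughly ||x||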

@Jiangfeng-Xiong

@zhujiagang I use ResNet on another dataset. After removing L2Normalize it also converges faster! I think you are right.

@zhujiagang

@Jiangfeng-Xiong Though it behaves well without it, there must be some reason to use L2Normalize. If L2Normalize shrinks the gradients, I think we could also remove the signed sqrt layer. I plan to do more experiments.
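For what it's worth, the signed sqrt layer also rescales gradients: its derivative is 1 / (2 * sqrt(|x|)), so it shrinks gradients where activations are large and amplifies them near zero. A tiny numpy sketch (the eps and the sample values are just illustrative):

import numpy as np

def signed_sqrt_backward(x, grad_y, eps=1e-8):
    # y = sign(x) * sqrt(|x|)  =>  dy/dx = 1 / (2 * sqrt(|x|))
    return grad_y / (2.0 * np.sqrt(np.abs(x)) + eps)

x = np.array([0.01, 1.0, 100.0])
g = np.ones(3)
print(signed_sqrt_backward(x, g))  # ~[5.  0.5  0.05]: shrinks gradients where |x| is large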

@billhhh

billhhh commented Oct 9, 2017

Thank you for sharing your code. Would you mind taking a look at why my net cannot converge? The address is here

@gy20073
Owner

gy20073 commented Oct 9, 2017

@billhhh maybe try removing the L2norm layer as @zhujiagang suggested?

@billhhh

billhhh commented Oct 10, 2017

@gy20073 Actually I have tried @zhujiagang's method. I removed L2 and it does not converge either.

@billhhh

billhhh commented Oct 10, 2017

My accuracy runs like this:

I1010 12:05:57.231297 23371 solver.cpp:330] Iteration 23400, Testing net (#0)
I1010 12:06:08.602016 23410 data_layer.cpp:73] Restarting data prefetching from start.
I1010 12:06:23.678480 23371 solver.cpp:397] Test net output #0: loss1/loss1 = 1.62693 (* 0.3 = 0.48808 loss)
I1010 12:06:23.678530 23371 solver.cpp:397] Test net output #1: loss1/top-1 = 0.588281
I1010 12:06:23.678544 23371 solver.cpp:397] Test net output #2: loss1/top-5 = 0.846719
I1010 12:06:23.678558 23371 solver.cpp:397] Test net output #3: loss2/loss1 = 1.38752 (* 0.3 = 0.416255 loss)
I1010 12:06:23.678567 23371 solver.cpp:397] Test net output #4: loss2/top-1 = 0.63875
I1010 12:06:23.678575 23371 solver.cpp:397] Test net output #5: loss2/top-5 = 0.883281
I1010 12:06:23.678586 23371 solver.cpp:397] Test net output #6: loss3/loss3 = 6.33943 (* 1 = 6.33943 loss)
I1010 12:06:23.678596 23371 solver.cpp:397] Test net output #7: loss3/top-1 = 0.048125
I1010 12:06:23.678604 23371 solver.cpp:397] Test net output #8: loss3/top-5 = 0.115938

Can this be called converging, just slowly?
If so, should I train it for longer?

@gy20073
Owner

gy20073 commented Oct 10, 2017

@billhhh I'm not sure whether your specific architecture will converge or not. It seems that you are using multiple groups of losses, which I haven't tried before.

@billhhh

billhhh commented Oct 10, 2017

Thank you again. The original version of GoogLeNet-v1 has 3 losses; I didn't modify it.

@qwenqw

qwenqw commented Jan 5, 2018

@gy20073 @zhujiagang Hello, thanks for sharing. I am now using VGG on another dataset. When I train with L2 normalization, convergence is very slow, and even when I decrease the learning rate the loss barely changes; after 9k iterations it still has not converged. I then removed the L2norm layer as you suggested, but the loss becomes NaN as soon as training starts. How can I solve this? Do you have any suggestions? Thank you very much.

@zimenglan-sysu-512

Hi @Jiangfeng-Xiong,
can you share the performance with the ResNet-50 backbone?
Thanks
