This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

I meet problem during implement light_head_rcnn #45

Open
hewumars opened this issue May 6, 2018 · 14 comments

Comments

@hewumars

hewumars commented May 6, 2018

loss_bbox does not converge, while the other losses (loss_cls, loss_rpn_cls, loss_bbox) do converge. Can I push the code to you for debugging?

@Rizhiy

Rizhiy commented May 7, 2018

Hi, where did you take the PSRoIPool layer from? A lot of PyTorch implementations of that layer are buggy. Also, they are probably implemented for a single image per batch and might not work with multiple images per batch.

@hewumars
Author

hewumars commented May 7, 2018

PSRoI_Align from https://github.com/zengarden/light_head_rcnn
PSRoIPooling from https://github.com/PureDiors/pytorch_RFCN
I set batchsize=1 when training light_head_rcnn. The code seems to be able to work with multiple images per batch, but at least a single image per batch works.

@Rizhiy

Rizhiy commented May 7, 2018

I'm pretty sure PSRoIPooling in that repo is bugged, see: xiong-zhitong/pytorch_RFCN#4.

@hewumars
Author

hewumars commented May 7, 2018

The light head rcnn model also does not converge when using PSRoI_Align from https://github.com/zengarden/light_head_rcnn. I opened a pull request: #48

@hewumars
Author

hewumars commented May 7, 2018

I will carefully check the code

@hewumars
Author

hewumars commented May 8, 2018

@Rizhiy could you share your PSRoIPooling? I compared the code with https://github.com/msracver/Deformable-ConvNets/blob/master/rfcn/operator_cxx/psroi_pooling.cu; the differences are as shown:
[image not preserved]

@Rizhiy

Rizhiy commented May 8, 2018

@hewumars I haven't yet got PSRoIPooling to work in PyTorch either.

@YanShuo1992

@Rizhiy How is the PSROI pooling going? I have seen you in many different repos. I think we are both focusing on light-head rcnn, right? I haven't got PSRoIPooling working in PyTorch either. I think it could be easier to use the code from the official tf implementation.

@Rizhiy

Rizhiy commented Jul 17, 2018

@YanShuo1992 I'm currently using roytseng-tw/Detectron.pytorch. So far I have focused on getting the best mAP, so I haven't put much work into light-head. I will try to let you know if I get something working.

@YanShuo1992

@hewumars @Rizhiy
I checked @hewumars's light head rcnn code and may have found something wrong. PSROIpooling is used after res5 (stage5) of resnet50, right? But the RPN is still after stage4. What do you think?

@Rizhiy

Rizhiy commented Jul 19, 2018

That's not entirely correct. You need to pass the output of res5 through a layer that has k*k*n filters, where k is the pooling size and n is an arbitrary number of output channels (10 in the paper). Then you apply psroipool to that.

I suggest you check https://github.com/msracver/Deformable-ConvNets/blob/f4e163719c8e63cfad7af1caaaab93d373750393/rfcn/symbols/resnet_v1_101_rfcn.py#L785-L798 for reference.
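
A minimal reference sketch of the step described above: the k*k*n-channel map is pooled so that each output bin reads only from its own input channel. It is pure Python for a single RoI, meant only for checking a CUDA kernel against on tiny inputs. The function name `psroi_pool`, the nested-list layout and the channel ordering `(n*k + ph)*k + pw` are assumptions following the R-FCN convention as I understand it; other implementations may order channels differently.

```python
import math


def psroi_pool(features, roi, k, out_dim, spatial_scale=1.0):
    """Average PSRoI pooling for one RoI (reference sketch, not optimized).

    features: nested lists shaped [C][H][W], with C == out_dim * k * k.
    roi: (x1, y1, x2, y2) in image coordinates, assumed non-negative.
    Returns nested lists shaped [out_dim][k][k].
    """
    height, width = len(features[0]), len(features[0][0])
    assert len(features) == out_dim * k * k, "need k*k*n input channels"

    def rnd(v):
        # Round half up; Python's built-in round() rounds half to even.
        return math.floor(v + 0.5)

    x1, y1, x2, y2 = roi
    start_w = rnd(x1) * spatial_scale
    start_h = rnd(y1) * spatial_scale
    end_w = (rnd(x2) + 1) * spatial_scale  # +1: the end pixel is inclusive
    end_h = (rnd(y2) + 1) * spatial_scale
    bin_w = max(end_w - start_w, 0.1) / k  # guard against zero-sized RoIs
    bin_h = max(end_h - start_h, 0.1) / k

    out = [[[0.0] * k for _ in range(k)] for _ in range(out_dim)]
    for n in range(out_dim):
        for ph in range(k):
            for pw in range(k):
                # Bin extent on the feature map, clipped to its borders.
                h0 = min(max(math.floor(ph * bin_h + start_h), 0), height)
                h1 = min(max(math.ceil((ph + 1) * bin_h + start_h), 0), height)
                w0 = min(max(math.floor(pw * bin_w + start_w), 0), width)
                w1 = min(max(math.ceil((pw + 1) * bin_w + start_w), 0), width)
                if h1 <= h0 or w1 <= w0:
                    continue  # empty bin stays 0
                # Position-sensitive part: bin (ph, pw) of output channel n
                # reads only from its own dedicated input channel.
                c = (n * k + ph) * k + pw
                total = sum(features[c][h][w]
                            for h in range(h0, h1) for w in range(w0, w1))
                out[n][ph][pw] = total / ((h1 - h0) * (w1 - w0))
    return out


# Tiny check: channel c is filled with the constant c, so each output bin
# should simply report the index of the channel it read from.
k, n = 2, 2
feats = [[[float(c)] * 4 for _ in range(4)] for c in range(n * k * k)]
pooled = psroi_pool(feats, (0, 0, 3, 3), k, n)
```

Filling each channel with its own index, as in the check at the end, is a cheap way to see whether a kernel picks the wrong channel for a bin, which is one of the ways such layers can go wrong.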

@YanShuo1992

@Rizhiy
I will check the official rfcn to see how the rpn and large conv are organized.
@roytseng-tw
I am trying to implement light-head rcnn based on your code. I tried the code from @hewumars and got:
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generic/THCStorage.cu:58

So I checked the .cu code of psroipooling. I found your commit that does not use rounding in roialign_kernel.cu. Can you tell me the reason for that, or what problem rounding would lead to?
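
The thread does not record an answer to this question. One commonly cited motivation for RoIAlign-style layers is that rounding RoI coordinates onto the feature grid shifts the box by up to half a cell, which is several image pixels at stride 16. Whether that was the reason for the commit mentioned here is an assumption. The sketch below only illustrates the size of that shift; `scale_roi` and the example box are hypothetical.

```python
import math


def scale_roi(roi, spatial_scale, quantize):
    """Map an (x1, y1, x2, y2) box from image to feature-map coordinates."""
    scaled = [c * spatial_scale for c in roi]
    if quantize:
        # RoIPool-style: snap to the nearest integer cell (round half up).
        scaled = [float(math.floor(c + 0.5)) for c in scaled]
    return scaled


roi = (37, 21, 150, 90)   # hypothetical box in image pixels
scale = 1.0 / 16          # stride-16 feature map
exact = scale_roi(roi, scale, quantize=False)
snapped = scale_roi(roi, scale, quantize=True)
# Shift introduced by rounding, expressed back in image pixels.
shift_px = [(e - s) / scale for e, s in zip(exact, snapped)]
```

For this box the rounded coordinates differ from the exact ones by 5 to 6 image pixels on each side.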

@GYxiaOH

GYxiaOH commented Aug 22, 2018

@YanShuo1992
Do you hit out of memory after some iterations? I have the same problem. I compared the psroi code with caffe2 and couldn't find anything wrong, but I have barely done any CUDA coding, so...
Have you solved the problem?

@YanShuo1992

@GYxiaOH
Yes, I hit out of memory when using psroi. I also checked the caffe2 code and the tensorflow code and found nothing. For now, I have given up on psroi and use alignroi.

elnazavr pushed a commit to elnazavr/Detectron.pytorch that referenced this issue Apr 3, 2019
In __init__() save self.num_data for use in __len__()