
rcnn example issue collection #4713

Closed
ijkguo opened this issue Jan 18, 2017 · 8 comments

Comments

@ijkguo
Contributor

ijkguo commented Jan 18, 2017

The rcnn example has been adapted to be compatible with the nnvm branch in https://github.com/precedenceguo/mx-rcnn. I am waiting for results to confirm that everything works. I invite you to try this new version right now.

I have collected some issues with the rcnn example.

The following will be fixed soon (they are quoted in the replies below):
CustomOp CPU training will freeze or block training.
4G memory is not enough for VGG e2e training.
cudnn_auto_tune problem (#4656): set env MXNET_CUDNN_AUTOTUNE_DEFAULT=0; will be fixed as default.

There are some interesting things:
#3704: is the group symbol behavior changed with nnvm? (see the sketch after this list)
#3542: I don't see anything wrong with the layout.
#2214: the converter issue is a good reference.
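
For context on #3704, "group symbol" here means mx.sym.Group, which the rcnn example uses to bundle its output heads into a single symbol. Below is a minimal sketch with toy layers whose names are only illustrative, not the example's actual symbols:

```python
import mxnet as mx

# Toy stand-ins for the rcnn output heads; names are illustrative only.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=84, name='fc')
cls_prob = mx.sym.SoftmaxOutput(data=fc, name='cls_prob')
bbox_pred = mx.sym.LinearRegressionOutput(data=fc, name='bbox_pred')

# Bundle both heads into one symbol; #3704 asks whether the grouped
# outputs behave the same way under the nnvm branch.
group = mx.sym.Group([cls_prob, bbox_pred])
print(group.list_outputs())  # ['cls_prob_output', 'bbox_pred_output']
```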

@piiswrong
Contributor

CustomOp CPU training will freeze or block training
This should have been solved. Please pull.

4G memory is not enough for VGG e2e training.
We seem to have a lot of memory regressions recently. @precedenceguo Was this working on 0.8? @tqchen Could you take a look?

@ijkguo
Contributor Author

ijkguo commented Jan 19, 2017

CustomOp CPU training will freeze or block training
Confirmed solved.

4G memory is not enough for VGG e2e training.
Behavior is the same as in v0.8. I was wondering why Caffe needs even less memory. Is it related to cuDNN v3?
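
For anyone who wants to double-check the CustomOp fix locally, a stripped-down identity CustomOp like the sketch below (not the example's actual proposal layer) is enough to exercise the CPU CustomOp path; the op and symbol names here are made up for illustration:

```python
import mxnet as mx

class Identity(mx.operator.CustomOp):
    """Pass-through op, used only to exercise the CustomOp machinery."""
    def forward(self, is_train, req, in_data, out_data, aux):
        self.assign(out_data[0], req[0], in_data[0])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        self.assign(in_grad[0], req[0], out_grad[0])

@mx.operator.register('identity')
class IdentityProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(IdentityProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]], []

    def create_operator(self, ctx, shapes, dtypes):
        return Identity()

# Plug the custom op into a symbol; if CPU training of a network that
# contains it no longer hangs, the freeze reported above is fixed.
data = mx.sym.Variable('data')
net = mx.sym.Custom(data=data, op_type='identity', name='identity')
```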

@morusu

morusu commented Feb 24, 2017

cudnn_auto_tune problem (#4656): set env MXNET_CUDNN_AUTOTUNE_DEFAULT=0; will be fixed as default.

Are there any other ways? I find MXNET_CUDNN_AUTOTUNE_DEFAULT very useful.

@ijkguo
Contributor Author

ijkguo commented Feb 24, 2017

You can keep it if you like.
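
For what it's worth, here is a minimal sketch of turning the autotuner off for a single run while keeping it enabled everywhere else; it assumes the environment variable is set before mxnet is imported:

```python
import os

# Disable the cuDNN convolution auto-tuner for this process only;
# other runs keep the (useful) autotuning behaviour.
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

import mxnet as mx  # import after setting the variable so it takes effect
```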

@realwill

@precedenceguo It is not only CustomOp CPU training that freezes, but also GPU training: when I train FCIS and a deformable convolution network on GPU, asnumpy() hangs after a random number of iterations.

@hzh8311

hzh8311 commented Jun 8, 2017

I tried to train Faster R-CNN with ResNet-101 as the backbone network and OHEM (online hard example mining) on 4 GPUs, and ran into #4224 even though I have 12GB K80 GPUs. I wonder whether this is just a memory shortage or some other issue I do not know about?

@Jerryzcn
Contributor

Jerryzcn commented Jul 12, 2017

It seems I cannot train with a batch size of more than 1 on each GPU. Will there be a fix for this? (I guess this would be a feature request.)

@szha
Member

szha commented Oct 29, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, please check out our forum (and its Chinese version) for general "how-to" questions.

@szha szha closed this as completed Oct 29, 2017