
VOC2012 training failure, Memory error #49

Open
CassieMai opened this issue Mar 22, 2017 · 12 comments

@CassieMai

Hello, I have another problem when running ./experiments/scripts/mnc_5stage.sh 0 VGG16, and I can't figure out why it happens. Thanks to anyone who looks into this issue!

I0322 10:32:43.313863 29651 net.cpp:270] This network produces output seg_cls_loss_ext
I0322 10:32:43.400629 29651 net.cpp:283] Network initialization done.
I0322 10:32:43.400925 29651 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0322 10:32:43.722609 29651 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0322 10:32:43.722632 29651 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0322 10:32:43.722635 29651 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0322 10:32:43.781419 29651 net.cpp:810] Ignoring source layer drop6
I0322 10:32:43.791087 29651 net.cpp:810] Ignoring source layer drop7
I0322 10:32:43.849704 29651 net.cpp:810] Ignoring source layer drop6_mask
I0322 10:32:43.859282 29651 net.cpp:810] Ignoring source layer drop7_mask
Solving...
/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
/MNC/tools/../lib/pylayer/mask_layer.py:75: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[info[0]][0:info[1], 0:info[2]]
/MNC/tools/../lib/pylayer/stage_bridge_layer.py:224: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    _solver.train_model(args.max_iters)
  File "/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 127, in train_model
    self.solver.step(1)
MemoryError
@xialuxi

xialuxi commented Mar 23, 2017

I have the same problem, because my machine can't keep up with VGG16.

@xialuxi

xialuxi commented Mar 23, 2017

I want to know how to train with a different network.

@CassieMai
Author

@xialuxi Maybe you can try downloading another pretrained model that can also initialize your network, and change the related settings in mnc_5stage.sh, such as the NET variable. But I am not sure whether more settings need to be changed in other files. That is just my guess.

@hgaiser

hgaiser commented Mar 23, 2017

How much memory does your GPU have? If it is less than 8 GB, then it won't work.

@xialuxi

xialuxi commented Mar 23, 2017

@CassieMai I reduced train_batch_size to 16 in "mnc_config.py", and my machine works.
So this "MemoryError" is a Python (host RAM) memory error, not a GPU error.

@hgaiser

hgaiser commented Mar 23, 2017

MNC does not support minibatches > 1, so I'm not sure what you mean.

@xialuxi

xialuxi commented Mar 23, 2017

Sorry, I meant batch_size = 16.

@CassieMai
Author

@hgaiser My GPU memory is more than 8 GB.

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:01:00.0      On |                  N/A |
| 26%   37C    P2    74W / 250W |    427MiB /  6080MiB |      0%      Default |

@xialuxi

xialuxi commented Mar 24, 2017

I want to know what "__C.TRAIN.IMS_PER_BATCH = 1" means.
Is it "Given a roidb, construct a minibatch sampled from it"?
Does "1" represent one complete sample?

@CassieMai
Author

@xialuxi Yes, after changing "train_batch_size = 64" to 16 in "mnc_config.py", the MemoryError is gone. But I don't understand how you knew this MemoryError comes from Python. Thank you!!

@xialuxi

xialuxi commented Mar 25, 2017

@hgaiser Could you tell me how to change the number of classes?

@DirtyHarryLYL

Change the relevant numbers in the train and test prototxts; the names of the classes should also be changed in /lib/datasets/pascal_voc_det.py.
