
VOC2012 training failure, Memory error #49

Open
CassieMai opened this issue Mar 22, 2017 · 12 comments

@CassieMai

Hello, I have another problem when running ./experiments/scripts/mnc_5stage.sh 0 VGG16, and I can't figure out why it happens. Thanks to anyone who looks into this issue!

I0322 10:32:43.313863 29651 net.cpp:270] This network produces output seg_cls_loss_ext
I0322 10:32:43.400629 29651 net.cpp:283] Network initialization done.
I0322 10:32:43.400925 29651 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0322 10:32:43.722609 29651 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0322 10:32:43.722632 29651 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0322 10:32:43.722635 29651 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0322 10:32:43.781419 29651 net.cpp:810] Ignoring source layer drop6
I0322 10:32:43.791087 29651 net.cpp:810] Ignoring source layer drop7
I0322 10:32:43.849704 29651 net.cpp:810] Ignoring source layer drop6_mask
I0322 10:32:43.859282 29651 net.cpp:810] Ignoring source layer drop7_mask
Solving...
/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
/MNC/tools/../lib/pylayer/mask_layer.py:75: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[info[0]][0:info[1], 0:info[2]]
/MNC/tools/../lib/pylayer/stage_bridge_layer.py:224: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    _solver.train_model(args.max_iters)
  File "/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 127, in train_model
    self.solver.step(1)
MemoryError
@xialuxi

xialuxi commented Mar 23, 2017

I have the same problem, because my machine can't keep up with VGG16.

@xialuxi

xialuxi commented Mar 23, 2017

I want to know how to train with a different network.

@CassieMai
Author

@xialuxi Maybe you can try downloading another pretrained model that can also initialize your network, and change the related settings in mnc_5stage.sh, such as the NET variable. But I am not sure whether more settings need to be changed in other files. That is just my guess.

@hgaiser

hgaiser commented Mar 23, 2017

How much memory does your GPU have? If it is less than 8 GB, then it won't work.

@xialuxi

xialuxi commented Mar 23, 2017

@CassieMai I reduced train_batch_size to 16 in "mnc_config.py", and my machine works.
So this "MemoryError" is a Python (host RAM) memory error, not a GPU error.

@hgaiser

hgaiser commented Mar 23, 2017

MNC does not support minibatches > 1, so I'm not sure what you mean.

@xialuxi

xialuxi commented Mar 23, 2017

Sorry, I meant batch_size = 16.

@CassieMai
Author

@hgaiser My GPU memory is more than 8 GB.

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:01:00.0      On |                  N/A |
| 26%   37C    P2    74W / 250W |    427MiB /  6080MiB |      0%      Default |

@xialuxi

xialuxi commented Mar 24, 2017

I want to know what "__C.TRAIN.IMS_PER_BATCH = 1" means.
Is it "Given a roidb, construct a minibatch sampled from it"?
Does "1" represent one complete sample?

@CassieMai
Author

@xialuxi Yes, after changing "train_batch_size = 64" to 16 in "mnc_config.py", the MemoryError is gone. But I don't understand how you knew this MemoryError comes from Python. Thank you!!

@xialuxi

xialuxi commented Mar 25, 2017

@hgaiser Could you tell me how to change the number of classes?

@DirtyHarryLYL

Change the relevant numbers in the train and test prototxts; the names of the classes should also be changed in /lib/datasets/pascal_voc_det.py.
