Memory allocation failed #2913

yechaochen · 2016-08-03T11:11:51Z

I have prepared the following files:
find_mxnet.py, symbol_mynet.py, train_model.py, train_mynet.py
I have also converted the images and my multi-label annotations to a rec file. The mean file was generated automatically at the beginning of training.
Then an error occurred:

[19:03:47] /home/deeper/mxnet/dmlc-core/include/dmlc/./logging.h:235: [19:03:47] src/storage/./pooled_storage_manager.h:62: Memory allocation failed.
Traceback (most recent call last):
  File "train_mynet.py", line 90, in <module>
    train_model.fit(args, net, get_iterator)
  File "/home/deeper/77W-project/net/train_model.py", line 100, in fit
    epoch_end_callback = checkpoint)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/model.py", line 789, in fit
    sym_gen=self.sym_gen)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/model.py", line 192, in _train_multi_device
    logger=logger)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 311, in __init__
    self.slices, train_data)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 224, in __init__
    shared_data_arrays=self.shared_data_arrays[i])
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 145, in _bind_exec
    arg_arr = nd.zeros(arg_shape[i], ctx, dtype=arg_types[i])
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 815, in zeros
    arr = empty(shape, ctx, dtype)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 551, in empty
    return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 69, in _new_alloc_handle
    ctypes.byref(hdl)))
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:03:47] src/storage/./pooled_storage_manager.h:62: Memory allocation failed.

Can anyone tell me what I can do to get my training to run smoothly?


bertjiazheng · 2016-08-14T13:57:44Z

I got the same error.

xiesiyuan · 2016-09-05T16:58:02Z

Yes, I got the same error when running the neural art demo. That demo runs fine in CPU mode, and the other demos run fine in both CPU mode and GPU mode.

After reviewing the source code, I think this error is thrown while processing the function MXExecutorBindEX in C_api.cc.

However, I have not found any solution to fix it. Can anyone help?

xiesiyuan · 2016-09-06T04:12:37Z

Solved. This error is actually an OOM (out of memory) failure. I changed the --max-long-edge argument on the python command line so that the source JPG is resized to a smaller image, and that fixed the problem.
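
For anyone wondering why a smaller long edge helps, here is a rough sketch of the arithmetic. It is not the demo's actual resizing code: the function name `fit_long_edge` and the example sizes are made up for illustration, and it only assumes that the image is scaled down proportionally until its longer side fits the limit.

```python
def fit_long_edge(width, height, max_long_edge):
    """Scale (width, height) proportionally so the longer side is at most max_long_edge.

    Hypothetical helper for illustration; not taken from the MXNet demo.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        # Already small enough, leave the size unchanged.
        return width, height
    scale = float(max_long_edge) / long_edge
    new_width = max(1, int(round(width * scale)))
    new_height = max(1, int(round(height * scale)))
    return new_width, new_height


def pixel_ratio(old_size, new_size):
    """Return how many times fewer pixels the resized image has."""
    old_pixels = old_size[0] * old_size[1]
    new_pixels = new_size[0] * new_size[1]
    return float(old_pixels) / new_pixels


if __name__ == "__main__":
    original = (1200, 800)  # example size, chosen arbitrarily
    resized = fit_long_edge(original[0], original[1], 600)
    print(resized, pixel_ratio(original, resized))
```

Halving the long edge cuts the pixel count to a quarter, and the arrays whose size follows the input image shrink roughly in proportion, which is presumably why the allocation then succeeds.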

RuidongLee · 2016-11-27T02:54:05Z

I got the same error when running the fast rcnn example. Is this because the GPU does not have enough memory?
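
One way to sanity-check this is to estimate how much memory the arrays need from their shapes. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: the helper names and the shapes are hypothetical, every element is assumed to be a 4-byte float32, and it ignores gradients, intermediate outputs, and any workspace the library allocates, so the real footprint will be larger.

```python
from functools import reduce
import operator


def array_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense array of the given shape (float32 by default)."""
    element_count = reduce(operator.mul, shape, 1)
    return element_count * bytes_per_element


def total_megabytes(shapes, bytes_per_element=4):
    """Sum the footprint of several arrays and convert to MiB."""
    total = sum(array_bytes(shape, bytes_per_element) for shape in shapes)
    return total / (1024.0 * 1024.0)


if __name__ == "__main__":
    # Hypothetical shapes: one input batch and one fully connected weight matrix.
    shapes = [(32, 3, 224, 224), (4096, 25088)]
    print("%.1f MiB" % total_megabytes(shapes))
```

If the estimate is already close to the free memory reported by the GPU, reducing the batch size or the input image size is the usual first thing to try.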

yechaochen closed this as completed Aug 3, 2016

ijkguo mentioned this issue Jan 18, 2017

rcnn example issue collection #4713

Closed
