This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Memory allocation failed #2913

Closed
yechaochen opened this issue Aug 3, 2016 · 4 comments

Comments

@yechaochen

I have prepared the following files:
`find_mxnet.py`, `symbol_mynet.py`, `train_model.py`, `train_mynet.py`
I have converted the images and my multi-label annotations to a rec file, and the mean file was generated automatically at the beginning of training.
Then an error happened:

[19:03:47] /home/deeper/mxnet/dmlc-core/include/dmlc/./logging.h:235: [19:03:47] src/storage/./pooled_storage_manager.h:62: Memory allocation failed.
Traceback (most recent call last):
  File "train_mynet.py", line 90, in <module>
    train_model.fit(args, net, get_iterator)
  File "/home/deeper/77W-project/net/train_model.py", line 100, in fit
    epoch_end_callback = checkpoint)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/model.py", line 789, in fit
    sym_gen=self.sym_gen)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/model.py", line 192, in _train_multi_device
    logger=logger)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 311, in __init__
    self.slices, train_data)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 224, in __init__
    shared_data_arrays=self.shared_data_arrays[i])
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/executor_manager.py", line 145, in _bind_exec
    arg_arr = nd.zeros(arg_shape[i], ctx, dtype=arg_types[i])
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 815, in zeros
    arr = empty(shape, ctx, dtype)
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 551, in empty
    return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 69, in _new_alloc_handle
    ctypes.byref(hdl)))
  File "/home/deeper/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:03:47] src/storage/./pooled_storage_manager.h:62: Memory allocation failed.

Can anyone tell me what I can do to get my training to run smoothly?

@bertjiazheng

I got the same error.

@xiesiyuan

Yes, I got the same error when running the neural art demo. That demo runs fine in CPU mode, and the other demos run fine in both CPU and GPU mode.

After reviewing the source code, I think this error is thrown while executing the function MXExecutorBindEX in c_api.cc.

However, I have not found a solution yet. Can anyone help?

@xiesiyuan

Solved. This error is effectively an OOM (out of memory) failure. I changed the --max-long-edge argument on the python command line so that the source JPG is resized to a smaller image, and the problem went away.
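To illustrate why a smaller long edge helps, here is a rough sketch of how the size of a single float32 image array shrinks after resizing. The helper names `resized_shape` and `image_tensor_bytes` are made up for this illustration, and the rounding rule is an assumption rather than the demo's actual resize code. Real GPU usage is larger, since the network also allocates feature maps and gradients that scale with the input size.

```python
def resized_shape(width, height, max_long_edge):
    """Scale (width, height) so the longer side is at most max_long_edge.

    Hypothetical helper; the demo's real resize logic may round differently.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    # Integer arithmetic with round-half-up to avoid float rounding surprises.
    new_w = (width * max_long_edge + long_edge // 2) // long_edge
    new_h = (height * max_long_edge + long_edge // 2) // long_edge
    return new_w, new_h


def image_tensor_bytes(width, height, channels=3, batch=1, bytes_per_value=4):
    """Bytes needed for one (batch, channels, height, width) float32 array."""
    return batch * channels * height * width * bytes_per_value


if __name__ == "__main__":
    original = (1920, 1080)
    small = resized_shape(original[0], original[1], 512)
    print("original bytes:", image_tensor_bytes(*original))
    print("resized to", small, "bytes:", image_tensor_bytes(*small))
```

Because the byte count grows with width times height, halving the long edge cuts the input array (and, roughly, every feature map derived from it) to about a quarter of its size.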

@RuidongLee

I got the same error when running the Fast R-CNN example. Is this because the GPU does not have enough memory?
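If the failure is indeed an out-of-memory condition, as the earlier comment in this thread concluded, one common workaround besides shrinking the input images is to lower the batch size until allocation succeeds. Below is a minimal sketch of that idea. `largest_fitting_batch_size` and the `try_fit` callback are hypothetical names for this illustration, not part of MXNet. With MXNet, `try_fit` would presumably wrap the bind or fit call, and `error_type` would be `mxnet.base.MXNetError`, the exception shown in the traceback above.

```python
def largest_fitting_batch_size(try_fit, start_batch_size, error_type=Exception):
    """Halve the batch size until try_fit(batch_size) stops raising error_type.

    try_fit is a caller-supplied (hypothetical) function that attempts to
    allocate or train with the given batch size and raises on failure.
    Returns the first batch size that worked, or None if even 1 failed.
    """
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            try_fit(batch_size)
            return batch_size
        except error_type:
            # Allocation failed: retry with half as many samples per batch.
            batch_size //= 2
    return None


if __name__ == "__main__":
    # Stand-in for a real training call: pretend anything above 20 samples
    # per batch exhausts device memory.
    def fake_fit(batch_size):
        if batch_size > 20:
            raise MemoryError("simulated allocation failure")

    print(largest_fitting_batch_size(fake_fit, 128, MemoryError))
```

This only probes for a size that fits; whether training quality is acceptable at the smaller batch size is a separate question.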


4 participants