MXNetError: cudaMalloc failed: out of memory #257
Comments
Perhaps you can decrease the batch size to 64; that may fix the problem.
@jackytu256 Thank you, I have tried that, but it does not help.
May I know how much GPU memory you have, and which of the algorithms you are trying to train?
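For anyone answering the question above, here is a minimal sketch of one way to report per-GPU memory. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example and are not part of MXNet or this repository.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, with values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn the CSV rows printed by the query above into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(total) - int(used),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and parse its output."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_gpu_memory(output)
```

Calling `query_gpu_memory()` on the training machine and pasting the result into the thread gives the total and free memory per device, which makes it easier to judge whether a given batch size has any chance of fitting.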
Decrease your batch size until it runs successfully.
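The "decrease until it runs" advice can be automated roughly like this. It is only a sketch: `try_step` is a hypothetical callable that you would write yourself to run one training step at the given batch size, and the check on the error message assumes the failure text contains "out of memory", as it does in the error reported in this issue.

```python
def find_fitting_batch_size(try_step, start, minimum=1):
    """Halve the batch size until try_step(batch_size) no longer runs out of memory.

    Returns the first batch size that succeeds, or None if even `minimum` fails.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except Exception as err:
            # Only shrink on out-of-memory failures; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    return None


def fake_step(batch_size):
    """Stand-in for a real training step: pretends anything above 16 does not fit."""
    if batch_size > 16:
        raise RuntimeError("cudaMalloc failed: out of memory")


if __name__ == "__main__":
    print(find_fitting_batch_size(fake_step, start=128))
```

If the function returns None, the batch size is probably not what is exhausting the memory, and something else (model size, number of classes, other processes on the GPU) is worth checking.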
@nttstar
I saw elsewhere that someone tried to use memonger to solve the memory issue. That might be an option, but I haven't tried it. Just FYI.
I had the same issue. Decreasing the batch size fixed the problem.
I still get the same error after decreasing the batch size from 32 to 2. I don't think this solves the problem.
I followed your steps, but I ran into this problem. Can anybody suggest a solution?
Traceback (most recent call last):
  File "train_softmax.py", line 485, in <module>
    main()
  File "train_softmax.py", line 482, in main
    train_net(args)
  File "train_softmax.py", line 476, in train_net
    epoch_end_callback = epoch_cb )
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/base_module.py", line 512, in fit
    self.update()
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 651, in update
    self._kvstore, self._exec_group.param_names)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/model.py", line 134, in _update_params_on_kvstore
    kvstore.push(name, grad_list, priority=-index)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/kvstore.py", line 232, in push
    self.handle, mx_uint(len(ckeys)), ckeys, cvals, ctypes.c_int(priority)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:41:11] src/storage/./pooled_storage_manager.h:108: cudaMalloc failed: out of memory
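One possible reading of the traceback above: the allocation fails inside `kvstore.push` during the parameter update, so the memory in question may be tied to parameter and gradient buffers, whose size does not depend on the batch size. The sketch below is only back-of-the-envelope arithmetic for how large a float32 classification layer can get. The class count, embedding size, and the number of copies (weight, gradient, optimizer state) are illustrative assumptions, not values taken from this issue.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def classifier_footprint_mib(num_classes, embedding_size, copies=3):
    """Rough MiB used by a (num_classes x embedding_size) weight matrix.

    `copies` is an assumption: weight + gradient + one optimizer state buffer.
    """
    total = copies * tensor_bytes((num_classes, embedding_size))
    return total / float(1024 ** 2)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the scale.
    print(classifier_footprint_mib(num_classes=85000, embedding_size=512))
```

If an estimate like this is already a large fraction of the GPU's free memory, lowering the batch size alone is unlikely to help, which would match the report that a batch size of 2 still fails.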