
ResourceExhaustedError #12

Open
OversightAI opened this issue Mar 2, 2019 · 3 comments

@OversightAI

Is there a workaround for the ResourceExhaustedError?

This is what happens when I run main.py with a custom env:

Traceback (most recent call last):
  File "main.py", line 125, in <module>
    main()
  File "main.py", line 103, in main
    stats = algo.train(env, args, summary_writer)
  File "[...]\Deep-RL-Keras\A2C\a2c.py", line 100, in train
    self.train_models(states, actions, rewards, done)
  File "[...]\Deep-RL-Keras\A2C\a2c.py", line 67, in train_models
    self.c_opt([states, discounted_rewards])
  File "[...]\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "[...]\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "[...]\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "[...]\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[177581,177581] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
         [[{{node sub_17}} = Sub[T=DT_FLOAT, _class=["loc:@gradients_1/sub_17_grad/Reshape_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_2_0_1, dense_6/BiasAdd)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
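
The hint in the error message refers to the TF 1.x RunOptions protobuf. A minimal, self-contained sketch of passing report_tensor_allocations_upon_oom to a raw session.run call (the toy graph below only stands in for the critic update in a2c.py; the real fetches and feeds would come from the project):

    import tensorflow as tf  # TF 1.x API, matching the traceback

    # Ask TensorFlow to print the live tensor allocations if an OOM occurs.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    # Toy graph standing in for the critic update (names are illustrative).
    rewards_ph = tf.placeholder(tf.float32, shape=[None, 1], name="discounted_rewards")
    loss = tf.reduce_mean(tf.square(rewards_ph - rewards_ph * 0.5))

    with tf.Session() as sess:
        sess.run(loss,
                 feed_dict={rewards_ph: [[1.0], [2.0], [3.0]]},
                 options=run_options)  # allocation report is printed on OOM
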
@germain-hug
Owner

Hi,
I am not familiar with this error, but it does seem like you are dealing with very large tensors ([177581, 177581]). Have you tried narrowing down where this tensor comes from? Also, playing with the batch size and input size should help.
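
One way a square [177581, 177581] tensor can appear in a Sub node is implicit broadcasting: if the discounted-reward targets are fed with shape (N,) while the critic output has shape (N, 1), the difference broadcasts to (N, N). A minimal sketch reproducing the shape blow-up (the variable names and the reshape fix are assumptions, not a confirmed diagnosis of a2c.py):

    import numpy as np

    N = 5  # stand-in for the 177581 samples in the traceback

    critic_output = np.zeros((N, 1), dtype=np.float32)     # e.g. a Dense(1) output
    discounted_rewards = np.zeros((N,), dtype=np.float32)  # targets fed as a flat vector

    # (N,) minus (N, 1) broadcasts to an (N, N) matrix -- the OOM signature above.
    print((discounted_rewards - critic_output).shape)                 # (5, 5)

    # Keeping both operands as (N, 1) avoids the blow-up.
    print((discounted_rewards.reshape(-1, 1) - critic_output).shape)  # (5, 1)
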

@OversightAI
Author

Hi,
thanks for the response. What do you mean by input size? I tried using a lower batch size, but the error still occurs. I will have a closer look at the project later on.

@germain-hug
Owner

Hi,
Apologies for the late reply! It seems like the size of your environment / state is very large and causes the network to produce some very large tensors at some point. You could try checking where this large tensor comes from, and optionally changing some network parameters.
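
For a gym-style custom environment, a quick sanity check on the state size the networks have to handle might look like the sketch below (the env id and the commented attribute path are assumptions):

    import gym
    import numpy as np

    env = gym.make("CartPole-v1")  # substitute the custom environment here
    state = env.reset()

    print("observation_space shape:", env.observation_space.shape)
    print("flattened state size:", np.asarray(state).size)

    # If the agent exposes its Keras models, their summaries show the layer
    # output shapes that batches of these states have to fit through, e.g.:
    # algo.critic.model.summary()  # hypothetical attribute path
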
