[Optimization] Replay Buffers shouldn't use copy when using np.array #112

Closed
m-rph opened this issue Jul 17, 2020 · 10 comments · Fixed by #1700
Labels: enhancement (New feature or request)

m-rph (Contributor) commented Jul 17, 2020

Related to #49

In the replay buffer's add method:

    def add(self, obs: np.ndarray, next_obs: np.ndarray, action: np.ndarray, reward: np.ndarray, done: np.ndarray) -> None:
        # Copy to avoid modification by reference
        self.observations[self.pos] = np.array(obs).copy()
        if self.optimize_memory_usage:
            self.observations[(self.pos + 1) % self.buffer_size] = np.array(next_obs).copy()
        else:
            self.next_observations[self.pos] = np.array(next_obs).copy()
        self.actions[self.pos] = np.array(action).copy()
        self.rewards[self.pos] = np.array(reward).copy()
        self.dones[self.pos] = np.array(done).copy()

We call np.array(x).copy(). The extra .copy() is unnecessary because np.array has a copy argument that is True by default, so np.array(x) already returns a copy.

https://numpy.org/doc/stable/reference/generated/numpy.array.html

import numpy as np

x = [1, 2, 3, 4, 5]
x1 = np.array(x)
x is x1
# False: np.array copied the list into a new array

x2 = np.array(x1)
x2 is x1
# False: even for an ndarray input, np.array returns a copy by default
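
For illustration, here is a minimal sketch of add with the redundant trailing .copy() calls dropped; np.array still copies by default, so the no-aliasing guarantee is preserved. This is a sketch, not necessarily the change that was eventually merged:

    def add(self, obs: np.ndarray, next_obs: np.ndarray, action: np.ndarray, reward: np.ndarray, done: np.ndarray) -> None:
        # np.array copies its input by default, so no explicit .copy() is needed
        self.observations[self.pos] = np.array(obs)
        if self.optimize_memory_usage:
            self.observations[(self.pos + 1) % self.buffer_size] = np.array(next_obs)
        else:
            self.next_observations[self.pos] = np.array(next_obs)
        self.actions[self.pos] = np.array(action)
        self.rewards[self.pos] = np.array(reward)
        self.dones[self.pos] = np.array(done)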
m-rph changed the title from "Replay Buffers shouldn't use copy when using np.array" to "[Optimization] Replay Buffers shouldn't use copy when using np.array" on Jul 17, 2020
Miffyli added the enhancement (New feature or request) label on Jul 17, 2020
Miffyli (Collaborator) commented Jul 17, 2020

Good catch. Mind if I include this in a PR with credits to you (#110)? If you find a bunch more such quirks and optimizations, feel free to bundle them up into a PR, though :)

m-rph (Contributor, Author) commented Jul 17, 2020

I can make a large one that deals with code optimisations/quirks and leave #110 for algorithm performance; it's up to you :)

Miffyli (Collaborator) commented Jul 17, 2020

Ok, let's make another PR for optimizations like this 👍. Let's wait until agent performance has been matched so things do not crisscross too badly.

m-rph (Contributor, Author) commented Jul 17, 2020

After some more investigation, I believe that we don't need to copy at all, as the assignment writes to the original data, i.e. where the buffer is located. So

    x[indices] = np.array(data)

copies the data twice: once from data into the temporary array, and once more from the temporary into x.

From the NumPy docs: "Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!)."

https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays
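
A quick check, with made-up names, confirming that assignment through an index writes into the array's own storage, so later mutation of the source cannot leak into the buffer:

    import numpy as np

    # Toy stand-in for the replay buffer storage (names are hypothetical)
    buffer = np.zeros((4, 3), dtype=np.float32)

    obs = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    buffer[0] = obs    # the assignment copies obs into buffer's own memory

    obs[0] = 99.0      # mutating the source afterwards...
    print(buffer[0])   # ...does not affect the stored row: [1. 2. 3.]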

araffin (Member) commented Jul 17, 2020

> I believe that we don't need to copy as the assignment is done on the original data, i.e. where the buffer is located

We need to be extra careful with that one; I remember having weird issues that apparently came from changes by reference (hence the extra-conservative copy that is currently in the code).

But if everything is copied, then perfect.

m-rph (Contributor, Author) commented Jul 17, 2020 via email

jarlva commented Aug 2, 2020

Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.

araffin (Member) commented Aug 3, 2020

> Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.

Why would pytorch tensors be faster than numpy?
The replay buffer originally used pytorch tensors for storage, but that required several numpy <-> pytorch conversions (e.g. for normalization), whereas the current implementation only does one conversion.
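
For context, a rough sketch of that single-conversion pattern; the names (observations, batch_inds) are illustrative, not SB3's actual API:

    import numpy as np
    import torch

    observations = np.zeros((1000, 4), dtype=np.float32)  # NumPy storage
    batch_inds = np.random.randint(0, 1000, size=32)

    # The only numpy -> pytorch conversion happens at sampling time
    batch = torch.as_tensor(observations[batch_inds])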

m-rph (Contributor, Author) commented Aug 3, 2020

Using pytorch tensors doesn't necessarily improve performance, mainly due to the conversions @araffin mentioned; moreover, going cpu -> gpu -> cpu will probably degrade performance due to transfer latency. The data is usually rather small and fits in the cache, so the transfers will probably dominate the runtime.
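
A rough micro-benchmark sketch of that intuition; the sizes are made up and the comparison is only meaningful on a CUDA machine:

    import time
    import numpy as np
    import torch

    x = np.random.randn(64, 4).astype(np.float32)  # a small, cache-friendly batch

    start = time.perf_counter()
    for _ in range(1000):
        y = x * 2.0  # the arithmetic itself on data this small is trivial
    print(f"numpy op:       {time.perf_counter() - start:.4f}s")

    if torch.cuda.is_available():
        start = time.perf_counter()
        for _ in range(1000):
            y = torch.as_tensor(x).cuda().cpu().numpy()  # cpu -> gpu -> cpu round trip
        print(f"gpu round trip: {time.perf_counter() - start:.4f}s")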

jarlva commented Aug 4, 2020

Pardon, it was a general suggestion, not related specifically to this issue.
