[Optimization] Replay Buffers shouldn't use copy when using np.array #112

Closed
m-rph opened this issue Jul 17, 2020 · 10 comments · Fixed by #1700
Labels: enhancement (New feature or request)

m-rph (Contributor) commented Jul 17, 2020

Related to #49

In the replay buffer's add method:

    def add(self, obs: np.ndarray, next_obs: np.ndarray, action: np.ndarray, reward: np.ndarray, done: np.ndarray) -> None:
        # Copy to avoid modification by reference
        self.observations[self.pos] = np.array(obs).copy()
        if self.optimize_memory_usage:
            self.observations[(self.pos + 1) % self.buffer_size] = np.array(next_obs).copy()
        else:
            self.next_observations[self.pos] = np.array(next_obs).copy()
        self.actions[self.pos] = np.array(action).copy()
        self.rewards[self.pos] = np.array(reward).copy()
        self.dones[self.pos] = np.array(done).copy()

We call np.array(x).copy(). The extra .copy() is unnecessary because np.array has a copy argument that is True by default, so np.array(x) already returns a copy.

https://numpy.org/doc/stable/reference/generated/numpy.array.html

import numpy as np

x = [1, 2, 3, 4, 5]
x1 = np.array(x)
x is x1
# False: np.array copied the list into a new array

x2 = np.array(x1)
x2 is x1
# False: even for an ndarray input, np.array returns a copy by default
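
For illustration, here is a minimal sketch of add with the redundant trailing .copy() calls dropped; np.array still copies by default, so the no-aliasing guarantee is preserved. This is a sketch, not necessarily the change that was eventually merged:

    def add(self, obs: np.ndarray, next_obs: np.ndarray, action: np.ndarray, reward: np.ndarray, done: np.ndarray) -> None:
        # np.array copies its input by default, so no explicit .copy() is needed
        self.observations[self.pos] = np.array(obs)
        if self.optimize_memory_usage:
            self.observations[(self.pos + 1) % self.buffer_size] = np.array(next_obs)
        else:
            self.next_observations[self.pos] = np.array(next_obs)
        self.actions[self.pos] = np.array(action)
        self.rewards[self.pos] = np.array(reward)
        self.dones[self.pos] = np.array(done)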
m-rph changed the title from "Replay Buffers shouldn't use copy when using np.array" to "[Optimization] Replay Buffers shouldn't use copy when using np.array" on Jul 17, 2020
Miffyli added the enhancement (New feature or request) label on Jul 17, 2020
Miffyli (Collaborator) commented Jul 17, 2020

Good catch. Mind if I include this in a PR with credits to you (#110)? If you find a bunch more such quirks and optimizations, feel free to bundle them up into a PR, though :)

m-rph (Contributor, Author) commented Jul 17, 2020

I can make a large one that deals with code optimisations/quirks and leave #110 for algorithm performance; it's up to you :)

Miffyli (Collaborator) commented Jul 17, 2020

Ok, let's make another PR for optimizations like this 👍. Let's wait until agent performance has been matched so things do not crisscross too badly.

m-rph (Contributor, Author) commented Jul 17, 2020

After some more investigation, I believe that we don't need to copy at all, as the assignment writes to the original data, i.e. where the buffer is located. So

    x[indices] = np.array(data)

copies the data twice: once from data into the temporary array, and once more from the temporary into x.

From the NumPy docs: "Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!)."

https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays
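
A quick check, with made-up names, confirming that assignment through an index writes into the array's own storage, so later mutation of the source cannot leak into the buffer:

    import numpy as np

    # Toy stand-in for the replay buffer storage (names are hypothetical)
    buffer = np.zeros((4, 3), dtype=np.float32)

    obs = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    buffer[0] = obs    # the assignment copies obs into buffer's own memory

    obs[0] = 99.0      # mutating the source afterwards...
    print(buffer[0])   # ...does not affect the stored row: [1. 2. 3.]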

araffin (Member) commented Jul 17, 2020

> I believe that we don't need to copy as the assignment is done on the original data, i.e. where the buffer is located

We need to be extra careful with that one; I remember having weird issues that apparently came from changes by reference (hence the extra-conservative copy that is currently in the code).

But if everything is copied, then perfect.

m-rph (Contributor, Author) commented Jul 17, 2020 via email

jarlva commented Aug 2, 2020

Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.

araffin (Member) commented Aug 3, 2020

> Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.

Why would pytorch tensors be faster than numpy?
The replay buffer originally used pytorch tensors for storage, but that required several numpy <-> pytorch conversions (e.g. for normalization), whereas the current implementation only does one conversion.
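
For context, a rough sketch of that single-conversion pattern; the names (observations, batch_inds) are illustrative, not SB3's actual API:

    import numpy as np
    import torch

    observations = np.zeros((1000, 4), dtype=np.float32)  # NumPy storage
    batch_inds = np.random.randint(0, 1000, size=32)

    # The only numpy -> pytorch conversion happens at sampling time
    batch = torch.as_tensor(observations[batch_inds])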

m-rph (Contributor, Author) commented Aug 3, 2020

Using pytorch tensors doesn't necessarily improve performance, mainly due to the conversions @araffin mentioned; moreover, going cpu -> gpu -> cpu will probably degrade performance due to transfer latency. The data is usually rather small and fits in the cache, so the transfers will probably dominate the runtime.
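
A rough micro-benchmark sketch of that intuition; the sizes are made up and the comparison is only meaningful on a CUDA machine:

    import time
    import numpy as np
    import torch

    x = np.random.randn(64, 4).astype(np.float32)  # a small, cache-friendly batch

    start = time.perf_counter()
    for _ in range(1000):
        y = x * 2.0  # the arithmetic itself on data this small is trivial
    print(f"numpy op:       {time.perf_counter() - start:.4f}s")

    if torch.cuda.is_available():
        start = time.perf_counter()
        for _ in range(1000):
            y = torch.as_tensor(x).cuda().cpu().numpy()  # cpu -> gpu -> cpu round trip
        print(f"gpu round trip: {time.perf_counter() - start:.4f}s")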

jarlva commented Aug 4, 2020

Pardon, it was a general suggestion, not related specifically to this issue.
