[Question] RecurrentPPO: Reset LSTM states early? #239

phisad · 2024-04-05T13:07:04Z

❓ Question

Hi and thanks for the great work!

I am using RecurrentPPO in a current project, and it strikes me that the self._last_lstm_states added to the buffer on L294 are actually the ones from the last terminal state (and not all zeros) when an environment has been reset on L252. Is my understanding correct?

If so, would it not be better to check for an episode start just before L242 and set the states to zero for those environments there, instead of handling this in _process_sequence of RecurrentActorCriticPolicy (L198) on each forward pass?
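To make the proposal concrete, here is a minimal sketch of what such an early reset could look like. It is not the sb3-contrib code: the helper name `zero_states_on_episode_start` and the tensor shapes are assumptions made for illustration, and only the masking expression mirrors the one used in the policy.

```python
import torch as th


def zero_states_on_episode_start(lstm_states, episode_starts):
    """Hypothetical helper: zero the (h, c) entries of freshly reset environments.

    lstm_states: tuple (h, c), each assumed to have shape (n_layers, n_envs, hidden_size)
    episode_starts: float tensor of shape (n_envs,), 1.0 where a new episode begins
    """
    # Broadcast the keep-mask over the layer and hidden dimensions.
    keep = (1.0 - episode_starts).view(1, -1, 1)
    h, c = lstm_states
    return keep * h, keep * c


n_layers, n_envs, hidden_size = 1, 3, 4
h = th.ones(n_layers, n_envs, hidden_size)
c = th.full((n_layers, n_envs, hidden_size), 2.0)
episode_starts = th.tensor([0.0, 1.0, 0.0])  # only environment 1 was just reset

h_reset, c_reset = zero_states_on_episode_start((h, c), episode_starts)
```

With this, the states written to the buffer for environment 1 would be all zeros, while the states of the other environments are left unchanged.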

Checklist

I have checked that there is no similar issue in the repo
I have read the documentation
If code there is, it is minimal and working
If code there is, it is formatted using the markdown code blocks for both code and stack traces.


araffin · 2024-04-05T13:55:08Z

Hello,
that's a good suggestion =)
Would you mind giving it a try and checking that you obtain the exact same results?
If so, please open a PR ;)

Hopefully that would simplify the code and make things much faster.

phisad · 2024-04-11T14:36:43Z

Alright, thanks for the confirmation. ^^

I'll give it a try and make sure that the tests in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/tests/test_lstm.py run through without any errors (and maybe even a bit faster).

araffin · 2024-04-11T14:40:31Z

Thinking again about that issue, I'm afraid we still need

stable-baselines3-contrib/sb3_contrib/common/recurrent/policies.py, lines 203 to 204 in 25b4326:

    (1.0 - episode_start).view(1, n_seq, 1) * lstm_states[0],
    (1.0 - episode_start).view(1, n_seq, 1) * lstm_states[1],

to reset states manually when starting a new episode? (at least when updating the network, when calling train())

or can we pass all hidden states to PyTorch?
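To illustrate the concern, the sketch below compares a single `nn.LSTM` call over a whole sequence with a step-by-step pass that applies the mask above at every step. It is a toy setup under stated assumptions, not the sb3-contrib implementation: the function `step_by_step`, the shapes, and the random inputs are all hypothetical.

```python
import torch as th
from torch import nn

th.manual_seed(0)
seq_len, n_seq, obs_dim, hidden_size = 5, 2, 3, 4
lstm = nn.LSTM(obs_dim, hidden_size)  # default layout: (seq_len, batch, features)

features = th.randn(seq_len, n_seq, obs_dim)
h0 = th.randn(1, n_seq, hidden_size)
c0 = th.randn(1, n_seq, hidden_size)


def step_by_step(features, episode_starts, h, c):
    """Run the LSTM one step at a time, zeroing states where an episode starts."""
    outputs = []
    for x_t, start_t in zip(features, episode_starts):
        keep = (1.0 - start_t).view(1, -1, 1)
        out, (h, c) = lstm(x_t.unsqueeze(0), (keep * h, keep * c))
        outputs.append(out)
    return th.cat(outputs)


with th.no_grad():
    # One call over the whole sequence: states are carried through unchanged.
    full_pass, _ = lstm(features, (h0, c0))

    # Without any episode start, the masked loop matches the single call.
    no_reset = th.zeros(seq_len, n_seq)
    same = th.allclose(step_by_step(features, no_reset, h0, c0), full_pass, atol=1e-5)

    # With an episode start in the middle of sequence 0, the outputs diverge.
    mid_reset = no_reset.clone()
    mid_reset[2, 0] = 1.0
    masked = step_by_step(features, mid_reset, h0, c0)
    differs = not th.allclose(masked, full_pass, atol=1e-5)
```

In this toy case, a single call only takes the initial states, so an episode boundary inside a training sequence cannot be expressed that way. That suggests the manual reset is still needed in train() whenever a sequence spans a reset, even if the states stored during rollout collection are zeroed early.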

phisad added the question label (Further information is requested) Apr 5, 2024

araffin added the enhancement label (New feature or request) Apr 5, 2024

araffin mentioned this issue Aug 28, 2024

[Feature Request] Recurrent policies araffin/sbx#40

Open
