Checklist
I have checked that there is no similar issue in the repo
❓ Question
Hi and thanks for the great work!
I am using RecurrentPPO in a current project, and it strikes me that the self._last_lstm_states added to the buffer on L294 are actually the ones from the last terminal state (and not all zeros) when an environment is reset on L252. Is my understanding correct?

If so, would it not be better to check for an episode start one line before L242 and set the states to zero for those environments, instead of handling this in _process_sequence of RecurrentActorCriticPolicy (L198) on every forward pass?
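A minimal toy sketch of the two options, in plain Python with no dependencies. Everything here is hypothetical: toy_lstm_step is not an LSTM, and collect, mask_in_forward and zero_at_collection are illustrative names, not sb3-contrib APIs. The loop only mirrors the pattern described in the question (the buffer stores the last states, and a reset environment keeps its stale state), then checks that zeroing at collection gives the LSTM the same effective initial state as masking with (1 - episode_start) in the forward pass.

```python
from typing import List, Tuple

# Hypothetical toy type: one hidden-state vector per environment.
State = List[float]


def toy_lstm_step(state: State) -> State:
    """Stand-in for one recurrent step (NOT a real LSTM): adds 1 to each unit."""
    return [h + 1.0 for h in state]


def mask_in_forward(states: List[State], episode_starts: List[bool]) -> List[State]:
    """Reset inside the forward pass by multiplying each state by (1 - episode_start)."""
    return [
        [(1.0 - float(start)) * h for h in state]
        for state, start in zip(states, episode_starts)
    ]


def zero_at_collection(states: List[State], episode_starts: List[bool]) -> List[State]:
    """Proposed alternative: zero the states of freshly reset envs up front."""
    return [
        [0.0] * len(state) if start else list(state)
        for state, start in zip(states, episode_starts)
    ]


def collect(
    dones_per_step: List[List[bool]], n_units: int = 2
) -> List[Tuple[List[State], List[bool]]]:
    """Toy rollout loop: the buffer stores the *last* states, and envs that
    finish are auto-reset without touching those states."""
    n_envs = len(dones_per_step[0])
    last_states = [[0.0] * n_units for _ in range(n_envs)]
    last_episode_starts = [True] * n_envs
    buffer = []  # one (stored_states, episode_starts) pair per step
    for dones in dones_per_step:
        # Store the states exactly as they are, stale or not.
        buffer.append(([list(s) for s in last_states], list(last_episode_starts)))
        # The "policy" masks internally before stepping the recurrent cell.
        effective = mask_in_forward(last_states, last_episode_starts)
        last_states = [toy_lstm_step(s) for s in effective]
        # Auto-reset: an env that is done starts a new episode on the next step.
        last_episode_starts = list(dones)
    return buffer


if __name__ == "__main__":
    buffer = collect([[False, False], [True, False], [False, False]])
    stored, starts = buffer[2]  # env 0 starts a new episode at step 2
    print(stored)  # [[2.0, 2.0], [2.0, 2.0]]: env 0's entry is stale, not zeros
    print(mask_in_forward(stored, starts))  # [[0.0, 0.0], [2.0, 2.0]]
    print(zero_at_collection(stored, starts) == mask_in_forward(stored, starts))  # True
```

In this toy, zeroing at collection would make the stored entries for reset environments zero, so the two options agree on the initial state. As a general point about the technique (not a claim about the library's internals), masking in the forward pass also resets a state whose episode boundary falls inside a sequence unrolled from a single stored initial state, a case that zeroing only the stored states would not cover.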