Hi, thank you so much for the CleanRL resource!
I have a question regarding the PPO implementation and how it distinguishes between episodes that ended because they were terminated (the task was completed) and episodes that were truncated (they ran out of time).
A comment in the advantage calculation suggests that episodes that are not done are to be bootstrapped from the value function.
At the same time, truncations and terminations are `or`'d together, so both cases are counted as the same type of done:
cleanrl/cleanrl/ppo_continuous_action.py, line 221 in 8cbca61
This seems to go against other findings/implementations: Time Limits in Reinforcement Learning, StableBaselines3.
Is the difference here that you assume we're operating in environments with an actual episode timeout, so that truncations mean failure? In other cases there is no inherent time limit, only a designer's desire for faster task solving, in which case I think it makes sense to handle truncations separately.
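To make the distinction concrete, here is a minimal sketch of what I mean by handling truncations separately. It is not CleanRL's code: the function `gae_advantages`, the `bootstrap_on_truncation` flag, and the `next_values` argument (the value estimate of the true successor state of each step, including the final observation of a truncated episode) are all hypothetical names chosen for illustration, and it covers a single environment only.

```python
def gae_advantages(rewards, values, next_values, terminated, truncated,
                   gamma=0.99, lam=0.95, bootstrap_on_truncation=True):
    """Generalized advantage estimation for one environment's rollout.

    Assumption: next_values[t] is V(s_{t+1}) for the real successor of
    step t, i.e. the final observation when the episode ends at step t.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        episode_over = terminated[t] or truncated[t]
        # Merged handling zeroes the bootstrap on any episode end;
        # separate handling zeroes it only on a true termination.
        if bootstrap_on_truncation:
            cut_bootstrap = terminated[t]
        else:
            cut_bootstrap = episode_over
        next_value = 0.0 if cut_bootstrap else next_values[t]
        delta = rewards[t] + gamma * next_value - values[t]
        # The advantage recursion never crosses an episode boundary.
        running = delta + (0.0 if episode_over else gamma * lam * running)
        advantages[t] = running
    return advantages


# Two-step rollout that is cut off by a time limit at the last step.
rewards = [1.0, 1.0]
values = [0.5, 0.5]
next_values = [0.5, 2.0]
terminated = [False, False]
truncated = [False, True]

separate = gae_advantages(rewards, values, next_values, terminated, truncated,
                          gamma=1.0, lam=1.0, bootstrap_on_truncation=True)
merged = gae_advantages(rewards, values, next_values, terminated, truncated,
                        gamma=1.0, lam=1.0, bootstrap_on_truncation=False)
```

With the merged flag, the truncated step is treated as if no future return existed, so its advantage is lower than when the value of the final observation is bootstrapped.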
Have I understood all of this correctly?