[Proposal] On the meaning of "terminated" "truncated" "done" etc. I miss "causality break" flag #194
Closed
1 task done
Labels
enhancement
New feature or request
Proposal
Reach a broad agreement on the meaning of "done", "terminated" and "truncated".
Motivation
With the new API, we now have two boolean variables (terminated, truncated) instead of one, which gives 4 possible situations. However, the Gymnasium docs say that reset should be called when terminated=True and also when truncated=True.
So it seems that with respect to resetting the environment, there is no difference, at least in specification and intention.
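To make the point concrete, here is a minimal sketch of the reset rule as I read it from the docs. The helper `needs_reset` is a hypothetical name used only for illustration, not part of Gymnasium.

```python
from itertools import product


def needs_reset(terminated: bool, truncated: bool) -> bool:
    """Reset rule as read from the docs: either flag triggers a reset."""
    return terminated or truncated


# Enumerate the four (terminated, truncated) combinations.
combinations = {
    (terminated, truncated): needs_reset(terminated, truncated)
    for terminated, truncated in product([False, True], repeat=2)
}

for (terminated, truncated), reset in combinations.items():
    print(f"terminated={terminated!s:5} truncated={truncated!s:5} -> reset required: {reset}")
```

Under this reading, three of the four combinations require a reset and the remaining one (both False) signals nothing at all, so no combination is left to mean "boundary reached, but do not reset".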
It also seems that both conditions point to a very significant event:
The rupture of the causal chain in the temporal series of the MDP.
This rupture, signaled previously by "done" and now by both "terminated" and "truncated", has been widely used in RL libraries for several things, such as processing the data in the replay buffer or deciding certain conditions in the programmed "collectors".
However, there are situations where you need to treat a sequence as "complete", e.g. for n-step returns or for computing the total accumulated discounted return, even though the causal chain does not end, as in continuing tasks. You still want to communicate this to the outside so that collectors can save stats and process the replay buffer memory, but no reset should be done.
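As an illustration of why a collector needs to know where a segment ends, here is a rough sketch of a discounted return computation over one segment. It follows one common convention (bootstrap from a value estimate unless the segment ended in a terminal state) and is only an assumption about how a library might do it. The function name and the `bootstrap_value` parameter are hypothetical.

```python
def discounted_returns(rewards, gamma, terminated, bootstrap_value=0.0):
    """Discounted return for each step of one segment.

    If the segment ended by termination, nothing follows and the tail is 0.
    Otherwise (truncation, or a cut in a continuing task) the tail is
    approximated by `bootstrap_value`, e.g. a critic's estimate of the
    state that follows the last step.
    """
    running = 0.0 if terminated else bootstrap_value
    returns = []
    # Accumulate backwards from the end of the segment.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


rewards = [1.0, 1.0, 1.0]
# Segment cut in a continuing task: the tail value is bootstrapped.
cut_returns = discounted_returns(rewards, gamma=0.5, terminated=False, bootstrap_value=4.0)
# Segment that really ended: no tail value.
end_returns = discounted_returns(rewards, gamma=0.5, terminated=True, bootstrap_value=4.0)
print(cut_returns)  # [2.25, 2.5, 3.0]
print(end_returns)  # [1.75, 1.5, 1.0]
```

In both calls the collector has to be told that the segment is over. What differs is whether the environment must also be reset afterwards, and that is the part the two flags do not currently express.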
Of course, we can add this to the info field, but by the same logic "truncated" could have stayed inside info, as was done previously.
So basically, it would be useful to take advantage of the 4 possible True/False combinations of these two variables and reach a consensus on their meaning, so that RL library developers can take it into account.
Which event now exclusively signals a causal chain rupture without requiring a reset?
Pitch
Reach a broad agreement/consensus on the meaning of "done", "terminated" and "truncated", taking the causal chain rupture into account.
Alternatives
Add a specific output variable to step indicating causal chain rupture.
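A rough sketch of what this alternative could look like, written as a plain Python wrapper so it runs without Gymnasium. Everything here is hypothetical: the `SegmentBreakWrapper` class, the `segment_break` flag, the fixed `segment_length` rule and the toy `CountingEnv` are illustration only, not an existing API.

```python
class SegmentBreakWrapper:
    """Hypothetical wrapper that appends a `segment_break` flag to the step output."""

    def __init__(self, env, segment_length):
        self.env = env
        self.segment_length = segment_length
        self._steps = 0

    def reset(self, **kwargs):
        self._steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._steps += 1
        # Mark a boundary every `segment_length` steps, without asking for a reset.
        segment_break = self._steps % self.segment_length == 0
        return obs, reward, terminated, truncated, segment_break, info


class CountingEnv:
    """Toy continuing task used only to exercise the wrapper (not a Gymnasium env)."""

    def __init__(self):
        self.state = 0

    def reset(self, seed=None, options=None):
        self.state = 0
        return self.state, {}

    def step(self, action):
        self.state += 1
        return self.state, 1.0, False, False, {}


env = SegmentBreakWrapper(CountingEnv(), segment_length=3)
obs, info = env.reset()
breaks = []
for _ in range(6):
    obs, reward, terminated, truncated, segment_break, info = env.step(action=0)
    breaks.append(segment_break)
print(breaks)  # [False, False, True, False, False, True]
```

The environment is never reset inside the loop, yet a collector can still close a segment whenever `segment_break` is True. The obvious cost is that a six-element return breaks code that unpacks the current five-element step output, which is one reason to prefer agreeing on the meaning of the existing two flags.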
Additional context
No response
Checklist