[Proposal] On the meaning of "terminated" "truncated" "done" etc. I miss "causality break" flag #194

Closed
1 task done
jamartinh opened this issue Dec 7, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@jamartinh
Contributor

Proposal

Reach broad agreement on the meaning of "done", "terminated" and "truncated".

Motivation

With the new API, we now have two boolean variables (terminated, truncated) instead of one, which gives four possible combinations. However, the Gymnasium docs say that reset should be called when terminated=True and also when truncated=True.
So, with respect to resetting the environment, there seems to be no difference between them, at least in specification and intention.
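To make the point concrete, here is a minimal sketch of a collection loop written against the five-value `step` signature. `CountdownEnv` and `run` are hypothetical stand-ins written for this illustration, not part of Gymnasium; they only mimic the `reset()` and `step()` return shapes. Note that the loop treats both flags identically when deciding whether to reset.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the Gymnasium step/reset signatures."""

    def __init__(self, goal=3, time_limit=5):
        self.goal = goal
        self.time_limit = time_limit
        self.position = 0
        self.elapsed = 0

    def reset(self):
        self.position = 0
        self.elapsed = 0
        return self.position, {}

    def step(self, action):
        self.position += action
        self.elapsed += 1
        terminated = self.position >= self.goal      # a terminal state of the MDP
        truncated = self.elapsed >= self.time_limit  # a time limit outside the MDP
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run(env, policy, total_steps):
    """Step the environment, resetting whenever either flag is set."""
    resets = 0
    obs, info = env.reset()
    for _ in range(total_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        if terminated or truncated:  # no distinction as far as reset is concerned
            obs, info = env.reset()
            resets += 1
    return resets
```

With a policy that always moves forward the episodes end through `terminated`; with a policy that never moves they end through `truncated`. In both cases the loop resets in exactly the same way.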

Both conditions also seem to point to a very significant event:

The rupture of the causal chain in the temporal series of the MDP.

This rupture, signaled previously by "done" and now by both "terminated" and "truncated", has been widely used in RL libraries for several things, such as processing the data in the replay buffer or deciding certain conditions in the programmed "collectors".

However, there are situations where you need to treat a sequence as "complete", e.g. for n-step returns or for computing the total accumulated discounted return. In continuing tasks the causal chain does not actually end, yet you still want to communicate this boundary to the outside so that collectors can save stats and process the replay buffer memory, without a reset being performed.
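As a rough illustration of why the kind of boundary matters for return calculation, the sketch below computes a discounted target over a finished segment. It assumes the common convention that the tail is bootstrapped from a value estimate unless the MDP truly terminated; `n_step_target` and its parameters are hypothetical names chosen for this example, not an API of any library.

```python
def n_step_target(rewards, gamma, bootstrap_value, terminated):
    """Discounted return over one segment of rewards.

    If the segment ended in a true terminal state, nothing follows it, so the
    tail value is zero. If it ended for any other reason (a time limit, or a
    segment boundary in a continuing task), the tail is bootstrapped from a
    value estimate of the next state.
    """
    target = 0.0 if terminated else bootstrap_value
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Same rewards, different kind of boundary at the end of the segment.
ended = n_step_target([1.0, 1.0], gamma=0.5, bootstrap_value=4.0, terminated=True)
cut = n_step_target([1.0, 1.0], gamma=0.5, bootstrap_value=4.0, terminated=False)
```

A boundary in a continuing task would fall into the second case: the segment is closed for bookkeeping, but the value of what comes next is still needed.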

Of course, we can add this to the info field, but by that logic "truncated" could also have stayed inside info, as was done previously.

So basically, it would be useful to take advantage of the four possible True/False combinations of these two variables and reach a consensus on their meaning, so that RL library developers can take it into account.
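The four combinations can be enumerated directly. The sketch below only encodes the reset rule quoted from the docs earlier in this issue (reset when either flag is True); `needs_reset` is a hypothetical helper, and the table makes no claim about what each combination should mean beyond that rule.

```python
from itertools import product


def needs_reset(terminated, truncated):
    """Reset rule as described in this issue: either flag triggers a reset."""
    return terminated or truncated


# Map every (terminated, truncated) pair to whether a reset is required.
table = {
    (terminated, truncated): needs_reset(terminated, truncated)
    for terminated, truncated in product([False, True], repeat=2)
}

for (terminated, truncated), reset in table.items():
    print(f"terminated={terminated!s:<5} truncated={truncated!s:<5} -> reset required: {reset}")
```

Three of the four combinations lead to a reset and the fourth means "keep going", so no combination is left to express "this segment is complete, but do not reset".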

Which event now exclusively signals a causal chain rupture without requiring a reset?

Pitch

Reach a broad agreement/consensus on the meaning of "done", "terminated" and "truncated", taking the causal chain rupture into account.

Alternatives

Add a specific output variable to step indicating causal chain rupture.
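One way this alternative could look is sketched below, purely as an illustration. Everything here is hypothetical: `ContinuingEnv`, `SegmentBoundaryWrapper`, the `causal_break` flag and the fixed `segment_length` rule are assumptions made for the example, and the six-value return is not the Gymnasium `step` signature.

```python
class ContinuingEnv:
    """Hypothetical continuing task: it never terminates or truncates."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, False, {}


class SegmentBoundaryWrapper:
    """Hypothetical wrapper that adds a sixth `causal_break` output to step().

    The flag marks the end of a segment for collectors and replay buffers,
    while the wrapped environment keeps running and is not reset.
    """

    def __init__(self, env, segment_length):
        self.env = env
        self.segment_length = segment_length
        self.steps_in_segment = 0

    def reset(self):
        self.steps_in_segment = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps_in_segment += 1
        causal_break = self.steps_in_segment >= self.segment_length
        if causal_break:
            self.steps_in_segment = 0  # start counting the next segment
        return obs, reward, terminated, truncated, causal_break, info


env = SegmentBoundaryWrapper(ContinuingEnv(), segment_length=3)
env.reset()
outputs = [env.step(0) for _ in range(6)]
flags = [out[4] for out in outputs]  # the hypothetical causal_break flag per step
```

In this sketch a collector could close a segment whenever `causal_break` is True, while `terminated` and `truncated` stay False and the observation keeps counting up across the boundary.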

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@jamartinh jamartinh added the enhancement New feature or request label Dec 7, 2022
@pseudo-rnd-thoughts
Member

Interesting question, I'm not sure I totally understand what you mean.
Could you look at this page and explain what definition we are missing?

@Kallinteris-Andreas
Collaborator

This issue has been inactive for over a year, and the OP uses the term "causal chain rupture", whose meaning I cannot understand. Closing.
