Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change done field to integers for handling truncation vs termination #2608

Closed
wants to merge 9 commits into from
Closed

Change done field to integers for handling truncation vs termination #2608

wants to merge 9 commits into from

Conversation

XuehaiPan
Copy link
Contributor

@XuehaiPan XuehaiPan commented Feb 11, 2022

Change value range for done from (False, True) to (0, 1, 2). See #2510 (comment) for more details.

Resolves #2510


Old API: done is boolean

Type and Value Range

done: bool
  - False: not done
  - True:  terminated or truncated

Sampling

observation = env.reset()
done = False
while not done:
    action = policy(observation)
    observation, reward, done, info = env.step(action)

Value iteration

V_target = reward + gamma * (1 - done) * V(next_observation)
1 - done  # -> 1  (not done)
1 - done  # -> 0  (terminated or truncated)

This cannot distinguish truncation and termination. The done value for both situation is True.


New API: done is integer in {0, 1, 2}

Type and Value Range

done: int
  - gym.Env.NOT_DONE   (0): not done
  - gym.Env.TERMINATED (1): terminated (internal done in environment)
  - gym.Env.TRUNCATED  (2): truncated  (truncated by wrappers or step counting)

NOTE: It's better to reference constants by name rather than value.

done = env.NOT_DONE  # :)
done = 0  # :(

Sampling

observation = env.reset()
done = False
while not bool(done):
    action = policy(observation)
    observation, reward, done, info = env.step(action)

NOTE: in if expression / while experssion / not expression, the explicit type casting bool(done) can be ommited.

Value iteration

V_target = reward + gamma * abs(1 - done) * V(next_observation)

The only change is:

(1 - done) --> abs(1 - done)

abs(1 - done)  # -> 1  (not done or truncated)
abs(1 - done)  # -> 0  (terminated)

or

(1 - done) --> int(done != env.TERMINATED)

int(done != env.TERMINATED)  # -> 1  (not done or truncated)
int(done != env.TERMINATED)  # -> 0  (terminated)

@@ -648,7 +648,7 @@ def _worker(index, env_fn, pipe, parent_pipe, shared_memory, error_queue):

elif command == "step":
observation, reward, done, info = env.step(data)
if done:
if bool(done):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this actually does anything, the conditional works even with an integer:

if 2:
    print("Hi")
> Hi

An argument might be made for readability, but I don't think casting to bool actually makes it more readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that if done still works is actually a pretty good showcase of the (mostly) backwards-compatibility of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An argument might be made for readability, but I don't think casting to bool actually makes it more readable

Yes, I put this for readability. This explicit type casting is made for "keep in mind that expression is not a boolean in statement bool(expression)".

@jkterry1
Copy link
Collaborator

@XuehaiPan could you please fix the merge conflicts?

@XuehaiPan
Copy link
Contributor Author

Closed because it is preferred to explicitly add two booleans terminated and truncated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Formal API handling of truncation vs termination
3 participants