Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-add timelimit truncation information #101

Closed
wants to merge 2 commits into from

Conversation

araffin
Copy link
Contributor

@araffin araffin commented Oct 31, 2022

Description

Partial fix of openai/gym#3102
Migration guide and release notes still need to be updated.

In gym 0.21, there was a key in the info dict that gave the information that truncation was due to timelimit.
This information is lost in gym 0.26 (causing failure in DLR-RM/rl-baselines3-zoo#256 for instance).

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@RedTachyon
Copy link
Member

This was a deliberate change in the API so that an important property of the env step is communicated in a robust way, and not through a key in a dictionary. Since 0.26, we've taken the approach of not sticking to bad legacy design decisions made by OpenAI just for the sake of compatibility, so I don't think we should change this here.

I did, however, add a description of the change to the migration guide in #105, since that's definitely necessary

@araffin
Copy link
Contributor Author

araffin commented Nov 2, 2022

I think there is a misunderstanding of the proposed change.

In openai/gym#3102, it is said that terminated=truncated=Trueis allowed because "this approach gives strictly more information".

However, truncated=True can be due to many reasons (lap finished, out of tracking bounds, see Farama-Foundation/gym-docs#128 (comment)), so if truncated=True it doesn't mean that the timelimit was reached, it just mean that some truncation state was reached (and maybe after some termination).

The proposed change doesn't change the API, it just gives strictly more information about which type of truncation.
This can be very helpful when you want to know how often does the agent reaches a timeout (it gives information about the behavior and allow to adjust the timeout).

EDIT: my change could be edited to info["TimeLimit.truncated"] = not truncated (and be before the truncation=True) if we want to tell if the truncation was only due to timelimit

@RedTachyon
Copy link
Member

However, truncated=True can be due to many reasons (lap finished, out of tracking bounds, see Farama-Foundation/gym-docs#128 (comment))

I disagree with this rather strongly. The idea for truncation is to represent exactly that - reaching a time limit. I don't see why this should be conflated with other reasons for the rollout to env.

The example you mention with finishing the lap is in my opinion just a bug in the current implementation of car racing, see #106.

For the robot example you gave, it might be true that going out of bounds shouldn't be treated as a failure (though I'm not 100% convinced about that), but it's also definitely not the same thing as truncation. Of course, if someone wants to abuse the semantic to simplify their life, this should be possible, but it is not the main intention of the truncated value.

@araffin
Copy link
Contributor Author

araffin commented Nov 2, 2022

but it's also definitely not the same thing as truncation.

I consider timeouts (due to time limit only) to be one type of truncation.

For the robot example you gave, it might be true that going out of bounds shouldn't be treated as a failure (though I'm not 100% convinced about that), but it's also definitely not the same thing as truncation.

It's going out of tracking bounds.
For instance, you want your quadruped to walk as far as possible but you cannot measure outside a given zone.
This is an infinite horizon problem, so you should bootstrap if the tracking bounds are reached.

How would you do that with the current API otherwise?

If you treat the out of bound like a termination and reward it, you may end up with undesirable behaviors (you have an example with hopper in https://arxiv.org/abs/1712.00378).

The example you mention with finishing the lap is in my opinion just a bug

The same goes for the racing car, you don't want your car to be good for one lap only (and crash just after finishing the lap for instance), so rewarding the termination is not a good idea.

EDIT: I already discussed termination/truncation several times in the past with @arjun-kg @pseudo-rnd-thoughts Farama-Foundation/gym-docs#115 (comment) Farama-Foundation/gym-docs#128 and in the google draft of pseudo-rnd-thoughts/gym#1 (comment)
openai/gym#2510 and openai/gym#2752

@araffin
Copy link
Contributor Author

araffin commented Nov 2, 2022

as an appendix, from the docs
"
Truncation refers to the episode ending after an externally defined condition (that is outside the scope of the Markov Decision Process).
"

if you want to treat the cases i describe as termination, you will need to add information to the observation, otherwise you are breaking Markov assumption.

A last collection of examples is everytime a human needs to intervene but that's not the agent's fault. One can think of a misplaced cable, an object that is supposed to be grasped that was moved unintentionnally, a robot that must be stopped because a careless human was on its way, ...

Edit: another common case for truncation is error like losing connection, losing connection to database, network, robot, ...

@RedTachyon
Copy link
Member

I won't address every individual point, but here's my thought process: in 90% of the cases, you either deal with a normal termination due to reaching a terminal state, so these two scenarios should be privileged and indicated by terminated/truncated, with truncated referring explicitly to hitting the time limit. In weird scenarios like disconnecting from a server or something, users can include that in info, which allows arbitrary extra information for edge cases.

otherwise you are breaking Markov assumption.

This implies that all environments have to be fully observable, which is very incorrect. A POMDP is still markovian even if the observation doesn't include full information. The underlying state is definitely markovian since, well, we're able to implement it in code. Introducing a convention where truncated refers to any termination that's not directly observable would break... pretty much all of existing partially observable RL code as far as I'm aware.

A last collection of examples is everytime a human needs to intervene but that's not the agent's fault.

In that case you'd probably want to include that information explicitly in some way anyways. It's a very non-standard use-case of RL, and right now it is not our intention for gym.Env to fully encompass all potential/future unusual applications of RL - although if it's possible, it's of course better to support something than not support it.

What I think would be justified specifically for CarRacing, which is currently the only built-in environment that has this issue as far as I know, is adding an info entry describing the reason for termination (whether it's crashing or finishing the track, or something like that). Then the overall idea is that:

  • terminated == true => a terminal state was reached
  • truncated == true => the time limit was reached
  • additional reasons/information about termination is provided in info

I think this would allow you to do everything you want?

@pseudo-rnd-thoughts
Copy link
Member

Closing as while the discussion on the definition of terminated and truncated is important, this appears more for documentation and this is not a feature we are interested in adding at the moment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants