Re-add timelimit truncation information #101
This was a deliberate change in the API so that an important property of the env step is communicated in a robust way, and not through a key in a dictionary. Since 0.26, we've taken the approach of not sticking to bad legacy design decisions made by OpenAI just for the sake of compatibility, so I don't think we should change this here. I did, however, add a description of the change to the migration guide in #105, since that's definitely necessary.
I think there is a misunderstanding of the proposed change. In openai/gym#3102, it is said that … However, the proposed change doesn't change the API; it just gives strictly more information about which type of truncation occurred. EDIT: my change could be edited to …
I disagree with this rather strongly. The idea of truncation is to represent exactly that: reaching a time limit. I don't see why this should be conflated with other reasons for the rollout to end. The example you mention with finishing the lap is, in my opinion, just a bug in the current implementation of CarRacing, see #106. For the robot example you gave, it might be true that going out of bounds shouldn't be treated as a failure (though I'm not 100% convinced about that), but it's also definitely not the same thing as truncation. Of course, if someone wants to abuse the semantics to simplify their life, this should be possible, but it is not the main intention of the …
I consider timeouts (due to time limit only) to be one type of truncation.
It's going out of tracking bounds. How would you handle that with the current API otherwise? If you treat going out of bounds as a termination and reward it, you may end up with undesirable behaviors (there is an example with Hopper in https://arxiv.org/abs/1712.00378).
The same goes for the racing car: you don't want your car to be good for one lap only (and crash just after finishing the lap, for instance), so rewarding the termination is not a good idea. EDIT: I have already discussed termination/truncation several times in the past with @arjun-kg and @pseudo-rnd-thoughts: Farama-Foundation/gym-docs#115 (comment), Farama-Foundation/gym-docs#128 (comment), and in the Google draft of pseudo-rnd-thoughts/gym#1 (comment).
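The practical reason the termination/truncation distinction matters is bootstrapping, as discussed in the time-limits paper linked above. The sketch below is a minimal, library-independent illustration (the function name `td_target` is hypothetical): on a true termination there is no future return, so the value of the next state is dropped; on a truncation (time limit, out of tracking bounds, ...) the underlying process would have continued, so the agent still bootstraps from the next state's value.

```python
def td_target(reward: float, next_value: float, terminated: bool, gamma: float = 0.99) -> float:
    """One-step TD target that only cuts bootstrapping on a true termination.

    `truncated` is deliberately not an argument: a truncated episode still
    bootstraps from `next_value`, because the episode was cut short rather
    than reaching a terminal state. Truncation only tells the rollout loop
    to reset the environment.
    """
    # (1 - terminated) zeroes out the future value only for terminal states
    return reward + gamma * next_value * (1.0 - float(terminated))


# Same transition, two interpretations of "the episode stopped here"
print(td_target(1.0, 10.0, terminated=False, gamma=0.5))  # truncated: bootstraps
print(td_target(1.0, 10.0, terminated=True, gamma=0.5))   # terminated: reward only
```

If an out-of-bounds event were reported as `terminated`, the target would collapse to the immediate reward, which is how the undesirable behaviors mentioned above can arise.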
As an appendix, from the docs: if you want to treat the cases I describe as termination, you will need to add information to the observation; otherwise you are breaking the Markov assumption. A last set of examples is every time a human needs to intervene through no fault of the agent: a misplaced cable, an object that is supposed to be grasped but was moved unintentionally, a robot that must be stopped because a careless human was in its way, ... Edit: another common case for truncation is errors such as losing the connection to a database, network, or robot, ...
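One common way to "add information to the observation" for the time-limit case is to append the remaining time, so that an episode ending at the limit becomes a function of what the agent observes. The helper below is a hypothetical sketch on plain lists, assuming a known `max_steps`; Gymnasium's `TimeAwareObservation` wrapper follows a similar idea, though its exact encoding may differ between versions.

```python
from typing import List, Sequence


def time_aware_obs(obs: Sequence[float], step: int, max_steps: int) -> List[float]:
    """Append the normalized remaining time to an observation (hypothetical helper)."""
    # 1.0 at the start of the episode, 0.0 when the time limit is reached
    remaining = (max_steps - step) / max_steps
    return list(obs) + [remaining]


print(time_aware_obs([0.0, 1.0], step=25, max_steps=100))
```

This only covers the time limit: events such as a human intervention or a lost connection cannot be predicted from the observation this way, which is why they are argued above to be truncations rather than terminations.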
I won't address every individual point, but here's my thought process: in 90% of the cases, you deal either with a normal termination due to reaching a terminal state or with a timeout, so these two scenarios should be privileged and indicated by terminated/truncated, with truncated referring explicitly to hitting the time limit. In weird scenarios like disconnecting from a server, users can include that in …
This implies that all environments have to be fully observable, which is incorrect. A POMDP is still Markovian even if the observation doesn't include full information. The underlying state is definitely Markovian since, well, we're able to implement it in code. Introducing a convention where …
In that case you'd probably want to include that information explicitly in some way anyway. It's a very non-standard use case of RL, and right now it is not our intention for … What I think would be justified specifically for CarRacing, which as far as I know is currently the only built-in environment with this issue, is adding an …
I think this would allow you to do everything you want?
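For a custom environment, one possible user-side convention is to keep the 0.26-style five-tuple and record why the episode was cut short in the `info` dict. The toy class below is entirely hypothetical (neither `DisconnectingEnv` nor the `"truncation_reason"` key is part of Gymnasium); it only mirrors the `step()` return shape `(obs, reward, terminated, truncated, info)` and does not depend on the library.

```python
from typing import Any, Dict, Tuple


class DisconnectingEnv:
    """Hypothetical toy environment using the 0.26-style step signature."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0
        self.connected = True  # set to False externally to simulate a lost connection

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        self.connected = True
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.t += 1
        info: Dict[str, Any] = {}
        terminated = False  # this toy task never reaches a terminal state
        truncated = self.t >= self.max_steps  # time limit
        if not self.connected:
            # Not the agent's fault and not a time limit: still stop the episode,
            # and disambiguate through a hypothetical info key
            truncated = True
            info["truncation_reason"] = "connection_lost"
        elif truncated:
            info["truncation_reason"] = "time_limit"
        return self.t, 1.0, terminated, truncated, info


env = DisconnectingEnv(max_steps=2)
env.reset()
env.step(0)
print(env.step(0))  # second step hits the time limit
```

Whether a lost connection should set `truncated` at all, or only an `info` entry, is exactly the point debated in this thread; the sketch just shows one option.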
Closing: while the discussion on the definition of terminated and truncated is important, it seems better suited to the documentation, and this is not a feature we are interested in adding at the moment.
Description
Partial fix of openai/gym#3102
Migration guide and release notes still need to be updated.
In gym 0.21, there was a key in the info dict indicating that the truncation was due to the time limit.
This information is lost in gym 0.26 (causing failures in DLR-RM/rl-baselines3-zoo#256, for instance).
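To make the lost information concrete: in gym 0.21, the `TimeLimit` wrapper set `info["TimeLimit.truncated"] = not done` when the step limit was reached. The hypothetical shim below converts a 0.26-style step result back to the 0.21 `(obs, reward, done, info)` form. Note that it has to assume every `truncated=True` comes from a time limit, which is precisely the ambiguity this PR is about.

```python
from typing import Any, Dict, Tuple


def to_legacy_step(
    obs: Any, reward: float, terminated: bool, truncated: bool, info: Dict[str, Any]
) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Convert a 0.26-style step result to the 0.21 form (hypothetical helper)."""
    done = terminated or truncated
    legacy_info = dict(info)  # do not mutate the caller's dict
    if truncated:
        # Mirrors 0.21 semantics: the key is only present when the limit was hit,
        # and is False if the episode also reached a terminal state on that step.
        # Assumption: every truncation here is a time-limit truncation.
        legacy_info["TimeLimit.truncated"] = not terminated
    return obs, reward, done, legacy_info


print(to_legacy_step(0, 1.0, False, True, {}))
```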
Checklist:
- pre-commit checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` for instructions to set it up)