-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug Report] TimeLimit wrapper logic has changed #3102
Comments
This was an intentional change that we should have mentioned in the release notes more. I could see us modify this assumption in the |
I would disagree on the semantic of it. See comment by @XuehaiPan in #2510 (comment) The current TimeLimit implementation seems also to be inconsistent with https://github.com/openai/gym/blob/master/gym/utils/step_api_compatibility.py EDIT: I agree terminated and truncated can be true, but here we know that termination is not due to truncation, so I think it should be False |
I think a comment has gone missing in the compatibility file which explains I think the issue As the old API only allowed 3 states, not allowing ( Is there a separate issue to the backward compatibility? |
who is "we"? You seem to disagree with @vwxyzjn (please read #2510 (comment)) who disagrees with @XuehaiPan (see #2510 (comment)) This API change was meant to makes things clearer... For me a clear option (that would also match previous behavior) would be:
In the case here of the time limit, if an episode is already terminated, then you should not set |
For the old API, the variable
Now, in the new API,
I think For API conversion: old new
done = terminated or truncated
truncated = not terminated and truncated |
The "we" is myself, @RedTachyon and @arjun-kg who discussed this on discord.
I agree this was the aim, but it seems that clarifying the meaning of the variables has caused confusion that I suspect already existed but did not have a place to show itself.
This seems to confuse the meanings of As @XuehaiPan notes above, I don't believe this a bug. It does break backward compatibility and requires training libraries from correctly only using |
My perspective is just that by allowing |
truncation is not only due to timelimit, see below:
I would disagree with that.
but the truncation did not occur... so truncation should be False. |
You are right that internal and external to the environment reasoning is incorrect, I think internal or external to the agent's observation would be more correct. Thinking about this more, I understand the disagreement to be if the termination and truncation states are logically / causally linked. My understanding is that you think that if when a state transitions, it reaches a termination state and the episode ends, therefore never reaches the truncation state making An alternative interpretation is that the state transitions to a new state and @araffin Is there any issue with the second interpretation? Or why do you favour your interpretation? |
exactly
One reason is consistency, gym has behaved like that in the last 5 years and that would prevent surprises (this would break SB3 code silently for instance). Another reason is the semantic of the timelimit wrapper.
If you reach termination before the time limit, then none of 1), 2), 3) or 4) need a TimeLimit wrapper, you don't need to know if the timeout may have been reached (and timeout is an artificial way of terminating an episode) and that would also be confusing in that case ( Note that I'm fine with |
This is true but the limitation was solely due to the implementation forced only termination or truncation to be true. With our new API, this limitation no longer exists so I don't think we should limit ourselves just because the old API did. Additionally, how is this causing issues with SB3? Training algorithms should be ignoring the truncated variable beyond stepping through an environment, therefore, should not cause issues in the training stage. Could you explain why you are happy with
The first option is just the old API and I still don't understand limiting the states for good theoretical or practical reasons. |
A software with more features doesn't mean better software. We should aim for the most intuitive/user friendly API, especially as gym is used by practitioners with various level of experience, and avoid silent logic change with the previous API (old API doesn't mean bad API). Gym is not new, having subtle changes like that will just bring confusion. I also don't understand where you want to use that additional information? (as it won't be useful for resetting the agent, nor for training it). To answer the rest of your questions, let's talk about the intuitive API, in my opinion, there are two possibles intuitive use of terminated and truncated: option 1 With this option, This is the most intuitive explanation already used with the new API in some codes, for instance: option 2 With this option, With what you propose, |
@araffin As posted in #3102 (comment), the correct conversion is: done = terminated or truncated
info["TimeLimit.truncated"] = not terminated and truncated |
This is the current implementation in the compat method yes, but:
[1] most importantly, the argument that it is done that way because we get more information is weak if this information is not used (and for debug, there is the info dict) |
Sorry, I forgot about this thread.
The problem is partially language I think because in SB3, if to run bootstrapping is called An additional thought is that all training libraries should be compatible with dealing with this case ( |
let me try to rephrase...
the doc is consistent with the code but there is no explicit migration guide and mention of the breaking change.
here, and that's mainly what this issue is about, I disagree with the logic.
I also think it's partially language but mostly logic for me (for instance saying "timelimit was reached" instead of "episode ends because of timelimit" would partially solve this issue).
this goes directly against of what you propose, no? EDIT: another instance that shows the current allowed states is not the most intuitive: https://github.com/sail-sg/envpool/pull/205/files#diff-ce18b2383ca873c94323ec7719ea561774f6e64c044e7874b2f47f3746351deaR89 (terminated and truncated are exclusive) |
@pseudo-rnd-thoughts gym 0.26 has been released for almost two months already and this I'm also still curious to hear why the current implementation is the most intuitive? (Despite counter-examples (code written with gym 0.26):
(sidenote: I also cannot find the doc I quoted before anymore) EDIT: for backward compatiblity and also to give more info, |
Can we be adults and skip the passive aggressive comments? This decision was discussed a bunch before we made it, and a clever comeback is like the least likely thing to change it. For me ultimately the argument is that this approach gives strictly more information, with the downside being that it requires library maintainers to actually maintain their libraries. Keeping an old design randomly made by OpenAI just to keep things the same is a terrible idea in the long run. And of course this should have been explicitly stated in the docs, I'm sure making a PR would have taken less time than this thread, so if you want to contribute that, the PRs are always open, otherwise I can get to it when I sit down with a computer tomorrow or so. |
Describe the bug
Before gym 0.26:
gym/gym/wrappers/time_limit.py
Lines 51 to 56 in 3498617
After:
gym/gym/wrappers/time_limit.py
Lines 53 to 54 in 54b406b
It should be
truncated = truncated or not terminated
to match previous behavior.test was also updated in 54b406b#diff-15620e7669a4869759c87e2b6a4f51ad87609ed2680a54047e2da2ff7fcc4149
If the change is intended, then it should also be mentioned in the release.
The text was updated successfully, but these errors were encountered: