Re-add timelimit truncation information #101

araffin · 2022-10-31T18:26:26Z

Description

Partial fix of openai/gym#3102
Migration guide and release notes still need to be updated.

In gym 0.21, there was a key in the info dict that gave the information that truncation was due to timelimit.
This information is lost in gym 0.26 (causing failure in DLR-RM/rl-baselines3-zoo#256 for instance).

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

RedTachyon · 2022-11-01T21:06:40Z

This was a deliberate change in the API so that an important property of the env step is communicated in a robust way, and not through a key in a dictionary. Since 0.26, we've taken the approach of not sticking to bad legacy design decisions made by OpenAI just for the sake of compatibility, so I don't think we should change this here.

I did, however, add a description of the change to the migration guide in #105, since that's definitely necessary

araffin · 2022-11-02T07:32:39Z

I think there is a misunderstanding of the proposed change.

In openai/gym#3102, it is said that terminated=truncated=Trueis allowed because "this approach gives strictly more information".

However, truncated=True can be due to many reasons (lap finished, out of tracking bounds, see Farama-Foundation/gym-docs#128 (comment)), so if truncated=True it doesn't mean that the timelimit was reached, it just mean that some truncation state was reached (and maybe after some termination).

The proposed change doesn't change the API, it just gives strictly more information about which type of truncation.
This can be very helpful when you want to know how often does the agent reaches a timeout (it gives information about the behavior and allow to adjust the timeout).

EDIT: my change could be edited to info["TimeLimit.truncated"] = not truncated (and be before the truncation=True) if we want to tell if the truncation was only due to timelimit

RedTachyon · 2022-11-02T17:29:01Z

However, truncated=True can be due to many reasons (lap finished, out of tracking bounds, see Farama-Foundation/gym-docs#128 (comment))

I disagree with this rather strongly. The idea for truncation is to represent exactly that - reaching a time limit. I don't see why this should be conflated with other reasons for the rollout to env.

The example you mention with finishing the lap is in my opinion just a bug in the current implementation of car racing, see #106.

For the robot example you gave, it might be true that going out of bounds shouldn't be treated as a failure (though I'm not 100% convinced about that), but it's also definitely not the same thing as truncation. Of course, if someone wants to abuse the semantic to simplify their life, this should be possible, but it is not the main intention of the truncated value.

araffin · 2022-11-02T18:11:54Z

but it's also definitely not the same thing as truncation.

I consider timeouts (due to time limit only) to be one type of truncation.

For the robot example you gave, it might be true that going out of bounds shouldn't be treated as a failure (though I'm not 100% convinced about that), but it's also definitely not the same thing as truncation.

It's going out of tracking bounds.
For instance, you want your quadruped to walk as far as possible but you cannot measure outside a given zone.
This is an infinite horizon problem, so you should bootstrap if the tracking bounds are reached.

How would you do that with the current API otherwise?

If you treat the out of bound like a termination and reward it, you may end up with undesirable behaviors (you have an example with hopper in https://arxiv.org/abs/1712.00378).

The example you mention with finishing the lap is in my opinion just a bug

The same goes for the racing car, you don't want your car to be good for one lap only (and crash just after finishing the lap for instance), so rewarding the termination is not a good idea.

EDIT: I already discussed termination/truncation several times in the past with @arjun-kg @pseudo-rnd-thoughts Farama-Foundation/gym-docs#115 (comment) Farama-Foundation/gym-docs#128 and in the google draft of pseudo-rnd-thoughts/gym#1 (comment)
openai/gym#2510 and openai/gym#2752

araffin · 2022-11-02T18:47:16Z

as an appendix, from the docs
"
Truncation refers to the episode ending after an externally defined condition (that is outside the scope of the Markov Decision Process).
"

if you want to treat the cases i describe as termination, you will need to add information to the observation, otherwise you are breaking Markov assumption.

A last collection of examples is everytime a human needs to intervene but that's not the agent's fault. One can think of a misplaced cable, an object that is supposed to be grasped that was moved unintentionnally, a robot that must be stopped because a careless human was on its way, ...

Edit: another common case for truncation is error like losing connection, losing connection to database, network, robot, ...

RedTachyon · 2022-11-12T00:37:17Z

I won't address every individual point, but here's my thought process: in 90% of the cases, you either deal with a normal termination due to reaching a terminal state, so these two scenarios should be privileged and indicated by terminated/truncated, with truncated referring explicitly to hitting the time limit. In weird scenarios like disconnecting from a server or something, users can include that in info, which allows arbitrary extra information for edge cases.

otherwise you are breaking Markov assumption.

This implies that all environments have to be fully observable, which is very incorrect. A POMDP is still markovian even if the observation doesn't include full information. The underlying state is definitely markovian since, well, we're able to implement it in code. Introducing a convention where truncated refers to any termination that's not directly observable would break... pretty much all of existing partially observable RL code as far as I'm aware.

A last collection of examples is everytime a human needs to intervene but that's not the agent's fault.

In that case you'd probably want to include that information explicitly in some way anyways. It's a very non-standard use-case of RL, and right now it is not our intention for gym.Env to fully encompass all potential/future unusual applications of RL - although if it's possible, it's of course better to support something than not support it.

What I think would be justified specifically for CarRacing, which is currently the only built-in environment that has this issue as far as I know, is adding an info entry describing the reason for termination (whether it's crashing or finishing the track, or something like that). Then the overall idea is that:

terminated == true => a terminal state was reached
truncated == true => the time limit was reached
additional reasons/information about termination is provided in info

I think this would allow you to do everything you want?

pseudo-rnd-thoughts · 2022-12-01T12:12:52Z

Closing as while the discussion on the definition of terminated and truncated is important, this appears more for documentation and this is not a feature we are interested in adding at the moment

araffin added 2 commits October 31, 2022 19:17

Re-add timelimit truncation information

dedd72e

Update test

856ec54

araffin mentioned this pull request Nov 2, 2022

Mention truncation in the migration guide #105

Merged

pseudo-rnd-thoughts closed this Dec 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-add timelimit truncation information #101

Re-add timelimit truncation information #101

araffin commented Oct 31, 2022

RedTachyon commented Nov 1, 2022

araffin commented Nov 2, 2022 •

edited

Loading

RedTachyon commented Nov 2, 2022

araffin commented Nov 2, 2022 •

edited

Loading

araffin commented Nov 2, 2022 •

edited

Loading

RedTachyon commented Nov 12, 2022

pseudo-rnd-thoughts commented Dec 1, 2022

Re-add timelimit truncation information #101

Re-add timelimit truncation information #101

Conversation

araffin commented Oct 31, 2022

Description

Type of change

Screenshots

Checklist:

RedTachyon commented Nov 1, 2022

araffin commented Nov 2, 2022 • edited Loading

RedTachyon commented Nov 2, 2022

araffin commented Nov 2, 2022 • edited Loading

araffin commented Nov 2, 2022 • edited Loading

RedTachyon commented Nov 12, 2022

pseudo-rnd-thoughts commented Dec 1, 2022

araffin commented Nov 2, 2022 •

edited

Loading

araffin commented Nov 2, 2022 •

edited

Loading

araffin commented Nov 2, 2022 •

edited

Loading