Add support for different vector autoreset modes #1227
@vmoens Have you had a chance to look at this and see if this is compatible with TorchRL?
That looks good! So checking which behaviour is in place would require checking an auto_reset argument, right? To be precise, we don't think that auto reset is a bad idea, but that auto reset within step isn't optimal: one should have one method for step, one for reset, and another for step and maybe reset with, possibly, a different signature that returns additional info, such as the reset observation, if needed.
@vmoens We can't add whole new function definitions for
@vmoens Is this compatible with TorchRL? I'm planning on finishing up this PR soon so we can cut a release for you to use.
That is really cool thanks!
Exactly what we need!
Awesome! We just need the fellas at IsaacLab to align to this as well.
Description
With the change in Gymnasium v1.0, some users have requested support for the other vector autoreset APIs / modes, such as storing the final observation in `info["final_obs"]` with the reset observation passed back to the step's `obs`.

We have added support to the built-in `SyncVectorEnv` and `AsyncVectorEnv` using the `autoreset_mode` argument, which takes a str or Enum of `AutoresetMode`, with `metadata["autoreset_mode"]` specifying the implemented API.

For custom vector environments, we highly recommend adding this metadata tag to help users and wrappers know the implemented API, as these environments can have any of the autoreset modes implemented.
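To make the difference between modes concrete, here is a minimal, standard-library-only sketch of a toy vector environment. It is not Gymnasium's implementation: `ToyAutoresetMode`, its string values, and `ToyCounterVectorEnv` are hypothetical names invented for illustration. The same-step branch follows the description above (final observation in `info["final_obs"]`, reset observation returned as `obs`); the next-step branch is assumed to match the v1.0 default, where the reset happens on the following `step` call.

```python
from enum import Enum


class ToyAutoresetMode(Enum):
    """Hypothetical stand-in for `AutoresetMode`; the member values are made up."""

    NEXT_STEP = "next_step"
    SAME_STEP = "same_step"


class ToyCounterVectorEnv:
    """Toy vector env: each sub-env adds its action to a counter and
    terminates once the counter reaches `limit`."""

    def __init__(self, num_envs, limit, autoreset_mode):
        self.num_envs = num_envs
        self.limit = limit
        # Enum(value) accepts either the string value or an existing member.
        mode = ToyAutoresetMode(autoreset_mode)
        # Metadata tag advertising which autoreset API this env implements.
        self.metadata = {"autoreset_mode": mode}
        self._counters = [0] * num_envs
        self._pending_reset = [False] * num_envs

    def reset(self):
        self._counters = [0] * self.num_envs
        self._pending_reset = [False] * self.num_envs
        return list(self._counters), {}

    def step(self, actions):
        same_step = self.metadata["autoreset_mode"] is ToyAutoresetMode.SAME_STEP
        obs, terminated = [], []
        final_obs = [None] * self.num_envs
        for i, action in enumerate(actions):
            if self._pending_reset[i]:
                # Next-step mode: this call resets the sub-env and ignores the action.
                self._counters[i] = 0
                self._pending_reset[i] = False
                obs.append(0)
                terminated.append(False)
                continue
            self._counters[i] += action
            done = self._counters[i] >= self.limit
            if done and same_step:
                # Same-step mode: keep the final observation in info and
                # hand the reset observation back as this step's obs.
                final_obs[i] = self._counters[i]
                self._counters[i] = 0
            elif done:
                self._pending_reset[i] = True
            obs.append(self._counters[i])
            terminated.append(done)
        info = {"final_obs": final_obs} if same_step and any(terminated) else {}
        return obs, terminated, info


env = ToyCounterVectorEnv(num_envs=2, limit=2, autoreset_mode="same_step")
env.reset()
obs, terminated, info = env.step([1, 2])
print(env.metadata["autoreset_mode"], obs, terminated, info)
```

A wrapper or training loop can branch on `env.metadata["autoreset_mode"]` instead of guessing which convention an environment follows, which is the reason for recommending the tag on custom vector environments.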
Importantly, different built-in wrappers have different levels of compatibility; see the table below.
* all inherited wrappers from `VectorizeTransformObservation` are compatible (`FilterObservation`, `FlattenObservation`, `GrayscaleObservation`, `ResizeObservation`, `ReshapeObservation`, `DtypeObservation`).

All other reward and action wrappers should be fully compatible.
Why are some wrappers limited?
`NormalizeObservation`, or wrappers that apply a batch-based transform such as `TransformObservation`: this is not possible to implement efficiently. A future PR could investigate adding this.

`NormalizeObservation`: you would not wish to update the normalizer again for the non-final states. For simple `Box` space environments, it would be possible to add compatibility through filtering the observations, but for more complex spaces, like `Dict`, this is not efficiently possible.
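The "filtering the observations" idea for simple `Box` spaces could look roughly like the sketch below, where only the rows of a batch selected by a boolean mask contribute to the running statistics. This is an illustration in plain Python, not how `NormalizeObservation` is implemented; `MaskedRunningMean` and the `keep` mask are hypothetical, and which rows should be excluded depends on the autoreset mode in use.

```python
class MaskedRunningMean:
    """Per-feature running mean over only the rows selected by a mask."""

    def __init__(self, num_features):
        self.count = 0
        self.mean = [0.0] * num_features

    def update(self, batch, keep):
        # batch: one flat observation (list of floats) per sub-environment.
        # keep:  one bool per sub-environment; False rows are skipped entirely.
        for row, use in zip(batch, keep):
            if not use:
                continue
            self.count += 1
            for j, x in enumerate(row):
                # Incremental mean update: m += (x - m) / n
                self.mean[j] += (x - self.mean[j]) / self.count


stats = MaskedRunningMean(num_features=2)
batch = [[1.0, 2.0], [100.0, 200.0], [3.0, 4.0]]
# Hypothetical mask: the middle sub-environment's observation is excluded.
stats.update(batch, keep=[True, False, True])
print(stats.count, stats.mean)
```

With a flat batch, a single row mask is enough. A `Dict` space would need the same mask applied to every nested leaf, which is the part described above as not efficiently possible.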