0.25.0
Release notes
This release finally introduces all of the new API changes that have been planned for the past year or more, all of which will be turned on by default in a subsequent release. After this point, Gym development should get massively smoother. This release also fixes major bugs present in 0.24.0 and 0.24.1, and we strongly discourage using those releases.
API Changes
Step
- A majority of deep reinforcement learning algorithm implementations are incorrect due to an important difference between theory and practice, as `done` is not equivalent to `termination`. As a result, we have modified the `step` function to return five values: `obs, reward, termination, truncation, info`. The full theoretical and practical reasons for these changes (along with example code changes) will be explained in a soon-to-be-released blog post. The change aims to be backward compatible (for now); please report any issues on GitHub or the Discord. @arjun-kg
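To illustrate why the distinction matters, here is a minimal, hypothetical sketch (not Gym's code) of how a TD target should treat `termination` versus `truncation`:

```python
# Hypothetical helper (not part of Gym) showing why `termination` and
# `truncation` must be handled differently when bootstrapping values.
def td_target(reward, next_value, termination, truncation, gamma=0.99):
    # A true terminal state has no future return, so the bootstrap is dropped.
    if termination:
        return reward
    # A truncated episode was cut off artificially (e.g. by a time limit),
    # so the value estimate of the next state is still valid and is kept.
    return reward + gamma * next_value
```

Treating truncation as termination would wrongly zero out the bootstrap term, which is exactly the kind of subtle bug that conflating the two signals in a single `done` flag produces.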
Render
- The render API is changed such that the render mode has to be specified during `gym.make` with the keyword `render_mode`, after which the render mode is fixed. For further details see https://younis.dev/blog/2022/render-api/ and #2671. This has the following additional changes:
  - With `render_mode="human"`, you don't need to call `.render()`; rendering will happen automatically on `env.step()`.
  - With `render_mode="rgb_array"`, `.render()` pops the list of frames rendered since the last `.reset()`.
  - With `render_mode="single_rgb_array"`, `.render()` returns a single frame, like before.
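The `rgb_array` behaviour can be sketched with a toy frame buffer (a hypothetical class, not Gym's implementation), assuming frames accumulate on `step()` and are popped on `render()`:

```python
# Hypothetical sketch of "collect frames on step, pop them on render",
# as described for render_mode="rgb_array". Strings stand in for frames.
class FrameCollector:
    def __init__(self):
        self._frames = []

    def reset(self):
        self._frames = []  # accumulated frames are discarded on reset

    def step(self):
        # a real environment would append an RGB array here
        self._frames.append(f"frame{len(self._frames)}")

    def render(self):
        # pop: return everything collected so far and clear the buffer
        frames, self._frames = self._frames, []
        return frames
```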
- `Space.sample(mask=...)` allows a mask when sampling actions to enable/disable certain actions from being randomly sampled. We recommend developers add this to the `info` parameter returned by `reset(return_info=True)` and `step`. See #2906 for example implementations of the masks for the individual spaces. We have added an example version of this in the taxi environment. @pseudo-rnd-thoughts
- Add `Graph` space for environments that use graph-style observation or action spaces. Currently, the node and edge spaces can only be `Box` or `Discrete` spaces. @jjshoots
- Add `Text` space for reinforcement learning that involves communication between agents and has dynamic-length messages (otherwise `MultiDiscrete` can be used). @ryanrudes @pseudo-rnd-thoughts
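As a rough sketch of how masked sampling might work for a `Discrete(n)` space (a hypothetical `masked_discrete_sample` helper, not Gym's actual implementation; the mask convention assumed here is 1 = sampleable, 0 = disabled):

```python
import numpy as np

# Hypothetical stand-in for Discrete.sample(mask=...): draw uniformly
# from the actions whose mask entry is 1.
def masked_discrete_sample(n, mask, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    valid = np.flatnonzero(np.asarray(mask, dtype=np.int8))
    if valid.size == 0:
        return 0  # fallback when everything is masked out (an assumption)
    return int(rng.choice(valid))
```

An environment would then expose the current mask through `info`, so that agents can sample only currently legal actions.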
Bug fixes
- Fixed car racing termination: if the agent finishes the final lap, the environment now ends through truncation, not termination. This added a version bump of car racing to v2 and removed the discrete car racing variant in favour of `gym.make("CarRacing-v2", continuous=False)`. @araffin
- In v0.24.0, `opencv-python` was an accidental requirement for the project. This has been reverted. @KexianShen @pseudo-rnd-thoughts
- Updated `utils.play` such that if the environment specifies `keys_to_action`, the function will automatically use that data. @Markus28
- When rendering the blackjack environment, fixed a bug where rendering would change the dealer's top card. @balisujohn
- Updated the mujoco docstrings to reflect changes that were accidentally overwritten. @Markus28
Misc
- The whole project is partially type hinted using pyright (none of the project files is ignored by the type hinter). @RedTachyon @pseudo-rnd-thoughts (Future work will add strict type hinting to the core API)
- Action masking added to the taxi environment (no version bump due to being backwards compatible) @pseudo-rnd-thoughts
- The `Box` space shape inference allows `high` and `low` scalars to be automatically set to a `(1,)` shape. Minor changes to identifying scalars. @pseudo-rnd-thoughts
- Added option support in the classic control environments to modify the bounds on the initial random state of the environment @psc-g
- The `RecordVideo` wrapper is being deprecated, with no support for `TextEncoder` under the new render API. The plan is to replace `RecordVideo` with a single function that will receive a list of frames from an environment and automatically render them as a video using `MoviePy`. @johnMinelli
- The gym `py.Dockerfile` is optimised from 2 GB to 1.5 GB through a number of optimisations @TheDen