Add Gym 0.26 support #780
It seems that the failures come from
Only a subset of the failures; those are fixed now.
@RedTachyon I'm currently investigating a bug which may be related to openai/gym#2422. Here's a stack trace of the type of errors:
I've been digging into the code and the problem seems to be that this line
def sample(self) -> int:
    return self.start + self.np_random.randint(self.n)
is being called with
If it's any help, this only happens after loading a model, so the problem may be related to that (I haven't checked much on that front yet; I wanted to rule out something obvious with the seeding changes first). Here's an example test where it fails: stable-baselines3/tests/test_save_load.py, Line 216 in 58a9806
It seems
I think I found the issue: the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior, which creates a new Generator. I think I see two main ways to solve it, either in
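The pickling behavior described above can be reproduced with a short sketch. `CompatGenerator` and its `randint` shim are hypothetical stand-ins for the custom RNG class, not Gym's actual code; the sketch only relies on NumPy's `Generator` rebuilding a plain `Generator` when unpickled.

```python
import pickle

import numpy as np


class CompatGenerator(np.random.Generator):
    """Hypothetical stand-in for a custom RNG that keeps a legacy `randint` method."""

    def randint(self, low, high=None):
        # Forward the legacy RandomState-style call to the Generator API.
        return self.integers(low, high)


rng = CompatGenerator(np.random.PCG64(0))
sample_before = rng.randint(5)  # works: the subclass provides `randint`

# Round-trip through pickle, as happens when a saved object is loaded again.
restored = pickle.loads(pickle.dumps(rng))

restored_type = type(restored)              # a plain np.random.Generator
has_randint = hasattr(restored, "randint")  # the shim is lost after unpickling
```

If this matches the failure mode in the thread, any code path that still calls `np_random.randint(...)` would break only after a save/load cycle, which is consistent with the observation that the error appears after loading a model.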
I see, thanks for checking. I wonder what this means for SB3 adopting Gym 0.22, since the problem seems to be on Gym's side. I'm guessing the fix (whenever it comes) won't be available for SB3 until Gym's next release?
openai/gym#2640 must be fixed first too.
We're going to do a 0.22.1 release in the near future that will fix all of these things; some of the relevant fixes have already been merged.
Update on this:
Not sure exactly how gym's updates are causing this lower mean reward, need to check this further
Caused because the trajectory truncation is not raising a warning in these lines: stable-baselines3/tests/test_her.py Lines 258 to 261 in cdaa9ab
EDIT: CI also shows about 3 tests failing with
Regarding the "mean reward below threshold" problem, after some further investigations the root cause is the change in seeding behaviour in gym. SB3 sets seeds here:
Before gym 0.22.0, the default
The consequence is, for instance, that the result of
is not consistent between gym==0.21.0 and gym>=0.22.0. The solution that makes the most sense to me is simply to adjust the threshold value in the test.
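A minimal sketch of why the same seed can yield different samples across versions, assuming the difference comes down to which NumPy generator ends up producing the samples (this is an illustration, not necessarily the exact mechanism inside Gym). `SEED` and `N_ACTIONS` are hypothetical values.

```python
import numpy as np

SEED = 0
N_ACTIONS = 1000  # hypothetical size of a discrete action space

# Legacy API: Mersenne Twister behind RandomState.randint.
legacy_rng = np.random.RandomState(SEED)
legacy_samples = legacy_rng.randint(N_ACTIONS, size=100)

# Newer API: PCG64 behind Generator.integers.
new_rng = np.random.default_rng(SEED)
new_samples = new_rng.integers(N_ACTIONS, size=100)

# Same seed, different generator: the two sample streams do not match.
same_stream = bool(np.array_equal(legacy_samples, new_samples))

# Each API is still reproducible on its own for a fixed seed.
repeat = np.random.default_rng(SEED).integers(N_ACTIONS, size=100)
reproducible = bool(np.array_equal(new_samples, repeat))
```

Under this assumption, a seeded performance test stays deterministic within one version but can land on a different mean reward after the upgrade.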
I see...
Is there any way to make it consistent?
The whole point of those performance tests is to detect any change that may hurt performance or change results (they fail easily as soon as any minor change is made). So we should not change the threshold but rather fix the underlying change/issue. In the current case, I would prefer changing the seed if we cannot have behavior consistent with the previous version.
By the way, are all those warnings (see below) due to gym only?
same for
?
Similarly, the HER test failure was also caused by the change in RNG. The failing test required that after training for 200 steps, the env had to be left in the middle of a trajectory. After the change in gym, the RNG gods decided that after 200 steps the env had just finished an episode, i.e.,
==> the warning is never raised ==> the test assertion fails. A simple change in the seed fixes the test.
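To illustrate how the seed alone can decide whether step 200 falls in the middle of a trajectory, here is a toy sketch. It does not use Gym or the HER test; `ends_on_episode_boundary` and the per-step termination probability `p_done` are hypothetical.

```python
import random


def ends_on_episode_boundary(seed, n_steps=200, p_done=0.2):
    """Return True if the last of `n_steps` steps finishes an episode.

    Toy model: every step ends the current episode with probability `p_done`.
    """
    rng = random.Random(seed)
    done = False
    for _ in range(n_steps):
        done = rng.random() < p_done
    return done


# Same step budget, different seeds: some runs stop mid-trajectory, others
# stop exactly at the end of an episode.
outcomes = {seed: ends_on_episode_boundary(seed) for seed in range(200)}
```

A test that relies on being mid-trajectory after a fixed number of steps is therefore sensitive to any change in the random stream, which is why picking another seed is enough to restore the expected warning.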
Closing this one as I just released an alpha version of the Gymnasium branch on PyPI:
Documentation is available here: https://stable-baselines3.readthedocs.io/en/feat-gymnasium-support/
@araffin if I understand correctly, SB3 directly jumped from supporting
SB3 2.x (master version) supports gymnasium 0.28.1, and gym 0.21/0.26 via shimmy.
@araffin is there any documentation/reference on how to use SB3 with gym 0.21/0.26 via shimmy? I looked at the install requirements of the master branch and they only mention gymnasium (gym is not included).
It is automatic if you use stable-baselines3/stable_baselines3/common/env_util.py, Lines 99 to 101 in fd0cd82
Patch env is defined here:
And yes, gym is an optional dependency.
EDIT: Please use the Gymnasium branch instead of this one: #1327
(the new branch is compatible with gym 0.21, gym 0.26 and gymnasium)
See comment #780 (comment) to use this PR
To install SB3 with gym 0.26+ support:
(or as a requirement: git+https://github.com/carlosluis/stable-baselines3@fix_tests#egg=stable_baselines3[extra,tests,docs] and "sb3_contrib @ git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib@feat/new-gym-version" in a setup.py, see DLR-RM/rl-baselines3-zoo#256)
Note: if you want to use gymnasium, you can do:
before any stable-baselines3 import.
For native gymnasium support, you can take a look at #1327 or install it with:
Description
Gym 0.26 has been released, and with it come breaking changes. The objective of this PR is to fix all the failing tests.
Moving from Gym 0.26.2 to Gymnasium 0.26.2 (which are the same) is part of this PR.
Missing:
push docker images (note: waiting for "Add check when registering gym compatibility envs to avoid gymnasium warning", Farama-Foundation/Shimmy#29, and proper decoupling with gymnasium)
Motivation and Context
Gym release notes
Types of changes
closes #840 #871
closes #271
closes #1156
Checklist:
make format (required)
make check-codestyle and make lint (required)
make pytest and make type both pass. (required)
make doc (required)