Update to newest Gym version #572

Closed · wants to merge 21 commits
Conversation

jkterry1
Contributor

Description

This should fix CI for Gym 0.20.0 with respect to the Atari ROMs.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@Miffyli
Collaborator

Miffyli commented Sep 16, 2021

Looks like something is not working right. Possibly something from Gym 0.20?

    def _check_obs(obs: Union[tuple, dict, np.ndarray, int], observation_space: spaces.Space, method_name: str) -> None:
        """
        Check that the observation returned by the environment
        correspond to the declared one.
        """
        if not isinstance(observation_space, spaces.Tuple):
            assert not isinstance(
                obs, tuple
            ), f"The observation returned by the `{method_name}()` method should be a single value, not a tuple"
    
        # The check for a GoalEnv is done by the base class
        if isinstance(observation_space, spaces.Discrete):
            assert isinstance(obs, int), f"The observation returned by `{method_name}()` method must be an int"
        elif _is_numpy_array_space(observation_space):
            assert isinstance(obs, np.ndarray), f"The observation returned by `{method_name}()` method must be a numpy array"
    
>       assert observation_space.contains(
            obs
        ), f"The observation returned by the `{method_name}()` method does not match the given observation space"
E       AssertionError: The observation returned by the `reset()` method does not match the given observation space

@araffin
Member

araffin commented Sep 16, 2021

Looks like something is not working right. Possibly something from Gym 0.20?

Yep, it probably comes from a wrong dtype; I fixed several issues yesterday: cyprienc@e7a48d2
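
For reference, here is a minimal sketch (not taken from this PR, and assuming a gym version whose Box.contains() also checks the dtype) of how a wrong dtype can trigger the check_env assertion shown above:

    import numpy as np
    from gym import spaces

    # Hypothetical example: the declared space is float32, but reset() returns float64.
    observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

    obs_wrong_dtype = np.zeros(3, dtype=np.float64)       # what a buggy env might return
    obs_right_dtype = obs_wrong_dtype.astype(np.float32)  # the fix: cast to the declared dtype

    # Depending on the gym version, contains() rejects observations whose dtype
    # cannot be safely cast to the space's dtype, which is what check_env reports.
    print(observation_space.contains(obs_wrong_dtype))  # may be False
    print(observation_space.contains(obs_right_dtype))  # True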

araffin mentioned this pull request on Sep 16, 2021
@araffin
Member

araffin commented Sep 17, 2021

We also need to fix that issue: #573 (comment)
Either upstream or we will have to monkey patch gym in SB3... (we did that in the past, I'm hoping we won't have to do it again)

jkterry1 changed the title from "Fix Atari in CI for newest Gym version" to "Update to newest Gym version" on Oct 21, 2021
@jkterry1
Contributor Author

So to fix the error with Breakout, the new command equivalent to the NoFrameskip variant is gym.make('ALE/Breakout-v5', frameskip=1). I'm not sure how you want to handle that in make_atari_env?

@araffin
Member

araffin commented Oct 24, 2021

the new command equivalent to the NoFrameskip variant is gym.make('ALE/Breakout-v5', frameskip=1).

wait, what is the new default behavior? and why? and frameskip=1 sounds wrong... (I would expect frameskip=0)

@JesseFarebro

@araffin there are no longer any environment suffixes like NoFrameskip or Deterministic. You'll now have to supply all additional options when constructing the environment.

Regarding the frameskip setting, it's always been a misnomer. It's really "how many frames is the action applied for" NOT "how many frames are skipped after the action was executed". A frameskip of 1 means the action is applied for one frame. See: https://github.com/mgbellemare/Arcade-Learning-Environment/blob/79ffb7d5c8d404bdde951d45bef263b1a1f84125/src/gym/envs/atari/environment.py#L222-L223 and https://github.com/mgbellemare/Arcade-Learning-Environment/blob/79ffb7d5c8d404bdde951d45bef263b1a1f84125/src/environment/stella_environment.cpp#L157

Hope that helps clear things up.
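
For concreteness, a small sketch of the old versus new way of building the environment (this assumes ale-py is installed and the ROMs are available, e.g. via AutoROM; the id and kwarg are the ones mentioned above):

    import gym

    # Old style: the settings were encoded in the environment id suffix.
    # env = gym.make("BreakoutNoFrameskip-v4")

    # New style: options are passed explicitly. frameskip=1 means each action
    # is applied for a single frame, i.e. no frame skipping at all.
    env = gym.make("ALE/Breakout-v5", frameskip=1)
    obs = env.reset()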

@araffin
Member

araffin commented Oct 25, 2021

Thanks for the answer =)

It's really "how many frames is the action applied for"

I see, yes, the name is a bit confusing; we actually do the same in SB3's max and skip wrapper. I've also seen the name "action repeat" used instead, which better illustrates what it does.
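
For illustration, a minimal sketch of such an action-repeat ("max and skip") wrapper, written against the old 4-tuple step API; this is just the idea, not SB3's exact implementation:

    import gym
    import numpy as np

    class ActionRepeatWrapper(gym.Wrapper):
        """Repeat an action for `skip` frames and max-pool the last two frames
        (the max removes Atari sprite flickering). Illustrative sketch only."""

        def __init__(self, env: gym.Env, skip: int = 4):
            super().__init__(env)
            self._skip = skip

        def step(self, action):
            total_reward, done, info = 0.0, False, {}
            frames = []
            for _ in range(self._skip):
                obs, reward, done, info = self.env.step(action)
                frames.append(obs)
                total_reward += reward
                if done:
                    break
            # Pixel-wise max over the last (up to) two frames
            max_frame = np.max(np.stack(frames[-2:]), axis=0)
            return max_frame, total_reward, done, info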

You'll now have to supply all additional options when constructing the environment.

but what is the default option then? and what was the default option before for the v4 envs? (with NoFrameskip name)

araffin mentioned this pull request on Nov 24, 2021
araffin mentioned this pull request on Dec 6, 2021
@araffin
Member

araffin commented Dec 14, 2021

but what is the default option then? and what was the default option before for the v4 envs? (with NoFrameskip name)

@JesseFarebro looking at https://brosa.ca/blog/ale-release-v0.7, I assume that frameskip=5, the action set is "full", and the sticky-action probability is 0.25.
But looking at the paper by Machado et al., it seems that the value frameskip=5 and the choice of the full action set are pretty arbitrary; do you know how they impact performance/reported results?
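
To make those assumed defaults explicit, here is a sketch using the kwargs exposed by ale-py's gym integration (the values are the assumptions above, not a verified spec):

    import gym

    # Constructing a v5 env with the assumed default settings spelled out.
    env = gym.make(
        "ALE/Breakout-v5",
        frameskip=5,                     # assumed default discussed above
        repeat_action_probability=0.25,  # "sticky actions"
        full_action_space=True,          # full 18-action set
    )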

@araffin
Member

araffin commented Dec 14, 2021

Side note: if we want to upgrade to gym 0.21, not only the code, tests and documentation must be updated, but also all our notebook tutorials...

EDIT: I forgot the RL Zoo too: https://github.com/DLR-RM/rl-baselines3-zoo

@JesseFarebro

@araffin there was some miscommunication on my part (and some false assumptions) which led to a frameskip of 5 being the default for v5 environments. Specifically, a frameskip of 5 was used in the Revisiting the Arcade Learning Environment paper due to their comparison with linear function approximation (LFA) with hand-crafted features. In the pre-DQN literature, a value of 5 was typically used with LFA. A frameskip of 4 has been the standard value for work post-DQN, and I will be updating the v5 default to 4. The deployment of v5 environments isn't ubiquitous, so I feel making this silent change will have minimal impact and won't require bumping to v6.

Sorry for the confusion, I hope this clears things up.

@araffin
Member

araffin commented Dec 14, 2021

thanks for the clarification, is it the same for the action set?

@JesseFarebro

@araffin sorry for the delay, I was still trying to sort this all out. It makes sense to use the minimal action set to stay as close to the post-DQN methodology as possible.

@araffin
Member

araffin commented Dec 21, 2021

@araffin sorry for the delay, I was still trying to sort this all out. It makes sense to use the minimal action set to stay as close to the post-DQN methodology as possible.

Ok, then at that point it would maybe make sense to either keep v4 (and not drop them in the future, as was intended) or provide a version with fixes (if I recall, v5 fixed some determinism issues, right? and made the max number of steps consistent?) that would allow replicating published results?

@JesseFarebro

JesseFarebro commented Dec 21, 2021

@araffin There are still differences which we'd like to fix, so the bump to v5 is warranted. Right now things will look like this (after v0.7.4 of the ALE):

| Version | Frame Skip | Sticky Action Prob. | Action Set | Max Frames |
|---------|------------|---------------------|------------|------------|
| v0      | [2, 4]     | 0.25                | Minimal    | ≤ 40,000   |
| v4      | [2, 4]     | 0.0                 | Minimal    | ≤ 400,000  |
| v5      | 4          | 0.25                | Minimal    | = 108,000  |

The v5 settings should have been the standardized methodology since the Revisiting paper. I don't have any intentions of dropping v0 or v4 for the foreseeable future, until there's a better way to ensure this won't break existing projects. As of right now, the next Gym release (i.e., 0.22) will actively complain if users aren't using the latest version of the environment. Hopefully, this will help with consensus.
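
Based on the table above, a sketch of how one might approximate the old v4/NoFrameskip behaviour with a v5 id by disabling sticky actions and frame skipping explicitly (an assumption drawn from the table, not an official mapping):

    import gym

    env = gym.make(
        "ALE/Breakout-v5",
        frameskip=1,                    # apply each action for one frame only
        repeat_action_probability=0.0,  # no sticky actions, as in v4
    )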

modanesh mentioned this pull request on Dec 28, 2021
jkterry1 closed this on Dec 30, 2021
@cash

cash commented Jan 13, 2022

I see that this PR has been closed. Is there any plan to support gym 0.20 or greater out of the box? (I saw the comment on #674 to install without dependencies to avoid the gym version conflict).

@araffin
Member

araffin commented Jan 13, 2022

I see that this PR has been closed. Is there any plan to support gym 0.20 or greater out of the box? (I saw the comment on #674 to install without dependencies to avoid the gym version conflict).

This PR has been closed in favor of #705; help is also welcome to update the documentation, see #705 (comment).

We may also have to wait for openai/gym#2531 to be fixed.

@cash

cash commented Jan 13, 2022

Thanks. Will take a look at #705. Looks like openai/gym#2531 has been fixed so maybe wait for a 0.22 release of gym?

@jkterry1
Contributor Author

@araffin openai/gym#2531 doesn't impact any current release and we aren't releasing the next version until it's fixed.
