
Allow PPO to turn off advantage normalization #61

Merged (7 commits) on Feb 23, 2022

Conversation

@vwxyzjn (Contributor) commented on Feb 22, 2022

Description

Allow PPO to turn off advantage normalization. Follow-up to DLR-RM/stable-baselines3#763 and #60.
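For context, advantage normalization in SB3's PPO rescales each minibatch of advantages to zero mean and unit variance before the policy loss is computed; this PR puts that step behind a normalize_advantage flag. A minimal sketch of the guarded step, assuming SB3's usual variable names in the training loop (not the exact diff):

# Per-minibatch advantage normalization, now guarded by the new flag
advantages = rollout_data.advantages
if self.normalize_advantage:
    # Rescale to zero mean and unit variance; the epsilon guards against division by zero
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)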

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have included an example of using the feature (required for new features).
  • I have included baseline results (required for new training algorithms or training-related features).
  • I have updated the documentation accordingly.
  • I have updated the changelog accordingly (required).
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

@araffin (Member) left a comment

Please update the changelog too, otherwise LGTM ;) (and you don't need an additional issue here if you already have one opened in the SB3 repo)


import pytest
from sb3_contrib import MaskablePPO

# Only `normalize_advantage` is parametrized, so the unused `model_class` argument is dropped
@pytest.mark.parametrize("normalize_advantage", [False, True])
def test_advantage_normalization(normalize_advantage):
    model = MaskablePPO("MlpPolicy", "CartPole-v1", n_steps=64, normalize_advantage=normalize_advantage)
@araffin (Member):

not sure if it works with CartPole, let's see...

@vwxyzjn (Contributor, Author):

Hey, you may need to approve the workflow run ;)

@araffin (Member) commented on Feb 22, 2022:

I did, and there are some failures already ;)
But it's best to quickly run the tests locally using pytest's -k argument.
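For example, to run only the new test locally, a minimal sketch (the shell equivalent is pytest -k test_advantage_normalization):

import pytest

# Select tests whose names match the -k expression, same as the CLI flag
pytest.main(["-k", "test_advantage_normalization"])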

@vwxyzjn (Contributor, Author) commented on Feb 22, 2022:

I had the following error, which seems related to DLR-RM/stable-baselines3#782:

ImportError: cannot import name 'GoalEnv' from 'gym' (/home/costa/.cache/pypoetry/virtualenvs/cleanrl-ghSZGHE3-py3.9/lib/python3.9/site-packages/gym/__init__.py)
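For background (an inference, not stated in the thread): gym 0.22, released shortly before this PR, removed GoalEnv from the core gym package, which is exactly the import that fails above. Until the dependency was pinned or updated, code like the following would raise:

# Fails under gym >= 0.22, where GoalEnv was removed from the core package
from gym import GoalEnv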

@vwxyzjn (Contributor, Author) commented on Feb 22, 2022:

OK, should be good now.

@araffin merged commit f5c1aaa into Stable-Baselines-Team:master on Feb 23, 2022