
Use Monitor episode reward/length for evaluate_policy #220

Merged (17 commits, Nov 16, 2020)

Conversation

@Miffyli (Collaborator) commented on Nov 12, 2020

Description

Design choice: checking for the Monitor wrapper reliably for both envs and VecEnvs got tricky and messy, so instead I opted for "lazily" checking whether the "episode" information is available in info, assuming it comes from a Monitor wrapper.
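A minimal sketch of that lazy check (not the PR's actual diff; `evaluate_episodes` is a hypothetical helper, and the pre-0.26 single-env gym step API is assumed):

```python
import gym
import numpy as np


def evaluate_episodes(model, env: gym.Env, n_eval_episodes: int = 10):
    """Hypothetical sketch: prefer Monitor's episode statistics whenever
    the "episode" key shows up in `info`, else fall back to manual sums."""
    episode_rewards, episode_lengths = [], []
    for _ in range(n_eval_episodes):
        obs, done = env.reset(), False
        total_reward, total_length = 0.0, 0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            total_length += 1
        if "episode" in info:
            # Written by the Monitor wrapper at episode end; these values
            # are unaffected by reward/length-modifying outer wrappers.
            episode_rewards.append(info["episode"]["r"])
            episode_lengths.append(info["episode"]["l"])
        else:
            # No Monitor detected: report the (possibly modified) sums.
            episode_rewards.append(total_reward)
            episode_lengths.append(total_length)
    return np.mean(episode_rewards), np.mean(episode_lengths)
```

The check costs nothing when no Monitor is present and avoids having to unwrap arbitrary wrapper/VecEnv stacks.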

Motivation and Context

Closes #181: evaluate_policy reports preprocessed reward, whereas rollout/ep_rew_mean is unprocessed (a toy illustration of the mismatch follows below).

  • I have raised an issue to propose this change (required for new features and bug fixes)
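For concreteness, a toy illustration of the mismatch from #181 (the `HalveReward` wrapper is hypothetical, standing in for any reward preprocessing; pre-0.26 gym step API assumed):

```python
import gym

from stable_baselines3.common.monitor import Monitor


class HalveReward(gym.RewardWrapper):
    """Toy stand-in for any wrapper that preprocesses rewards."""

    def reward(self, reward: float) -> float:
        return 0.5 * reward


# Monitor wraps the raw env; the preprocessing wrapper sits outside it.
env = HalveReward(Monitor(gym.make("CartPole-v1")))
obs, done, seen_return = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    seen_return += reward  # what a naive evaluation loop would report

# Monitor recorded the unmodified return, so the two disagree by 2x:
print(info["episode"]["r"], "vs.", seen_return)
```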

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

@Miffyli requested a review from @araffin on November 12, 2020, 23:04
@araffin (Member) left a review

Some points we need to discuss ;)

Resolved review threads (now outdated): docs/misc/changelog.rst, stable_baselines3/common/evaluation.py (two threads)
@Miffyli (Collaborator, Author) commented on Nov 13, 2020

CI will fail because I could not get the typing stuff to work out (e.g. Monitor's). There also starts being a ton of circular imports, which are suuuuuper-fun to deal with. I think making the whole type_aliases.py file only "active" during typing would help in the future, if that is possible in any way.

@araffin Could you look into the typing, check whether the current annotations are alright, and fix where necessary?
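One way to make imports "active" only during typing is Python's typing.TYPE_CHECKING constant, which is True only while a static type checker runs; a minimal sketch of the pattern (`find_monitor` is a hypothetical helper, not from this PR):

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Imported only while the type checker runs, so the runtime
    # circular import between modules disappears.
    from stable_baselines3.common.monitor import Monitor


def find_monitor(env) -> Optional["Monitor"]:
    """Hypothetical helper: walk a (non-vectorized) wrapper stack
    looking for a Monitor without importing it at runtime."""
    while env is not None:
        if type(env).__name__ == "Monitor":
            return env
        env = getattr(env, "env", None)
    return None
```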

@araffin (Member) commented on Nov 14, 2020

> There also starts being a ton of circular imports, which are suuuuuper-fun to deal with.

No worries, I will take a look ;) (I had some fun with that in the past)

@araffin (Member) commented on Nov 14, 2020

@Miffyli done ;)

but there are some warnings not caught now...
one solution would be to add an argument, like for the env checker (warn=True); a sketch of that follows after the log below.

=============================== warnings summary ===============================
tests/test_callbacks.py: 6 warnings
tests/test_identity.py: 12 warnings
tests/test_spaces.py: 6 warnings
tests/test_utils.py: 3 warnings
tests/test_vec_normalize.py: 3 warnings
  /builds/araffin/stable-baselines3/stable_baselines3/common/evaluation.py:66: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
    UserWarning,

le only "active" during typing would help in future, if that is possible in any way.

the issue is then the automatic documentation which would fail (yes it is a mess ^^#)
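A minimal sketch of that `warn=True` suggestion, mirroring `check_env(env, warn=True)` from the env checker (`warn_if_not_monitor_wrapped` is a hypothetical helper, not the PR's actual signature):

```python
import warnings

import gym

from stable_baselines3.common.monitor import Monitor


def warn_if_not_monitor_wrapped(env: gym.Env, warn: bool = True) -> bool:
    """Hypothetical helper: emit the Monitor warning only when the
    caller has not opted out via ``warn=False``."""
    is_monitor_wrapped = False
    current = env
    # Walk the (non-vectorized) wrapper stack looking for a Monitor.
    while isinstance(current, gym.Wrapper):
        if isinstance(current, Monitor):
            is_monitor_wrapped = True
            break
        current = current.env
    if not is_monitor_wrapped and warn:
        warnings.warn(
            "Evaluation environment is not wrapped with a ``Monitor`` wrapper. "
            "This may result in reporting modified episode lengths and rewards, "
            "if other wrappers happen to modify these. "
            "Consider wrapping environment first with ``Monitor`` wrapper.",
            UserWarning,
        )
    return is_monitor_wrapped
```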

@Miffyli (Collaborator, Author) commented on Nov 15, 2020

Thanks a ton! Now things should be in order :)

@araffin (Member) left a review

LGTM =)
