
Use Monitor episode reward/length for evaluate_policy #220

Merged (17 commits, Nov 16, 2020)

Conversation

@Miffyli (Collaborator) commented on Nov 12, 2020

Description

Design choice: checking for the Monitor wrapper reliably for both envs and VecEnvs got tricky and messy, so instead I opted for "lazily" checking whether the "episode" information is available in info, assuming it comes from a Monitor wrapper.
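A minimal sketch of that lazy check (not the PR's actual diff; `evaluate_episodes` is a hypothetical helper, and the pre-0.26 single-env gym step API is assumed):

```python
import gym
import numpy as np


def evaluate_episodes(model, env: gym.Env, n_eval_episodes: int = 10):
    """Hypothetical sketch: prefer Monitor's episode statistics whenever
    the "episode" key shows up in `info`, else fall back to manual sums."""
    episode_rewards, episode_lengths = [], []
    for _ in range(n_eval_episodes):
        obs, done = env.reset(), False
        total_reward, total_length = 0.0, 0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            total_length += 1
        if "episode" in info:
            # Written by the Monitor wrapper at episode end; these values
            # are unaffected by reward/length-modifying outer wrappers.
            episode_rewards.append(info["episode"]["r"])
            episode_lengths.append(info["episode"]["l"])
        else:
            # No Monitor detected: report the (possibly modified) sums.
            episode_rewards.append(total_reward)
            episode_lengths.append(total_length)
    return np.mean(episode_rewards), np.mean(episode_lengths)
```

The check costs nothing when no Monitor is present and avoids having to unwrap arbitrary wrapper/VecEnv stacks.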

Motivation and Context

Closes #181: evaluate_policy reports preprocessed reward, whereas rollout/ep_rew_mean is unprocessed (a toy illustration of the mismatch follows below).

  • I have raised an issue to propose this change (required for new features and bug fixes)
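For concreteness, a toy illustration of the mismatch from #181 (the `HalveReward` wrapper is hypothetical, standing in for any reward preprocessing; pre-0.26 gym step API assumed):

```python
import gym

from stable_baselines3.common.monitor import Monitor


class HalveReward(gym.RewardWrapper):
    """Toy stand-in for any wrapper that preprocesses rewards."""

    def reward(self, reward: float) -> float:
        return 0.5 * reward


# Monitor wraps the raw env; the preprocessing wrapper sits outside it.
env = HalveReward(Monitor(gym.make("CartPole-v1")))
obs, done, seen_return = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    seen_return += reward  # what a naive evaluation loop would report

# Monitor recorded the unmodified return, so the two disagree by 2x:
print(info["episode"]["r"], "vs.", seen_return)
```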

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

@Miffyli requested a review from @araffin on November 12, 2020, 23:04
@araffin (Member) left a review

Some points we need to discuss ;)

Resolved review threads (now outdated): docs/misc/changelog.rst, stable_baselines3/common/evaluation.py (two threads)
@Miffyli (Collaborator, Author) commented on Nov 13, 2020

CI will fail because I could not get the typing stuff to work out (e.g. Monitor's). There also starts being a ton of circular imports, which are suuuuuper-fun to deal with. I think making the whole type_aliases.py file only "active" during typing would help in the future, if that is possible in any way.

@araffin Could you look into the typing, check whether the current annotations are alright, and fix where necessary?
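One way to make imports "active" only during typing is Python's typing.TYPE_CHECKING constant, which is True only while a static type checker runs; a minimal sketch of the pattern (`find_monitor` is a hypothetical helper, not from this PR):

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Imported only while the type checker runs, so the runtime
    # circular import between modules disappears.
    from stable_baselines3.common.monitor import Monitor


def find_monitor(env) -> Optional["Monitor"]:
    """Hypothetical helper: walk a (non-vectorized) wrapper stack
    looking for a Monitor without importing it at runtime."""
    while env is not None:
        if type(env).__name__ == "Monitor":
            return env
        env = getattr(env, "env", None)
    return None
```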

@araffin (Member) commented on Nov 14, 2020

> There also starts being a ton of circular imports, which are suuuuuper-fun to deal with.

No worries, I will take a look ;) (I had some fun with that in the past)

@araffin (Member) commented on Nov 14, 2020

@Miffyli done ;)

but there are some warnings not caught now...
one solution would be to add an argument, like for the env checker (warn=True); a sketch of that follows after the log below.

=============================== warnings summary ===============================
tests/test_callbacks.py: 6 warnings
tests/test_identity.py: 12 warnings
tests/test_spaces.py: 6 warnings
tests/test_utils.py: 3 warnings
tests/test_vec_normalize.py: 3 warnings
  /builds/araffin/stable-baselines3/stable_baselines3/common/evaluation.py:66: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
    UserWarning,

le only "active" during typing would help in future, if that is possible in any way.

the issue is then the automatic documentation which would fail (yes it is a mess ^^#)
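A minimal sketch of that `warn=True` suggestion, mirroring `check_env(env, warn=True)` from the env checker (`warn_if_not_monitor_wrapped` is a hypothetical helper, not the PR's actual signature):

```python
import warnings

import gym

from stable_baselines3.common.monitor import Monitor


def warn_if_not_monitor_wrapped(env: gym.Env, warn: bool = True) -> bool:
    """Hypothetical helper: emit the Monitor warning only when the
    caller has not opted out via ``warn=False``."""
    is_monitor_wrapped = False
    current = env
    # Walk the (non-vectorized) wrapper stack looking for a Monitor.
    while isinstance(current, gym.Wrapper):
        if isinstance(current, Monitor):
            is_monitor_wrapped = True
            break
        current = current.env
    if not is_monitor_wrapped and warn:
        warnings.warn(
            "Evaluation environment is not wrapped with a ``Monitor`` wrapper. "
            "This may result in reporting modified episode lengths and rewards, "
            "if other wrappers happen to modify these. "
            "Consider wrapping environment first with ``Monitor`` wrapper.",
            UserWarning,
        )
    return is_monitor_wrapped
```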

@Miffyli (Collaborator, Author) commented on Nov 15, 2020

Thanks a ton! Now things should be in order :)

@araffin (Member) left a review

LGTM =)
