[Bug] Use of gym.make() stops "rollout/" data from being printed #232

Closed
pstansell opened this issue Nov 19, 2020 · 4 comments · Fixed by #237
Labels
enhancement (New feature or request), RTFM (Answer is the documentation)

Comments

pstansell commented Nov 19, 2020

🐛 Bug

If gym.make() is used to create the environment, the rollout/ data is not printed.

To Reproduce

Example where gym.make() is used:

import gym
from stable_baselines3 import SAC
env = gym.make('Pendulum-v0')
model = SAC('MlpPolicy', env, verbose = 1)
model.learn(200, log_interval = 1)

Output is missing the rollout/ data:

Using cpu device
Wrapping the env in a DummyVecEnv.
---------------------------------
| time/              |          |
|    episodes        | 1        |
|    fps             | 71       |
|    time_elapsed    | 2        |
|    total timesteps | 200      |
| train/             |          |
|    actor_loss      | 7.2      |
|    critic_loss     | 2.08     |
|    ent_coef        | 0.971    |
|    ent_coef_loss   | -0.0491  |
|    learning_rate   | 0.0003   |
|    n_updates       | 99       |
---------------------------------

Expected behavior

Example where gym.make() is not used:

import gym
from stable_baselines3 import SAC
model = SAC('MlpPolicy', 'Pendulum-v0', verbose = 1)
model.learn(200, log_interval = 1)

Output includes the rollout/ data:

Using cpu device
Creating environment from the given name 'Pendulum-v0'
Wrapping the env in a DummyVecEnv.
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.33e+03 |
| time/              |           |
|    episodes        | 1         |
|    fps             | 57        |
|    time_elapsed    | 3         |
|    total timesteps | 200       |
| train/             |           |
|    actor_loss      | 7.34      |
|    critic_loss     | 1.02      |
|    ent_coef        | 0.971     |
|    ent_coef_loss   | -0.0488   |
|    learning_rate   | 0.0003    |
|    n_updates       | 99        |
----------------------------------

System Info

Describe the characteristic of your environment:

  • pip install stable-baselines3
  • Python 3.6.8
  • PyTorch version 1.7.0
  • Gym version 0.17.3
pstansell added the bug (Something isn't working) label Nov 19, 2020
araffin added the RTFM (Answer is the documentation) label and removed the bug (Something isn't working) label Nov 19, 2020

araffin (Member) commented Nov 19, 2020

Hello,

This is not a bug: you need to wrap your environment in a Monitor wrapper.
See Documentation and hill-a/stable-baselines#339

pstansell (Author) commented Nov 19, 2020

Thank you very much for your quick reply. I'm sorry I missed the need for the monitor wrapper.

The example at hill-a/stable-baselines#24 is very useful for showing how it is applied.

My example above does what I want if I use:

import gym
from stable_baselines3 import SAC
from stable_baselines3.common.monitor import Monitor
env = gym.make('Pendulum-v0')
env = Monitor(env)
model = SAC('MlpPolicy', env, verbose = 1)
model.learn(200, log_interval = 1)

The output is now:

Using cpu device
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -973     |
| time/              |          |
|    episodes        | 1        |
|    fps             | 169      |
|    time_elapsed    | 1        |
|    total timesteps | 200      |
| train/             |          |
|    actor_loss      | 6.08     |
|    critic_loss     | 2.26     |
|    ent_coef        | 0.971    |
|    ent_coef_loss   | -0.049   |
|    learning_rate   | 0.0003   |
|    n_updates       | 99       |
---------------------------------

It appears that a number of people have tripped up on the same thing. If there had been a message along the lines of

Wrapping the env in a Monitor.

I would probably have worked it out myself and not taken your time by submitting this issue as a bug.

araffin added the enhancement (New feature or request) label Nov 19, 2020

Miffyli (Collaborator) commented Nov 19, 2020

It appears that a number of people have tripped up on the same thing. If there had been a message along the lines of
Wrapping the env in a Monitor.

I think this is a good suggestion that could be included: Monitor is indeed a bit of a quirk, but SB3 depends on it heavily, so anything that clarifies its use would be nice to see.
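
For readers wondering why the wrapper matters, here is a minimal, library-free sketch of what a Monitor-style wrapper does. It assumes the rollout/ statistics are derived from a per-episode summary attached to `info` when an episode ends. `ToyEnv`, `EpisodeStats` and `run_episode` are hypothetical stand-ins written for this illustration, not SB3 classes, and the sketch uses the old gym step API (obs, reward, done, info) that matches Gym 0.17.3.

```python
class ToyEnv:
    """Hypothetical environment that ends after `horizon` steps (reward -1 per step)."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), -1.0, done, {}


class EpisodeStats:
    """Monitor-like stand-in: accumulates rewards and reports them at episode end."""

    def __init__(self, env):
        self.env = env
        self.rewards = []

    def reset(self):
        self.rewards = []
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rewards.append(reward)
        if done:
            # Attach the episode return ("r") and length ("l") to a copy of info.
            info = dict(info)
            info["episode"] = {"r": sum(self.rewards), "l": len(self.rewards)}
        return obs, reward, done, info


def run_episode(env):
    """Step until done and return the info dict from the final step."""
    env.reset()
    done = False
    info = {}
    while not done:
        _, _, done, info = env.step(None)
    return info


print(run_episode(ToyEnv()))                # no "episode" key without the wrapper
print(run_episode(EpisodeStats(ToyEnv())))  # "episode" key present with the wrapper
```

Without the wrapper, nothing in the final `info` carries the episode return or length, which is consistent with the missing rollout/ block in the first output above.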

araffin (Member) commented Nov 19, 2020

It appears that a number of people have tripped up on the same thing. If there had been a message along the lines of

Fair enough. I think we can in fact wrap it automatically, since we have had the is_wrapped helper since #220.
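
A rough sketch of what such automatic wrapping could look like, combined with the message suggested earlier in the thread. This is not the change that was eventually merged. The classes below are hypothetical stand-ins so the example runs without gym or SB3, and the `is_wrapped` shown here is an assumed shape for the helper mentioned above, since its real signature does not appear in this thread.

```python
class Env:
    """Stand-in for a bare environment (hypothetical)."""


class Wrapper(Env):
    """Stand-in for a generic wrapper that holds the wrapped env in `.env`."""

    def __init__(self, env):
        self.env = env


class Monitor(Wrapper):
    """Stand-in for the episode-statistics wrapper."""


def is_wrapped(env, wrapper_class):
    """Return True if any layer of the wrapper chain is a `wrapper_class`."""
    while isinstance(env, Wrapper):
        if isinstance(env, wrapper_class):
            return True
        env = env.env  # descend one layer towards the bare env
    return False


def maybe_wrap_in_monitor(env, verbose=0):
    """Wrap `env` in a Monitor only if no Monitor is already in the chain."""
    if not is_wrapped(env, Monitor):
        if verbose >= 1:
            print("Wrapping the env in a Monitor.")
        env = Monitor(env)
    return env


env = maybe_wrap_in_monitor(Env(), verbose=1)   # prints the message and wraps
same = maybe_wrap_in_monitor(env, verbose=1)    # already monitored: returned unchanged
```

Walking the whole chain, rather than checking only the outermost layer, avoids adding a second Monitor when the user has already placed one beneath other wrappers.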
