
[Bug] Step environment that needs reset #224

Closed
Yingdong-Hu opened this issue Nov 16, 2020 · 7 comments
Labels
bug Something isn't working more information needed Please fill the issue template completely

Comments

@Yingdong-Hu

Yingdong-Hu commented Nov 16, 2020

🐛 Step environment that needs reset

I trained DQN on Pong, and I want to use the trained agent to collect 3000 episodes, each containing 60 timesteps. Every time I start a new episode, I call env.reset(). My code looks like this:

env = make_atari_env('PongNoFrameskip-v4', n_envs=1, seed=args.seed)
env = VecFrameStack(env, n_stack=4)
agent = DQN.load(model_path)
episode_count = 3000

for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        action, _states = agent.predict(obs, deterministic=True)
        obs, _, done, infos = env.step(action)

        # add action and obs to buffer

        steps += 1
        if steps == 60:
            break

After running for a while and collecting around 1000 episodes, the program suddenly raised the error below. It's really confusing; it looks like the env cannot be reset.

Traceback (most recent call last):
  File "/hyd/keypoints/my_file/env_groundtruth_rl.py", line 188, in <module>
    state = env.reset()
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/vec_env/vec_frame_stack.py", line 87, in reset
    obs: np.ndarray = self.venv.reset()  # pytype:disable=annotation-type-mismatch
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 61, in reset
    obs = self.envs[env_idx].reset()
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 237, in reset
    return self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 277, in reset
    return self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 264, in reset
    observation = self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 58, in reset
    obs, _, done, _ = self.env.step(1)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 80, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 135, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 234, in step
    return self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/monitor.py", line 96, in step
    raise RuntimeError("Tried to step environment that needs reset")
RuntimeError: Tried to step environment that needs reset
@Yingdong-Hu Yingdong-Hu added the bug Something isn't working label Nov 16, 2020
@araffin araffin added the more information needed Please fill the issue template completely label Nov 16, 2020
@araffin
Member

araffin commented Nov 16, 2020

Hello,

The provided code is incomplete and seems wrong.
Please provide a full minimal example, and take a look at the documentation on how to run a trained agent:

obs = env.reset()
n_episodes = 3000
current_episode = 0
while current_episode < n_episodes:
    action, _ = agent.predict(obs)
    obs, reward, done, info = env.step(action)
    # No need to reset manually: the env is reset automatically
    if done[0]:
        current_episode += 1

We also provide an evaluate_policy helper that may do the job for you ;) (it will be updated to work with Atari soon in #220)
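The auto-reset behaviour that makes the manual reset unnecessary can be illustrated without Atari or a trained model at all. The sketch below uses a made-up StubVecEnv standing in for the SB3 vectorized env (batched observations, done returned as a list, automatic reset when an episode ends); every name in it is illustrative, not part of the SB3 API.

```python
class StubVecEnv:
    """Toy vectorized env (batch size 1): episodes last ep_len steps,
    and the env resets itself when an episode ends, as SB3 VecEnvs do."""
    def __init__(self, ep_len=5):
        self.ep_len = ep_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # dummy batched observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.ep_len
        if done:
            obs = self.reset()  # automatic reset on episode end
        else:
            obs = [float(self.t)]
        return obs, [0.0], [done], [{}]

env = StubVecEnv()
obs = env.reset()
n_episodes = 3
current_episode = 0
steps_taken = 0
while current_episode < n_episodes:
    action = 0  # placeholder for agent.predict(obs)
    obs, reward, done, info = env.step(action)
    steps_taken += 1
    # No manual env.reset() here: the vectorized env already did it
    if done[0]:
        current_episode += 1
```

With ep_len=5 and n_episodes=3, the loop takes exactly 15 steps and never calls env.reset() after the initial one.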

@Yingdong-Hu
Author

If I want to limit each episode to 60 timesteps, will this be a problem?

env = make_atari_env('PongNoFrameskip-v4', n_envs=1, seed=args.seed)
env = VecFrameStack(env, n_stack=4)
agent = DQN.load(model_path)
episode_count = 3000

for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        action, _states = agent.predict(obs, deterministic=True)
        obs, _, done, infos = env.step(action)

        # add action and obs to buffer

        steps += 1
        if steps == 60:
            break

@araffin
Member

araffin commented Nov 16, 2020

If I want to limit each episode to 60 timesteps, will this be a problem?

What will be a problem?
This will be a problem if the episode length is less than 60 timesteps: it will throw the error you describe above.

Please take a closer look at the code I provided ;) (it is a bit hard to follow the logic of your snippet).

In your current code, it should be if steps == 60 or done[0] to avoid the error.
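Put together, the corrected collection loop might look like the sketch below. It is self-contained: ToyEpisodeEnv is a hypothetical stand-in whose episodes can end before 60 steps and which, like SB3's Monitor wrapper, raises if stepped after done without a reset (the real vectorized env returns done as an array, hence done[0] above).

```python
import random

class ToyEpisodeEnv:
    """Hypothetical env: episode length varies (may be < 60 steps) and,
    like SB3's Monitor wrapper, it raises if stepped after done."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.needs_reset = True

    def reset(self):
        self.remaining = self.rng.randint(10, 80)  # random episode length
        self.needs_reset = False
        return 0  # dummy observation

    def step(self, action):
        if self.needs_reset:
            raise RuntimeError("Tried to step environment that needs reset")
        self.remaining -= 1
        done = self.remaining == 0
        if done:
            self.needs_reset = True
        return 0, 0.0, done, {}

env = ToyEpisodeEnv()
episode_count = 100
for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        obs, _, done, info = env.step(0)  # placeholder for agent.predict(obs)
        steps += 1
        # Breaking on done as well as the 60-step cap is the fix: it stops
        # us from stepping an episode that has already finished.
        if steps == 60 or done:
            break
```

With the original break condition (steps == 60 alone), the first episode shorter than 60 steps would trigger the same RuntimeError shown in the traceback above.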

@Yingdong-Hu
Author

Thank you very much, I've got it now.

@araffin
Member

araffin commented Nov 16, 2020

If the issue is fixed, then you can close this one ;)

@longfeizhang617

Would you please tell me how to fix the problem? Thanks.

@DLR-RM DLR-RM deleted a comment from longfeizhang617 Dec 28, 2021
@Miffyli
Collaborator

Miffyli commented Dec 28, 2021

@longfeizhang617 You had better open a new issue (we do not know what is wrong in your case). However, please go through the documentation and examples carefully before opening it. Note that we do not offer tech support for custom environments.
