
[Bug] Step environment that needs reset #224

Closed
Yingdong-Hu opened this issue Nov 16, 2020 · 7 comments
Labels
bug Something isn't working more information needed Please fill the issue template completely

Comments

@Yingdong-Hu

Yingdong-Hu commented Nov 16, 2020

🐛 Step environment that needs reset

I trained DQN on Pong, and I want to use the trained agent to collect 3000 episodes, each containing 60 timesteps. Every time I start a new episode, I call env.reset(). My code looks like this:

env = make_atari_env('PongNoFrameskip-v4', n_envs=1, seed=args.seed)
env = VecFrameStack(env, n_stack=4)
agent = DQN.load(model_path)
episode_count = 3000

for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        action, _states = agent.predict(obs, deterministic=True)
        obs, _, done, infos = env.step(action)

        # add action and obs to buffer

        steps += 1
        if steps == 60:
            break

After running for a while and collecting around 1000 episodes, the program suddenly raised the error below. It's really confusing; it looks like the env cannot be reset.

Traceback (most recent call last):
  File "/hyd/keypoints/my_file/env_groundtruth_rl.py", line 188, in <module>
    state = env.reset()
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/vec_env/vec_frame_stack.py", line 87, in reset
    obs: np.ndarray = self.venv.reset()  # pytype:disable=annotation-type-mismatch
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 61, in reset
    obs = self.envs[env_idx].reset()
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 237, in reset
    return self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 277, in reset
    return self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 264, in reset
    observation = self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 58, in reset
    obs, _, done, _ = self.env.step(1)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 80, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/atari_wrappers.py", line 135, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/gym/core.py", line 234, in step
    return self.env.step(action)
  File "/opt/conda/lib/python3.8/site-packages/stable_baselines3/common/monitor.py", line 96, in step
    raise RuntimeError("Tried to step environment that needs reset")
RuntimeError: Tried to step environment that needs reset
@Yingdong-Hu Yingdong-Hu added the bug Something isn't working label Nov 16, 2020
@araffin araffin added the more information needed Please fill the issue template completely label Nov 16, 2020
@araffin
Member

araffin commented Nov 16, 2020

Hello,

The provided code is incomplete and seems wrong.
Please provide a full minimal example, and take a look at the documentation on how to run a trained agent:

obs = env.reset()
n_episodes = 3000
current_episode = 0
while current_episode < n_episodes:
    action, _ = agent.predict(obs)
    obs, reward, done, info = env.step(action)
    # No need to reset manually: the env is reset automatically
    if done[0]:
        current_episode += 1

We also provide an evaluate_policy helper that may do the job for you ;) (it will be updated to work with Atari soon in #220)
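The auto-reset behaviour that makes the manual reset unnecessary can be illustrated without Atari or a trained model at all. The sketch below uses a made-up StubVecEnv standing in for the SB3 vectorized env (batched observations, done returned as a list, automatic reset when an episode ends); every name in it is illustrative, not part of the SB3 API.

```python
class StubVecEnv:
    """Toy vectorized env (batch size 1): episodes last ep_len steps,
    and the env resets itself when an episode ends, as SB3 VecEnvs do."""
    def __init__(self, ep_len=5):
        self.ep_len = ep_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # dummy batched observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.ep_len
        if done:
            obs = self.reset()  # automatic reset on episode end
        else:
            obs = [float(self.t)]
        return obs, [0.0], [done], [{}]

env = StubVecEnv()
obs = env.reset()
n_episodes = 3
current_episode = 0
steps_taken = 0
while current_episode < n_episodes:
    action = 0  # placeholder for agent.predict(obs)
    obs, reward, done, info = env.step(action)
    steps_taken += 1
    # No manual env.reset() here: the vectorized env already did it
    if done[0]:
        current_episode += 1
```

With ep_len=5 and n_episodes=3, the loop takes exactly 15 steps and never calls env.reset() after the initial one.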

@Yingdong-Hu
Author

If I want to limit each episode to 60 timesteps, will this be a problem?

env = make_atari_env('PongNoFrameskip-v4', n_envs=1, seed=args.seed)
env = VecFrameStack(env, n_stack=4)
agent = DQN.load(model_path)
episode_count = 3000

for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        action, _states = agent.predict(obs, deterministic=True)
        obs, _, done, infos = env.step(action)

        # add action and obs to buffer

        steps += 1
        if steps == 60:
            break

@araffin
Member

araffin commented Nov 16, 2020

If I want to limit each episode to 60 timesteps, will this be a problem?

What will be a problem?
This will be a problem if the episode length is less than 60 timesteps: it will throw the error you describe above.

Please take a closer look at the code I provided ;) (it is a bit hard to follow the logic of your snippet).

In your current code, it should be if steps == 60 or done[0] to avoid the error.
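Put together, the corrected collection loop might look like the sketch below. It is self-contained: ToyEpisodeEnv is a hypothetical stand-in whose episodes can end before 60 steps and which, like SB3's Monitor wrapper, raises if stepped after done without a reset (the real vectorized env returns done as an array, hence done[0] above).

```python
import random

class ToyEpisodeEnv:
    """Hypothetical env: episode length varies (may be < 60 steps) and,
    like SB3's Monitor wrapper, it raises if stepped after done."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.needs_reset = True

    def reset(self):
        self.remaining = self.rng.randint(10, 80)  # random episode length
        self.needs_reset = False
        return 0  # dummy observation

    def step(self, action):
        if self.needs_reset:
            raise RuntimeError("Tried to step environment that needs reset")
        self.remaining -= 1
        done = self.remaining == 0
        if done:
            self.needs_reset = True
        return 0, 0.0, done, {}

env = ToyEpisodeEnv()
episode_count = 100
for i in range(episode_count):
    obs = env.reset()
    steps = 0
    while True:
        obs, _, done, info = env.step(0)  # placeholder for agent.predict(obs)
        steps += 1
        # Breaking on done as well as the 60-step cap is the fix: it stops
        # us from stepping an episode that has already finished.
        if steps == 60 or done:
            break
```

With the original break condition (steps == 60 alone), the first episode shorter than 60 steps would trigger the same RuntimeError shown in the traceback above.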

@Yingdong-Hu
Author

Thank you very much, I've got it now.

@araffin
Member

araffin commented Nov 16, 2020

If the issue is fixed, then you can close this one ;)

@longfeizhang617

Would you please tell me how to fix the problem? Thanks.

@DLR-RM DLR-RM deleted a comment from longfeizhang617 Dec 28, 2021
@Miffyli
Collaborator

Miffyli commented Dec 28, 2021

@longfeizhang617 You had better open a new issue (we do not know what is wrong in your case). However, please go through the documentation and examples carefully before opening it. Note that we do not offer tech support for custom environments.
