[Bug Report] gym normalize wrappers are incompatible with envpool #3021

Closed
1 task done
vwxyzjn opened this issue Aug 11, 2022 · 3 comments · Fixed by #3026

Comments

@vwxyzjn
Contributor

vwxyzjn commented Aug 11, 2022

Describe the bug
gym.wrappers.NormalizeObservation and gym.wrappers.NormalizeReward are incompatible with envpool. See sail-sg/envpool#185

Code example

import numpy as np
import envpool
import gym

envs = envpool.make(
    "HalfCheetah-v4",
    env_type="gym",
    num_envs=4,
)
envs.num_envs = 4
envs.single_action_space = envs.action_space
envs.single_observation_space = envs.observation_space
envs.is_vector_env = True
envs = gym.wrappers.ClipAction(envs)
envs = gym.wrappers.NormalizeObservation(envs)
envs = gym.wrappers.TransformObservation(envs, lambda obs: np.clip(obs, -10, 10))
envs = gym.wrappers.NormalizeReward(envs)
envs = gym.wrappers.TransformReward(envs, lambda reward: np.clip(reward, -10, 10))
obs = envs.reset()
envs.step(np.array([envs.action_space.sample() for _ in range(envs.num_envs)]))

Traceback (most recent call last):
  File "/home/costa/Documents/go/src/github.com/vwxyzjn/envpool-cleanrl/bug.py", line 22, in <module>
    envs.step(np.array([envs.action_space.sample() for _ in range(envs.num_envs)]))
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/core.py", line 532, in step
    step_returns = self.env.step(action)
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/wrappers/normalize.py", line 149, in step
    self.env.step(action), True, self.is_vector_env
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/core.py", line 493, in step
    step_returns = self.env.step(action)
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/wrappers/normalize.py", line 77, in step
    obs, rews, terminateds, truncateds, infos = step_api_compatibility(
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/utils/step_api_compatibility.py", line 178, in step_api_compatibility
    return step_to_new_api(step_returns, is_vector_env)
  File "/home/costa/.cache/pypoetry/virtualenvs/envpool-cleanrl-uAHoRI5J-py3.9/lib/python3.9/site-packages/gym/utils/step_api_compatibility.py", line 59, in step_to_new_api
    and not infos["_TimeLimit.truncated"][i]
KeyError: '_TimeLimit.truncated'

System Info
Describe the characteristic of your environment:

  • How Gym was installed (pip, docker, source, ...): pip
  • OS: Linux
  • Python version: 3.9


Checklist

  • I have checked that there is no similar issue in the repo (required)

@arjun-kg
Contributor

Thank you for the bug report.

In the step API compatibility code, I assumed that whenever a key X exists in infos, the mask key _X also exists, and that assumption is what causes this error. I made it because this is gym's new way of handling vector infos (see here).
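To make the failure mode concrete, here is a minimal sketch of a lookup that tolerates a missing mask key. The helper name `truncation_flags` and the two example dicts are hypothetical and only illustrate the idea; this is not the code in gym's step_api_compatibility.py, nor necessarily the fix that was merged.

```python
import numpy as np


def truncation_flags(infos, num_envs, key="TimeLimit.truncated"):
    """Return a boolean array marking which sub-envs report truncation.

    Hypothetical helper: uses the `_<key>` mask when it is present and
    falls back to the raw values when it is not.
    """
    if key not in infos:
        # No sub-env reported the key at all.
        return np.zeros(num_envs, dtype=bool)
    values = np.asarray(infos[key], dtype=bool)
    mask = infos.get(f"_{key}")
    if mask is None:
        # No mask key available: trust the values as they are.
        return values
    # Mask available: only count entries that were actually set.
    return values & np.asarray(mask, dtype=bool)


# Assumed shapes of the two info styles, for illustration only.
with_mask = {
    "TimeLimit.truncated": np.array([True, False]),
    "_TimeLimit.truncated": np.array([True, False]),
}
without_mask = {"TimeLimit.truncated": np.array([False, True])}

flags_with_mask = truncation_flags(with_mask, num_envs=2)
flags_without_mask = truncation_flags(without_mask, num_envs=2)
flags_empty = truncation_flags({}, num_envs=2)
```

Indexing `infos["_TimeLimit.truncated"]` directly, as in the traceback, raises KeyError for the second dict; `dict.get` avoids that.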

But envpool does not seem to use mask keys, which raises a question. In the old step API, the key TimeLimit.truncated being absent from info means something different from it being present and set to False. Since envpool does not have the mask key, how does it differentiate between these two cases?

@arjun-kg
Contributor

I think I can patch this up regardless, but do we want to unify the way envpool handles vector infos vs how gym handles it? @pseudo-rnd-thoughts

@pseudo-rnd-thoughts
Contributor

Gym changed its approach to vector infos to support JAX-based vectorisation; for Brax in particular, the shape of each key needs to be constant. However, once missing entries are filled with default data, it is impossible to tell default data apart from no data.
An example may help. Here missing entries are filled using np.zeros((num_envs), dtype=type(data)):

info_1 = {"a": 1, "b": 0, "c": False}
info_2 = {}

vector_info = {"a": [1, 0], "b": [0, 0], "c": [False, False]}

Therefore, we added an underscore-prefixed version of each key that indicates whether the key actually exists for each sub-env, so that default data can still be used. I hope that makes sense.
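As a rough sketch of the scheme described above, the following aggregates per-env info dicts into a single vector info with underscore mask keys. The function name `aggregate_infos` is hypothetical and the sketch assumes scalar values only; gym's own vector env code may differ in the details.

```python
import numpy as np


def aggregate_infos(infos):
    """Merge a list of per-env info dicts into one dict of arrays.

    For every key, a boolean `_<key>` array records which sub-envs
    actually provided a value; the other entries hold default zeros.
    """
    num_envs = len(infos)
    vector_info = {}
    for i, info in enumerate(infos):
        for key, value in info.items():
            if key not in vector_info:
                # Constant-shape array filled with default data.
                vector_info[key] = np.zeros(num_envs, dtype=type(value))
                vector_info[f"_{key}"] = np.zeros(num_envs, dtype=bool)
            vector_info[key][i] = value
            vector_info[f"_{key}"][i] = True
    return vector_info


info_1 = {"a": 1, "b": 0, "c": False}
info_2 = {}
vector_info = aggregate_infos([info_1, info_2])
```

With the masks in place, `vector_info["b"]` is all zeros for both sub-envs, but `vector_info["_b"]` shows that only the first one actually reported a value.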

For TimeLimit.truncated, as we know that we only care about the answer when it is True, we can ignore the underscore version.
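A minimal sketch of that simplification, assuming old-style vectorised `dones` and an `infos` dict that may or may not carry TimeLimit.truncated. The function name `split_dones` is hypothetical, and the sketch deliberately treats an absent key and a False value the same way, which is the case distinction raised earlier in the thread.

```python
import numpy as np


def split_dones(dones, infos):
    """Split old-style `dones` into (terminateds, truncateds).

    Only the value of "TimeLimit.truncated" is consulted; the
    underscore mask key is ignored because only True entries matter.
    """
    dones = np.asarray(dones, dtype=bool)
    default = np.zeros_like(dones)
    truncated = np.asarray(infos.get("TimeLimit.truncated", default), dtype=bool)
    truncateds = dones & truncated
    terminateds = dones & ~truncated
    return terminateds, truncateds


terminateds, truncateds = split_dones(
    [True, True, False],
    {"TimeLimit.truncated": [True, False, False]},
)
no_key_terminateds, no_key_truncateds = split_dones([True, False], {})
```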
