Feature suggestion: Observation wrappers
Suppose we have an Atari environment which returns the observation
Tuple(Box(210, 160, 3), Box(128,))
which contains both the image and the memory (RAM).
We now would like to apply some filters to the image part of the observation.
If we try to do this with Wrappers, we run into problems:
env_image = pick_tuple_first_wrapper(env)  # This wrapper rewrites the observation_space and returns
                                           # a new observation corresponding to the image part.
env_ram = pick_tuple_second_wrapper(env)
Then we apply some filter to the env_image
env_downscaled_image = downscale_image_wrapper(env_image, (84, 84))
And now combine the two again:
env_wrapped = combine_into_tuple_wrapper(env_downscaled_image, env_ram)
The problem with this approach: what should happen when env_wrapped.step(0) is called? What if the two input environments above have different action_space properties?
We need a filter function for observations that we can apply to only the observation.
The observation filters could then be combined and applied from an Environment wrapper as we are used to.
We would use it as follows:
pick_first = observation_wrapper.pick_tuple(0)
pick_second = observation_wrapper.pick_tuple(1)
downscale = observation_wrapper.downscale(shape=(84, 84))
join = observation_wrapper.join([pick_first, downscale], [pick_second])  # join two observation chains
# phi_4(x) := (phi_2(phi_1(x)), phi_3(x)), where
#   phi_1 = pick_first
#   phi_2 = downscale
#   phi_3 = pick_second
#   phi_4 = join
env = observation_wrapper(env, join) # returns observation' := phi_4(observation)
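The composition above can be sketched with plain functions. This is only an illustration, not the proposed API: the names pick_tuple, downscale, chain and join are hypothetical, observations are nested Python lists instead of Box samples, and downscale takes an integer factor rather than a target shape to keep the example short.

```python
def pick_tuple(index):
    """Return a filter that selects one element of a tuple observation."""
    def phi(observation):
        return observation[index]
    return phi


def downscale(factor):
    """Return a filter that keeps every `factor`-th row and column of a 2-D grid."""
    def phi(image):
        return [row[::factor] for row in image[::factor]]
    return phi


def chain(*filters):
    """Compose filters left to right: chain(f, g)(x) == g(f(x))."""
    def phi(observation):
        for f in filters:
            observation = f(observation)
        return observation
    return phi


def join(*chains):
    """Apply each chain to the same input and return the results as a tuple."""
    composed = [chain(*c) for c in chains]
    def phi(observation):
        return tuple(c(observation) for c in composed)
    return phi


# Toy observation: a 4x4 "image" and a 3-element "memory".
image = [[r * 4 + c for c in range(4)] for r in range(4)]
ram = [7, 8, 9]
observation = (image, ram)

# phi_4(x) := (phi_2(phi_1(x)), phi_3(x))
phi_4 = join([pick_tuple(0), downscale(2)], [pick_tuple(1)])
small_image, ram_out = phi_4(observation)
```

Because the filters only ever see observations, the question of conflicting action_space properties does not arise: a single environment is wrapped once, with phi_4 applied to its output.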
Comments? Ideas?
Comment:
We might need the observation_space as an input. Say we have an observation wrapper called
downscale_2x which downscales an image to half its width and half its height. The output observation_space
is not fully determined until the input observation_space is known.
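A minimal sketch of that idea, assuming the space is represented only by its shape tuple (a real implementation would presumably build a Box from it). The class name Downscale2x and its attributes are hypothetical.

```python
class Downscale2x:
    """Hypothetical filter whose output shape is derived from the input shape."""

    def __init__(self, input_shape):
        self.input_shape = tuple(input_shape)
        height, width = self.input_shape[:2]
        # Only height and width are halved; trailing dims (e.g. channels) are kept.
        self.output_shape = (height // 2, width // 2) + self.input_shape[2:]

    def phi(self, image):
        # Keep every second row and column, truncated to match output_shape.
        out_height, out_width = self.output_shape[:2]
        return [row[0:2 * out_width:2] for row in image[0:2 * out_height:2]]


# The output shape is known as soon as the input shape is supplied.
image_filter = Downscale2x((210, 160, 3))
print(image_filter.output_shape)  # (105, 80, 3)

# Applying the filter to a small 4x6 grid yields a 2x3 grid.
grid = [[r * 6 + c for c in range(6)] for r in range(4)]
small = Downscale2x((4, 6)).phi(grid)
```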
What if the observation wrapper is a "frame stacking" kind of wrapper?
Example: Stacking four frames (observations).
phi(x_t) = (x_{t-3}, x_{t-2}, x_{t-1}, x_t), where x_k = zeros_like(x_t) for k < 0,
or x_k = x_t, for k < 0.
This can be seen either as the function phi having state, or as phi taking its previous output as input.
If there is no previous state (as on a call to env.reset()), the observation wrapper would do whatever is needed to keep the observation shape fixed. In the frame stacking case, a Box(dim1, dim2, ..., dimN) would be mapped to Box(4, dim1, dim2, ..., dimN), since the output observation must always match the contract of the env.observation_space property.
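The stateful view can be sketched as follows. This is an illustration under assumptions: FrameStack and pad_with_zeros are hypothetical names, frames are flat Python lists, and the two padding options correspond to the zeros_like(x_t) and x_k = x_t choices above.

```python
from collections import deque


class FrameStack:
    """Hypothetical stateful filter returning the last `num_frames` observations."""

    def __init__(self, num_frames=4, pad_with_zeros=True):
        self.num_frames = num_frames
        self.pad_with_zeros = pad_with_zeros
        self._frames = deque(maxlen=num_frames)

    def reset(self):
        """Forget all previously seen frames."""
        self._frames.clear()

    def phi(self, frame):
        if not self._frames:
            # No history yet: pad so the output always holds num_frames frames.
            filler = [0] * len(frame) if self.pad_with_zeros else frame
            for _ in range(self.num_frames - 1):
                self._frames.append(filler)
        # The deque drops the oldest frame automatically once it is full.
        self._frames.append(frame)
        return tuple(self._frames)


stack = FrameStack(num_frames=4)
first = stack.phi([1, 2])    # three zero frames followed by [1, 2]
second = stack.phi([3, 4])   # window slides by one frame
stack.reset()
after_reset = stack.phi([5, 6])
```

The output always contains exactly four frames, so it stays consistent with a fixed output observation_space regardless of how many frames have been seen.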
Q: When do we reset our observation wrappers? Should the environment call a reset() method on the wrappers?
class ObservationWrapper(object):
    """Observation wrapper base class."""

    def __init__(self, observation_space):
        self._observation_space = observation_space

    @property
    def observation_space(self):
        return self._observation_space

    def phi(self, observation):
        # assert observation is an instance of self._observation_space  # input obs. space
        output = observation
        # assert output is an instance of self.observation_space  # output obs. space
        return output

    def reset(self):
        """Reset internal state."""
        pass
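One possible answer to the reset question is to let the environment wrapper call reset() on the observation wrapper whenever the environment itself is reset. The sketch below assumes the (observation, reward, done, info) return value of step; CountingEnv, RunningSum and ObservationFilterEnv are hypothetical stand-ins used only for illustration.

```python
class CountingEnv:
    """Toy stand-in for an environment; the observation is the step count."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3, {}


class RunningSum:
    """Toy stateful filter: phi returns the sum of observations since reset."""

    def __init__(self):
        self.total = 0

    def reset(self):
        self.total = 0

    def phi(self, observation):
        self.total += observation
        return self.total


class ObservationFilterEnv:
    """Applies a filter to every observation and resets it with the env."""

    def __init__(self, env, obs_filter):
        self.env = env
        self.obs_filter = obs_filter

    def reset(self):
        # Clear the filter state before the first observation of an episode.
        self.obs_filter.reset()
        return self.obs_filter.phi(self.env.reset())

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return self.obs_filter.phi(observation), reward, done, info


env = ObservationFilterEnv(CountingEnv(), RunningSum())
start = env.reset()        # 0
obs_1 = env.step(0)[0]     # 0 + 1
obs_2 = env.step(0)[0]     # 0 + 1 + 2
restart = env.reset()      # filter state cleared, back to 0
```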