Feature suggestion: Observation wrappers
Suppose we have an Atari environment whose observation space is
Tuple(Box(210, 160, 3), Box(128,))
that is, each observation contains both the image and the memory (RAM).
We now would like to apply some filters to the image part of the observation.
If we would like to use Wrappers we will run into problems:
env_image = pick_tuple_first_wrapper(env) # This wrapper rewrites the observation_space and returns
# a new observation corresponding to the image part.
env_ram = pick_tuple_second_wrapper(env)
Then we apply a filter to env_image:
env_downscaled_image = downscale_image_wrapper(env_image, (84, 84))
And now combine the two again:
env_wrapped = combine_into_tuple_wrapper(env_downscaled_image, env_ram)
The problem with this approach: what should happen when env_wrapped.step(0) is called? Both inputs wrap the same underlying env, so it is unclear which of them should actually be stepped. And what if the two input environments have different action_space properties?
What we need instead is a filter function that operates only on the observation.
Observation filters could then be combined and applied through a single Environment wrapper, as we are used to.
We would use it as follows:
pick_first = observation_wrapper.pick_tuple(0)
pick_second = observation_wrapper.pick_tuple(1)
downscale = observation_wrapper.downscale(shape=(84, 84))
join = observation_wrapper.join([pick_first, downscale], [pick_second]) # join two observation chains
# phi_4(x) := (phi_2(phi_1(x)), phi_3(x)), where
#   phi_1 = pick_first
#   phi_2 = downscale
#   phi_3 = pick_second
#   phi_4 = join
env = observation_wrapper(env, join) # returns observation' := phi_4(observation)
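To make the idea concrete, here is a minimal sketch of such composable filters as plain Python closures over NumPy arrays. The names `pick_tuple`, `downscale` and `join` mirror the proposal above, but their implementations here are assumptions for illustration only: nothing like this exists in Gym under these names, and the nearest-neighbour downscaling simply stands in for whatever image filter one would really use.

```python
import numpy as np


def pick_tuple(index):
    """Hypothetical filter: select one element of a tuple observation."""
    def phi(obs):
        return obs[index]
    return phi


def downscale(shape):
    """Hypothetical filter: resize an (H, W, C) image by nearest-neighbour sampling."""
    out_h, out_w = shape

    def phi(img):
        in_h, in_w = img.shape[:2]
        # Pick evenly spaced source rows and columns.
        rows = np.arange(out_h) * in_h // out_h
        cols = np.arange(out_w) * in_w // out_w
        return img[rows[:, None], cols]
    return phi


def join(*chains):
    """Hypothetical filter: run each chain on the same input, return a tuple of results."""
    def phi(obs):
        results = []
        for chain in chains:
            x = obs
            for f in chain:  # apply the filters of one chain left to right
                x = f(x)
            results.append(x)
        return tuple(results)
    return phi


# A fake observation with the shapes from the example above.
observation = (np.zeros((210, 160, 3), dtype=np.uint8),
               np.zeros((128,), dtype=np.uint8))

phi_4 = join([pick_tuple(0), downscale((84, 84))], [pick_tuple(1)])
image, ram = phi_4(observation)
```

Because the filters never touch `step` or `action_space`, the question of which environment to step does not arise: a single environment wrapper would call `phi_4` on every observation it returns.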
Comments? Ideas?
What if the observation wrapper is of the "frame stacking" kind?
Example: Stacking four frames (observations).
phi(x_t) = (x_{t-3}, x_{t-2}, x_{t-1}, x_t), where x_k = zeros_like(x_t) for k < 0,
or x_k = x_t for k < 0.
This can be seen either as the function phi having state, or as phi taking its previous output as input.
If there is no previous state (as on a call to env.reset()), the observation wrapper would do whatever is needed to keep the observation shape fixed. In the frame stacking case a Box(dim1, dim2, ..., dimN) would be mapped to Box(4, dim1, dim2, ..., dimN), since the output observation always has to match the contract of the env.observation_space property.
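A stateful filter of this kind could be sketched as below. The class name `FrameStack`, its `pad` argument and its `reset`/`output_shape` methods are assumptions made for this example, not an existing Gym API. The `"zeros"` and `"repeat"` modes correspond to the two padding choices mentioned above, applied to the first observation after a reset.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Hypothetical stateful observation filter that stacks the last few frames."""

    def __init__(self, num_frames=4, pad="zeros"):
        if pad not in ("zeros", "repeat"):
            raise ValueError("pad must be 'zeros' or 'repeat'")
        self.num_frames = num_frames
        self.pad = pad
        self.frames = deque(maxlen=num_frames)

    def reset(self):
        """Forget all history; meant to be called when env.reset() is called."""
        self.frames.clear()

    def output_shape(self, input_shape):
        """Box(dim1, ..., dimN) maps to Box(num_frames, dim1, ..., dimN)."""
        return (self.num_frames,) + tuple(input_shape)

    def __call__(self, obs):
        obs = np.asarray(obs)
        if not self.frames:
            # No history yet: pad so the output shape is the same from step one.
            filler = np.zeros_like(obs) if self.pad == "zeros" else obs
            for _ in range(self.num_frames - 1):
                self.frames.append(filler)
        self.frames.append(obs)  # the deque drops the oldest frame automatically
        return np.stack(list(self.frames))


stack = FrameStack(num_frames=4, pad="zeros")
first = stack(np.ones((2, 3)))       # three zero frames followed by x_0
second = stack(2 * np.ones((2, 3)))  # two zero frames, then x_0, then x_1
```

An environment wrapper using such a filter would have to call `reset()` on it whenever the environment is reset, and report `output_shape(...)` of the inner space as its own observation space.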