Feature suggestion: Observation wrappers

Henrik Holst edited this page Feb 14, 2017 · 13 revisions

Q&A for a suggested (experimental) feature: observation wrappers.

Motivation

Suppose we have an Atari environment whose observation space is

Tuple(Box(210, 160, 3), Box(128,))

that is, each observation contains both the screen image and the 128-byte RAM.

We would now like to apply some filters to the image part of the observation only.

If we try to do this with ordinary environment wrappers, we run into problems:

env_image = pick_tuple_first_wrapper(env)  # This wrapper rewrites the observation_space and returns
                                           # a new observation corresponding to the image part.
env_ram = pick_tuple_second_wrapper(env)   # Likewise, for the RAM part.

Then we apply a filter to env_image:

env_downscaled_image = downscale_image_wrapper(env_image, (84, 84))

And now combine the two again:

env_wrapped = combine_into_tuple_wrapper(env_downscaled_image, env_ram)

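The problem with this approach: what should happen when env_wrapped.step(0) is called? Both inputs wrap the same underlying environment, so a naive implementation that forwards the call to each input would step that environment twice. And what if the two input environments have different action_space properties?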

What we need instead is a filter function that operates on the observation alone and leaves the rest of the environment untouched.

Such observation filters could then be combined and applied from a single environment wrapper, in the way we are used to.

We would use it as follows:

pick_first = observation_wrapper.pick_tuple(0)
pick_second = observation_wrapper.pick_tuple(1)
downscale = observation_wrapper.downscale(shape=(84, 84))
join = observation_wrapper.join([pick_first, downscale], [pick_second])  # join two observation chains

# phi_4(x) := (phi_2(phi_1(x)), phi_3(x)), where
#   phi_1 = pick_first
#   phi_2 = downscale
#   phi_3 = pick_second
#   phi_4 = join

env = observation_wrapper(env, join)  # returns observation' := phi_4(observation)
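
The composition above could be sketched in plain Python roughly as follows. This is only an illustration of the idea, not an existing Gym API: the names ObsFilter, ObservationWrapper and ToyEnv are hypothetical, the downscale filter is a simple nearest-neighbour stand-in operating on single-channel nested lists, and step() is assumed to return the (observation, reward, done, info) tuple.

```python
from typing import Any, Callable, Sequence

# An observation filter is just a function from observation to observation.
ObsFilter = Callable[[Any], Any]


def pick_tuple(index: int) -> ObsFilter:
    """Filter that selects one element of a tuple observation."""
    return lambda obs: obs[index]


def downscale(shape: Sequence[int]) -> ObsFilter:
    """Nearest-neighbour downscale of a 2-D nested-list image.

    A stand-in for a real image resize; assumes a single-channel image.
    """
    out_h, out_w = shape

    def apply(image):
        in_h, in_w = len(image), len(image[0])
        return [[image[r * in_h // out_h][c * in_w // out_w]
                 for c in range(out_w)]
                for r in range(out_h)]

    return apply


def join(*chains: Sequence[ObsFilter]) -> ObsFilter:
    """Run every chain on the same input and collect the results in a tuple."""

    def run(chain, obs):
        for obs_filter in chain:
            obs = obs_filter(obs)
        return obs

    return lambda obs: tuple(run(chain, obs) for chain in chains)


class ObservationWrapper:
    """Hypothetical env wrapper: actions pass through, observations are filtered."""

    def __init__(self, env, obs_filter: ObsFilter):
        self.env = env
        self.obs_filter = obs_filter

    def reset(self):
        return self.obs_filter(self.env.reset())

    def step(self, action):
        # The underlying env is stepped exactly once per call.
        obs, reward, done, info = self.env.step(action)
        return self.obs_filter(obs), reward, done, info


class ToyEnv:
    """Toy stand-in for an Atari env: observation is (4x4 image, 3-byte RAM)."""

    def _obs(self):
        image = [[4 * r + c for c in range(4)] for r in range(4)]
        return image, [7, 7, 7]

    def reset(self):
        return self._obs()

    def step(self, action):
        return self._obs(), 0.0, False, {}


# phi(x) = (downscale(pick_first(x)), pick_second(x))
phi = join([pick_tuple(0), downscale((2, 2))], [pick_tuple(1)])
env = ObservationWrapper(ToyEnv(), phi)
image, ram = env.reset()
```

Because there is only one wrapper around one environment, the step() ambiguity from the wrapper-based approach does not arise. The sketch does not rewrite observation_space; a real implementation would presumably need each filter to describe how it transforms the space as well.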

Comments? Ideas?

Stateful observation wrappers

What if the observation wrapper performs some kind of "frame stacking"?

Example: Stacking four frames (observations).

phi(x_t) = (x_{t-3}, x_{t-2}, x_{t-1}, x_t), where x_k = zeros_like(x_t) for k < 0,
                                                or x_k = x_0 for k < 0.

This can be seen either as the function phi having state, or as phi taking its previous output as an additional input.

If there is no previous state (as on a call to env.reset()), the observation wrapper would do whatever is needed to keep the shape of the observation fixed, for example by padding the missing frames. In the frame stacking case, a Box(dim1, dim2, ..., dimN) would be mapped to Box(4, dim1, dim2, ..., dimN), since the output observation must always match the contract of the env.observation_space property.
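
A stateful frame-stacking filter might look like the sketch below. It is an assumption-laden illustration rather than a proposed API: the class name FrameStack, its reset() method and the pad argument are hypothetical, and observations are plain nested lists so the example needs nothing beyond the standard library. The "zeros" mode pads missing frames with zeros, while the "repeat" mode pads with the first observation.

```python
from collections import deque


def zeros_like(obs):
    """Return a nested list of zeros with the same structure as obs."""
    if isinstance(obs, (list, tuple)):
        return [zeros_like(item) for item in obs]
    return 0


class FrameStack:
    """Hypothetical stateful filter that returns the last num_frames observations."""

    def __init__(self, num_frames=4, pad="zeros"):
        if pad not in ("zeros", "repeat"):
            raise ValueError("pad must be 'zeros' or 'repeat'")
        self.num_frames = num_frames
        self.pad = pad
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_obs):
        """Start a new episode: pad the history so the output size is fixed."""
        fill = zeros_like(first_obs) if self.pad == "zeros" else first_obs
        self.frames.clear()
        self.frames.extend([fill] * (self.num_frames - 1))
        self.frames.append(first_obs)
        return tuple(self.frames)

    def __call__(self, obs):
        """Push the newest frame; the oldest one is dropped automatically."""
        self.frames.append(obs)
        return tuple(self.frames)


stack = FrameStack(num_frames=4, pad="zeros")
after_reset = stack.reset([1, 2])  # three zero frames followed by x_0
after_step = stack([3, 4])         # oldest zero frame dropped, x_1 appended
```

The output always contains exactly num_frames entries, ordered oldest to newest, which corresponds to the leading dimension of 4 in the stacked Box. An environment wrapper using such a filter would have to call reset() on it whenever env.reset() is called.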