
Feature suggestion: Observation wrappers

Henrik Holst edited this page Feb 14, 2017 · 13 revisions

Q&A for a suggested (experimental) feature: observation wrappers

Motivation

Suppose we have an Atari environment which returns the observation

Tuple(Box(210, 160, 3), Box(128,))

which contains both the image and the memory (RAM).

We now would like to apply some filters to the image part of the observation.

If we try to do this with Wrappers, we run into problems:

env_image = pick_tuple_first_wrapper(env)  # This wrapper rewrites the observation_space and returns
                                           # a new observation corresponding to the image part.
env_ram = pick_tuple_second_wrapper(env)   # Same, but for the memory part.

Then we apply some filter to the env_image

env_downscaled_image = downscale_image_wrapper(env_image, (84, 84))

And now combine the two again:

env_wrapped = combine_into_tuple_wrapper(env_downscaled_image, env_ram)

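The problem with this approach: what should happen when env_wrapped.step(0) is called? Both input environments wrap the same underlying env, so it is unclear which of them should be stepped. And what if the two input environments have different action_space properties?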

We need a filter function that applies to the observation alone and leaves the rest of the environment untouched.

The observation filters could then be combined and applied from an Environment wrapper as we are used to.

We would use it as follows:

pick_first = observation_wrapper.pick_tuple(0)
pick_second = observation_wrapper.pick_tuple(1)
downscale = observation_wrapper.downscale(shape=(84, 84))
join = observation_wrapper.join([pick_first, downscale], [pick_second])  # join two ob. chains

# phi_4(x) := (phi_2(phi_1(x)), phi_3(x)) where phi_1 = pick_first
#                                               phi_2 = downscale
#                                               phi_3 = pick_second
#                                               phi_4 = join

env = observation_wrapper(env, join)  # returns observation' := phi_4(observation)
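The composition above can be sketched with plain functions. This is only an illustration of the idea, not the proposed API: the helpers pick_tuple, downscale, chain and join below are hypothetical, images are nested lists rather than arrays, and downscale takes a subsampling factor instead of the shape=(84, 84) argument used above.

```python
def pick_tuple(index):
    """Return a filter that selects one element of a tuple observation."""
    def phi(observation):
        return observation[index]
    return phi


def downscale(factor):
    """Return a filter that keeps every `factor`-th row and column.

    Assumption: naive subsampling of a 2-D nested list, standing in
    for a real image resize.
    """
    def phi(image):
        return [row[::factor] for row in image[::factor]]
    return phi


def chain(filters):
    """Compose filters left to right: chain([f, g])(x) == g(f(x))."""
    def phi(observation):
        for f in filters:
            observation = f(observation)
        return observation
    return phi


def join(*chains):
    """Apply every chain to the same input and collect the results in a tuple."""
    composed = [chain(c) for c in chains]

    def phi(observation):
        return tuple(f(observation) for f in composed)
    return phi


# A 4x4 "image" and a short "RAM" vector standing in for the Atari observation.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
ram = [7, 8, 9]

# phi_4(x) := (phi_2(phi_1(x)), phi_3(x))
phi_4 = join([pick_tuple(0), downscale(2)], [pick_tuple(1)])
result = phi_4((image, ram))
```

Since only observations flow through phi_4, the question of which environment to step never comes up: there is a single env, and the filter is applied to whatever it returns.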

Comments? Ideas?

Comment:

We might need the observation_space as an input. Say we have an observation wrapper called
downscale_2x which downscales an image to half its width and half its height. The output
observation_space is not fully determined until we have an input observation_space.
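A minimal sketch of this point, assuming a wrapper that receives the input space in its constructor and derives the output space from it. The Box class here is a hypothetical stand-in that only stores a shape (it is not gym.spaces.Box), and the subsampling in phi is a placeholder for a real resize.

```python
class Box(object):
    """Hypothetical minimal stand-in for a Box space: stores only a shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)


class Downscale2x(object):
    """Halve the height and width of an image observation."""

    def __init__(self, input_space):
        height, width = input_space.shape[:2]
        channels = input_space.shape[2:]  # empty tuple for 2-D images
        self.input_space = input_space
        # The output space can only be computed once the input space is known.
        self.observation_space = Box((height // 2, width // 2) + channels)

    def phi(self, image):
        out_height, out_width = self.observation_space.shape[:2]
        # Keep every second row and column, dropping a trailing odd one
        # so the result matches observation_space exactly.
        return [row[:2 * out_width:2] for row in image[:2 * out_height:2]]


downscale_2x = Downscale2x(Box((210, 160, 3)))
output_shape = downscale_2x.observation_space.shape
```

With the Atari image space from the motivation, output_shape comes out as (105, 80, 3).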

Stateful observation wrappers

What if the observation wrapper is a "frame stacking" kind of observation?

Example: Stacking four frames (observations).

phi(x_t) = (x_{t-3}, x_{t-2}, x_{t-1}, x_t), where x_k = zeros_like(x_t) for k < 0,
                                                or x_k = x_t, for k < 0.

This can be seen either as the function phi having state, or as phi taking its previous output as an additional input.

If there is no previous state (as on a call to env.reset()), the observation wrapper would do whatever is needed to keep the observation shape fixed. In the frame stacking case a Box(dim1, dim2, ..., dimN) would be mapped to Box(4, dim1, dim2, ..., dimN), since the output observation must always match the contract of the env.observation_space property.
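A frame-stacking filter with internal state might look like the sketch below. The class name FrameStack and its interface are assumptions for illustration. It uses the second padding variant (x_k = x_t for missing frames), and it returns a tuple of frames rather than a stacked array to stay free of dependencies.

```python
from collections import deque


class FrameStack(object):
    """Stateful observation filter that returns the last `num_frames` observations."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        # A bounded deque drops the oldest frame automatically.
        self._frames = deque(maxlen=num_frames)

    def reset(self):
        """Forget all previous frames."""
        self._frames.clear()

    def phi(self, observation):
        if not self._frames:
            # No history yet: pad with copies of the current frame so the
            # output always contains exactly `num_frames` entries.
            for _ in range(self.num_frames - 1):
                self._frames.append(observation)
        self._frames.append(observation)
        return tuple(self._frames)


stack = FrameStack(num_frames=4)
first = stack.phi("x0")
second = stack.phi("x1")
stack.reset()
after_reset = stack.phi("y0")
```

Padding with zeros instead would only change the branch that handles the empty history.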

Q: When do we reset our observation wrappers? Should the environment call a reset() method on the wrappers?

Prototype class

class ObservationWrapper(object):
    """Observation wrapper base class."""
    def __init__(self, observation_space):
        self._observation_space = observation_space

    @property
    def observation_space(self):
        return self._observation_space

    def phi(self, observation):
        # assert observation is an instance of self._observation_space  # input obs. space.
        output = observation
        # assert output is an instance of self.observation_space  # output obs. space
        return output

    def reset(self):
        """Reset internal state."""
        pass
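One possible answer to the reset question, sketched under assumptions: an environment wrapper that owns the observation wrapper, calls its reset() before every env.reset(), and routes every observation through phi. FilteredEnv, CountingWrapper and DummyEnv are hypothetical names; the env is assumed to follow the reset() -> observation and step(action) -> (observation, reward, done, info) convention.

```python
class ObservationWrapper(object):
    """Reduced copy of the prototype: identity phi and a no-op reset."""

    def phi(self, observation):
        return observation

    def reset(self):
        pass


class CountingWrapper(ObservationWrapper):
    """Hypothetical stateful filter: tags each observation with a step index."""

    def __init__(self):
        self._count = 0

    def phi(self, observation):
        tagged = (self._count, observation)
        self._count += 1
        return tagged

    def reset(self):
        self._count = 0


class FilteredEnv(object):
    """Environment wrapper that applies an observation wrapper to every observation."""

    def __init__(self, env, wrapper):
        self.env = env
        self.wrapper = wrapper

    def reset(self):
        # Clear the filter state first, then filter the initial observation.
        self.wrapper.reset()
        return self.wrapper.phi(self.env.reset())

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return self.wrapper.phi(observation), reward, done, info


class DummyEnv(object):
    """Stand-in environment with fixed outputs, used only for this sketch."""

    def reset(self):
        return "start"

    def step(self, action):
        return "obs", 0.0, False, {}


env = FilteredEnv(DummyEnv(), CountingWrapper())
initial = env.reset()
stepped, reward, done, info = env.step(0)
restarted = env.reset()
```

Here the environment wrapper, not the underlying env, is responsible for resetting the filter, so existing environments would need no changes.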