Updated Wrapper docs (#173)
Markus28 authored Dec 3, 2022
1 parent 4b7f941 commit 851b2f4
Showing 8 changed files with 218 additions and 249 deletions.
51 changes: 12 additions & 39 deletions docs/api/wrappers.md
@@ -12,6 +12,11 @@ wrappers/observation_wrappers
wrappers/reward_wrappers
```

```{eval-rst}
.. automodule:: gymnasium.wrappers
```

## gymnasium.Wrapper

```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. attribute:: gymnasium.Wrapper.env

   The environment (one level underneath) that this wrapper wraps.
   This may itself be a wrapped environment.
   To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
.. autoproperty:: gymnasium.Wrapper.unwrapped
```

@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
* - :class:`VectorListInfo`
- Misc Wrapper
- This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
```

## Implementing a custom wrapper

Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in `info` or change the rendering behavior).
Such wrappers can be implemented by inheriting from `gymnasium.Wrapper`.

- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.

Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:

```python
import gymnasium as gym

class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
```
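
To see the reweighting in isolation, here is a minimal, self-contained sketch in which a hypothetical stub environment stands in for Reacher (the `info` values below are illustrative, not real Reacher output, and `WeightedRewardSketch` intentionally skips the gymnasium machinery):

```python
class StubReacherEnv:
    """Hypothetical stand-in for Reacher: returns fixed, illustrative reward terms."""

    def step(self, action):
        info = {"reward_dist": -0.5, "reward_ctrl": -0.1}
        # obs, reward, terminated, truncated, info
        return None, info["reward_dist"] + info["reward_ctrl"], False, False, info


class WeightedRewardSketch:
    """Same arithmetic as ReacherRewardWrapper, without the gymnasium base class."""

    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        self.env = env
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Recombine the individual reward terms with user-chosen weights.
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info


wrapped = WeightedRewardSketch(StubReacherEnv(), reward_dist_weight=2.0, reward_ctrl_weight=0.0)
_, reward, _, _, _ = wrapped.step(None)
print(reward)  # 2.0 * (-0.5) + 0.0 * (-0.1) = -1.0
```

With the control-cost weight set to zero, the wrapper reward depends only on the distance term, which is exactly the kind of reshaping the real wrapper enables.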

```{note}
It is *not* sufficient to use a `RewardWrapper` in this case!
```
12 changes: 3 additions & 9 deletions docs/api/wrappers/action_wrappers.md
@@ -1,22 +1,16 @@
# Action Wrappers

## Action Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. autofunction:: gymnasium.ActionWrapper.action
.. automethod:: gymnasium.ActionWrapper.action
```

## Clip Action

## Available Action Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
```

## Rescale Action

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RescaleAction
```
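
Conceptually, rescaling an action is an affine map between intervals. A hand-rolled scalar sketch of that map follows (for illustration only; the actual wrapper operates elementwise on `Box` spaces):

```python
def rescale(action, src_low, src_high, dst_low, dst_high):
    """Affinely map a scalar from [src_low, src_high] to [dst_low, dst_high]."""
    fraction = (action - src_low) / (src_high - src_low)
    return dst_low + fraction * (dst_high - dst_low)


# A policy emitting actions in [-1, 1] can drive an actuator expecting [0, 10]:
print(rescale(0.0, -1.0, 1.0, 0.0, 10.0))  # 5.0
```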

53 changes: 0 additions & 53 deletions docs/api/wrappers/misc_wrappers.md
@@ -1,68 +1,15 @@
# Misc Wrappers

## Atari Preprocessing

```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
```

## Autoreset

```{eval-rst}
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
```

## Compatibility

```{eval-rst}
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
```

## Passive Environment Checker

```{eval-rst}
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
```

## Human Rendering

```{eval-rst}
.. autoclass:: gymnasium.wrappers.HumanRendering
```

## Order Enforcing

```{eval-rst}
.. autoclass:: gymnasium.wrappers.OrderEnforcing
```

## Record Episode Statistics

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
```

## Record Video

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordVideo
```

## Render Collection

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RenderCollection
```

## Time Limit

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeLimit
```

## Vector List Info

```{eval-rst}
.. autoclass:: gymnasium.wrappers.VectorListInfo
```
47 changes: 4 additions & 43 deletions docs/api/wrappers/observation_wrappers.md
@@ -1,62 +1,23 @@
# Observation Wrappers

## Observation Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. autofunction:: gymnasium.ObservationWrapper.observation
```

## Transform Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
.. automethod:: gymnasium.ObservationWrapper.observation
```

## Filter Observation
## Available Observation Wrappers

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
.. autoclass:: gymnasium.wrappers.FilterObservation
```

## Flatten Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.FlattenObservation
```

## Framestack Observations

```{eval-rst}
.. autoclass:: gymnasium.wrappers.FrameStack
```

## Gray Scale Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
```

## Normalize Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeObservation
```

## Pixel Observation Wrapper

```{eval-rst}
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
```

## Resize Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.ResizeObservation
```

## Time Aware Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```
11 changes: 3 additions & 8 deletions docs/api/wrappers/reward_wrappers.md
@@ -1,22 +1,17 @@

# Reward Wrappers

## Reward Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. autofunction:: gymnasium.RewardWrapper.reward
.. automethod:: gymnasium.RewardWrapper.reward
```

## Transform Reward
## Available Reward Wrappers

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
```

## Normalize Reward

```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeReward
```
137 changes: 137 additions & 0 deletions docs/tutorials/implementing_custom_wrappers.py
@@ -0,0 +1,137 @@
"""
Implementing Custom Wrappers
============================
In this tutorial we will describe how to implement your own custom wrappers.
Wrappers are a great way to add functionality to your environments in a modular way.
This will save you a lot of boilerplate code.
We will show how to create a wrapper by

- Inheriting from :class:`gymnasium.ObservationWrapper`
- Inheriting from :class:`gymnasium.ActionWrapper`
- Inheriting from :class:`gymnasium.RewardWrapper`
- Inheriting from :class:`gymnasium.Wrapper`

Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
"""

# %%
# Inheriting from :class:`gymnasium.ObservationWrapper`
# -----------------------------------------------------
# Observation wrappers are useful if you want to apply some function to the observations that are returned
# by an environment. If you implement an observation wrapper, you only need to define this transformation
# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
# update the observation space, if the transformation changes the shape of observations (e.g. by transforming
# dictionaries into numpy arrays, as in the following example).
#
# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
# freedom and only consider the position of the target relative to the agent, i.e.
# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
# observation wrapper like this:

import numpy as np
from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper

import gymnasium as gym
from gymnasium.spaces import Box, Discrete


class RelativePosition(ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)

    def observation(self, obs):
        return obs["target_position"] - obs["agent_position"]
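

# %%
# A quick, self-contained check of the relative-position arithmetic, with a
# hand-made observation dictionary standing in for real environment output
# (plain tuples instead of numpy arrays, so the check needs nothing else):

example_obs = {"agent_position": (1.0, 2.0), "target_position": (4.0, 6.0)}
relative = tuple(
    t - a for t, a in zip(example_obs["target_position"], example_obs["agent_position"])
)
print(relative)  # (3.0, 4.0)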


# %%
# Inheriting from :class:`gymnasium.ActionWrapper`
# ------------------------------------------------
# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
# If you implement an action wrapper, you need to define that transformation by implementing
# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
# by updating the action space of the wrapper.
#
# Let’s say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
# to use a finite subset of actions. Then, you might want to implement the following wrapper:


class DiscreteActions(ActionWrapper):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)
        self.disc_to_cont = disc_to_cont
        self.action_space = Discrete(len(disc_to_cont))

    def action(self, act):
        return self.disc_to_cont[act]


if __name__ == "__main__":
    env = gym.make("LunarLanderContinuous-v2")
    wrapped_env = DiscreteActions(
        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
    )
    print(wrapped_env.action_space)  # Discrete(4)
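

# %%
# The discrete-to-continuous mapping itself is an ordinary list lookup; a
# self-contained check with plain tuples standing in for the numpy arrays above:

disc_to_cont_example = [(1, 0), (-1, 0), (0, 1), (0, -1)]
chosen = disc_to_cont_example[2]
print(chosen)  # (0, 1)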


# %%
# Inheriting from :class:`gymnasium.RewardWrapper`
# ------------------------------------------------
# Reward wrappers are used to transform the reward that is returned by an environment.
# As for the previous wrappers, you need to specify that transformation by implementing the
# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
#
# Let us look at an example: Sometimes (especially when we do not have control over the reward
# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
# To do that, we could, for instance, implement the following wrapper:

from typing import SupportsFloat


class ClipReward(RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, r: SupportsFloat) -> SupportsFloat:
        return np.clip(r, self.min_reward, self.max_reward)
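

# %%
# To check the clipping transformation in isolation, here is a self-contained
# sketch: the class below skips the gymnasium machinery, and plain min/max
# replace np.clip so the check needs nothing beyond the standard library.
# The numbers are illustrative:

class _ClipRewardSketch:
    def __init__(self, min_reward, max_reward):
        self.min_reward = min_reward
        self.max_reward = max_reward

    def reward(self, r):
        # Clamp r into [min_reward, max_reward].
        return max(self.min_reward, min(self.max_reward, r))


clipper = _ClipRewardSketch(-1.0, 1.0)
print(clipper.reward(5.3))  # 1.0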


# %%
# Inheriting from :class:`gymnasium.Wrapper`
# ------------------------------------------
# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
# reward based on data in ``info`` or change the rendering behavior).
# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
#
# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
# If you do this, you can access the environment that was passed
# to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
#
# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
# of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
# penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
# of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:


class ReacherRewardWrapper(Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info