Updated Wrapper docs (#173)
Markus28 authored Dec 3, 2022
1 parent 4b7f941 commit 851b2f4
Showing 8 changed files with 218 additions and 249 deletions.
51 changes: 12 additions & 39 deletions docs/api/wrappers.md
@@ -12,6 +12,11 @@ wrappers/observation_wrappers
wrappers/reward_wrappers
```

```{eval-rst}
.. automodule:: gymnasium.wrappers
```

## gymnasium.Wrapper

```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. attribute:: gymnasium.Wrapper.env

   The environment (one level underneath) that this wrapper wraps.
   This may itself be a wrapped environment.
   To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
.. autoproperty:: gymnasium.Wrapper.unwrapped
```

@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
* - :class:`VectorListInfo`
- Misc Wrapper
- This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
```

## Implementing a custom wrapper

Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in `info` or change the rendering behavior).
Such wrappers can be implemented by inheriting from `gymnasium.Wrapper`.

- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.

Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:

```python
import gymnasium as gym

class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
```
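
To see the reweighting in isolation, here is a minimal, self-contained sketch in which a hypothetical stub environment stands in for Reacher (the `info` values below are illustrative, not real Reacher output, and `WeightedRewardSketch` intentionally skips the gymnasium machinery):

```python
class StubReacherEnv:
    """Hypothetical stand-in for Reacher: returns fixed, illustrative reward terms."""

    def step(self, action):
        info = {"reward_dist": -0.5, "reward_ctrl": -0.1}
        # obs, reward, terminated, truncated, info
        return None, info["reward_dist"] + info["reward_ctrl"], False, False, info


class WeightedRewardSketch:
    """Same arithmetic as ReacherRewardWrapper, without the gymnasium base class."""

    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        self.env = env
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Recombine the individual reward terms with user-chosen weights.
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info


wrapped = WeightedRewardSketch(StubReacherEnv(), reward_dist_weight=2.0, reward_ctrl_weight=0.0)
_, reward, _, _, _ = wrapped.step(None)
print(reward)  # 2.0 * (-0.5) + 0.0 * (-0.1) = -1.0
```

With the control-cost weight set to zero, the wrapper reward depends only on the distance term, which is exactly the kind of reshaping the real wrapper enables.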

```{note}
It is *not* sufficient to use a `RewardWrapper` in this case!
```
12 changes: 3 additions & 9 deletions docs/api/wrappers/action_wrappers.md
@@ -1,22 +1,16 @@
# Action Wrappers

## Action Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. autofunction:: gymnasium.ActionWrapper.action
.. automethod:: gymnasium.ActionWrapper.action
```

## Clip Action

## Available Action Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
```

## Rescale Action

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RescaleAction
```
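
Conceptually, rescaling an action is an affine map between intervals. A hand-rolled scalar sketch of that map follows (for illustration only; the actual wrapper operates elementwise on `Box` spaces):

```python
def rescale(action, src_low, src_high, dst_low, dst_high):
    """Affinely map a scalar from [src_low, src_high] to [dst_low, dst_high]."""
    fraction = (action - src_low) / (src_high - src_low)
    return dst_low + fraction * (dst_high - dst_low)


# A policy emitting actions in [-1, 1] can drive an actuator expecting [0, 10]:
print(rescale(0.0, -1.0, 1.0, 0.0, 10.0))  # 5.0
```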

53 changes: 0 additions & 53 deletions docs/api/wrappers/misc_wrappers.md
@@ -1,68 +1,15 @@
# Misc Wrappers

## Atari Preprocessing

```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
```

## Autoreset

```{eval-rst}
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
```

## Compatibility

```{eval-rst}
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
```

## Passive Environment Checker

```{eval-rst}
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
```

## Human Rendering

```{eval-rst}
.. autoclass:: gymnasium.wrappers.HumanRendering
```

## Order Enforcing

```{eval-rst}
.. autoclass:: gymnasium.wrappers.OrderEnforcing
```

## Record Episode Statistics

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
```

## Record Video

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordVideo
```

## Render Collection

```{eval-rst}
.. autoclass:: gymnasium.wrappers.RenderCollection
```

## Time Limit

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeLimit
```

## Vector List Info

```{eval-rst}
.. autoclass:: gymnasium.wrappers.VectorListInfo
```
47 changes: 4 additions & 43 deletions docs/api/wrappers/observation_wrappers.md
@@ -1,62 +1,23 @@
# Observation Wrappers

## Observation Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. autofunction:: gymnasium.ObservationWrapper.observation
```

## Transform Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
.. automethod:: gymnasium.ObservationWrapper.observation
```

## Filter Observation
## Available Observation Wrappers

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
.. autoclass:: gymnasium.wrappers.FilterObservation
```

## Flatten Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.FlattenObservation
```

## Framestack Observations

```{eval-rst}
.. autoclass:: gymnasium.wrappers.FrameStack
```

## Gray Scale Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
```

## Normalize Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeObservation
```

## Pixel Observation Wrapper

```{eval-rst}
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
```

## Resize Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.ResizeObservation
```

## Time Aware Observation

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```
11 changes: 3 additions & 8 deletions docs/api/wrappers/reward_wrappers.md
@@ -1,22 +1,17 @@

# Reward Wrappers

## Reward Wrapper
## Base Class

```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. autofunction:: gymnasium.RewardWrapper.reward
.. automethod:: gymnasium.RewardWrapper.reward
```

## Transform Reward
## Available Reward Wrappers

```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
```

## Normalize Reward

```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeReward
```
137 changes: 137 additions & 0 deletions docs/tutorials/implementing_custom_wrappers.py
@@ -0,0 +1,137 @@
"""
Implementing Custom Wrappers
============================
In this tutorial we will describe how to implement your own custom wrappers.
Wrappers are a great way to add functionality to your environments in a modular way.
This will save you a lot of boilerplate code.
We will show how to create a wrapper by

- Inheriting from :class:`gymnasium.ObservationWrapper`
- Inheriting from :class:`gymnasium.ActionWrapper`
- Inheriting from :class:`gymnasium.RewardWrapper`
- Inheriting from :class:`gymnasium.Wrapper`

Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
"""

# %%
# Inheriting from :class:`gymnasium.ObservationWrapper`
# -----------------------------------------------------
# Observation wrappers are useful if you want to apply some function to the observations that are returned
# by an environment. If you implement an observation wrapper, you only need to define this transformation
# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
# update the observation space, if the transformation changes the shape of observations (e.g. by transforming
# dictionaries into numpy arrays, as in the following example).
#
# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
# freedom and only consider the position of the target relative to the agent, i.e.
# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
# observation wrapper like this:

import numpy as np
from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper

import gymnasium as gym
from gymnasium.spaces import Box, Discrete


class RelativePosition(ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)

    def observation(self, obs):
        return obs["target_position"] - obs["agent_position"]
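

# %%
# A quick, self-contained check of the relative-position arithmetic, with a
# hand-made observation dictionary standing in for real environment output
# (plain tuples instead of numpy arrays, so the check needs nothing else):

example_obs = {"agent_position": (1.0, 2.0), "target_position": (4.0, 6.0)}
relative = tuple(
    t - a for t, a in zip(example_obs["target_position"], example_obs["agent_position"])
)
print(relative)  # (3.0, 4.0)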


# %%
# Inheriting from :class:`gymnasium.ActionWrapper`
# ------------------------------------------------
# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
# If you implement an action wrapper, you need to define that transformation by implementing
# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
# by updating the action space of the wrapper.
#
# Let’s say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
# to use a finite subset of actions. Then, you might want to implement the following wrapper:


class DiscreteActions(ActionWrapper):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)
        self.disc_to_cont = disc_to_cont
        self.action_space = Discrete(len(disc_to_cont))

    def action(self, act):
        return self.disc_to_cont[act]


if __name__ == "__main__":
    env = gym.make("LunarLanderContinuous-v2")
    wrapped_env = DiscreteActions(
        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
    )
    print(wrapped_env.action_space)  # Discrete(4)
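

# %%
# The discrete-to-continuous mapping itself is an ordinary list lookup; a
# self-contained check with plain tuples standing in for the numpy arrays above:

disc_to_cont_example = [(1, 0), (-1, 0), (0, 1), (0, -1)]
chosen = disc_to_cont_example[2]
print(chosen)  # (0, 1)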


# %%
# Inheriting from :class:`gymnasium.RewardWrapper`
# ------------------------------------------------
# Reward wrappers are used to transform the reward that is returned by an environment.
# As for the previous wrappers, you need to specify that transformation by implementing the
# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
#
# Let us look at an example: Sometimes (especially when we do not have control over the reward
# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
# To do that, we could, for instance, implement the following wrapper:

from typing import SupportsFloat


class ClipReward(RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, r: SupportsFloat) -> SupportsFloat:
        return np.clip(r, self.min_reward, self.max_reward)
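

# %%
# To check the clipping transformation in isolation, here is a self-contained
# sketch: the class below skips the gymnasium machinery, and plain min/max
# replace np.clip so the check needs nothing beyond the standard library.
# The numbers are illustrative:

class _ClipRewardSketch:
    def __init__(self, min_reward, max_reward):
        self.min_reward = min_reward
        self.max_reward = max_reward

    def reward(self, r):
        # Clamp r into [min_reward, max_reward].
        return max(self.min_reward, min(self.max_reward, r))


clipper = _ClipRewardSketch(-1.0, 1.0)
print(clipper.reward(5.3))  # 1.0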


# %%
# Inheriting from :class:`gymnasium.Wrapper`
# ------------------------------------------
# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
# reward based on data in ``info`` or change the rendering behavior).
# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
#
# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
# If you do this, you can access the environment that was passed
# to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
#
# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
# of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
# penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
# of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:


class ReacherRewardWrapper(Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info