
Commit e88eb1c

araffin and Miffyli authored
Add explanation of logger output (#803)
* Add explanation of logger output
* Apply suggestions from code review
* Add example output

Co-authored-by: Anssi <[email protected]>
1 parent cdaa9ab commit e88eb1c

File tree

4 files changed: +92, -1 lines changed


docs/common/logger.rst

+81
@@ -28,5 +28,86 @@ Available formats are ``["stdout", "csv", "log", "tensorboard", "json"]``.
     model.set_logger(new_logger)
     model.learn(10000)
 
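Editorial illustration (not part of this commit): the ``new_logger`` in the context lines above can be created with ``stable_baselines3.common.logger.configure``; the log folder path and format choices here are placeholders.

.. code-block:: python

    from stable_baselines3 import PPO
    from stable_baselines3.common.logger import configure

    # Write logs as stdout, csv and tensorboard files under ./sb3_log/
    new_logger = configure("./sb3_log/", ["stdout", "csv", "tensorboard"])

    model = PPO("MlpPolicy", "CartPole-v1")
    model.set_logger(new_logger)
    model.learn(10000)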
+
+
+Explanation of logger output
+----------------------------
+
+You can find below short explanations of the values logged in Stable-Baselines3 (SB3).
+Depending on the algorithm used and the wrappers/callbacks applied, SB3 only logs a subset of those keys during training.
+
+Below you can find an example of the logger output when training a PPO agent:
+
+.. code-block:: bash
+
+  -----------------------------------------
+  | eval/                   |             |
+  |    mean_ep_length       | 200         |
+  |    mean_reward          | -157        |
+  | rollout/                |             |
+  |    ep_len_mean          | 200         |
+  |    ep_rew_mean          | -227        |
+  | time/                   |             |
+  |    fps                  | 972         |
+  |    iterations           | 19          |
+  |    time_elapsed         | 80          |
+  |    total_timesteps      | 77824       |
+  | train/                  |             |
+  |    approx_kl            | 0.037781604 |
+  |    clip_fraction        | 0.243       |
+  |    clip_range           | 0.2         |
+  |    entropy_loss         | -1.06       |
+  |    explained_variance   | 0.999       |
+  |    learning_rate        | 0.001       |
+  |    loss                 | 0.245       |
+  |    n_updates            | 180         |
+  |    policy_gradient_loss | -0.00398    |
+  |    std                  | 0.205       |
+  |    value_loss           | 0.226       |
+  -----------------------------------------
+
+
+eval/
+^^^^^
+All ``eval/`` values are computed by the ``EvalCallback`` (see the sketch after this list).
+
+- ``mean_ep_length``: Mean episode length
+- ``mean_reward``: Mean episodic reward (during evaluation)
+- ``success_rate``: Mean success rate during evaluation (1.0 means 100% success); the environment info dict must contain an ``is_success`` key to compute that value
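
Editorial illustration (not part of this commit): a minimal sketch of attaching an ``EvalCallback`` so that the ``eval/`` keys above get logged; the environment id and the evaluation frequency are arbitrary choices.

.. code-block:: python

    import gym

    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import EvalCallback

    # Use a separate environment for evaluation
    eval_env = gym.make("Pendulum-v1")
    # Evaluate every 1000 steps, averaging over 5 episodes
    eval_callback = EvalCallback(eval_env, eval_freq=1000, n_eval_episodes=5)

    model = PPO("MlpPolicy", "Pendulum-v1")
    # eval/mean_ep_length and eval/mean_reward now appear in the logger output
    model.learn(10000, callback=eval_callback)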
+
+rollout/
+^^^^^^^^
+- ``ep_len_mean``: Mean episode length (averaged over 100 episodes)
+- ``ep_rew_mean``: Mean episodic training reward (averaged over 100 episodes); a ``Monitor`` wrapper is required to compute that value (automatically added by ``make_vec_env``)
+- ``exploration_rate``: Current value of the exploration rate when using DQN; it corresponds to the fraction of actions taken randomly (the epsilon of the "epsilon-greedy" exploration)
+- ``success_rate``: Mean success rate during training (averaged over 100 episodes); you must pass an extra argument to the ``Monitor`` wrapper to log that value (``info_keywords=("is_success",)``) and provide ``info["is_success"]=True/False`` on the final step of the episode (see the sketch below)
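
Editorial illustration (not part of this commit): a sketch of the ``Monitor`` setup described in the ``success_rate`` item; ``FetchReach-v1`` is only an example of an environment whose ``info`` dict provides ``is_success``.

.. code-block:: python

    import gym

    from stable_baselines3.common.monitor import Monitor

    # The wrapped environment must put ``is_success`` into the ``info`` dict
    # on the final step of each episode, as goal-based envs typically do.
    env = Monitor(gym.make("FetchReach-v1"), info_keywords=("is_success",))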
+
+time/
+^^^^^
+- ``episodes``: Total number of episodes
+- ``fps``: Number of frames per second (includes the time taken by the gradient update)
+- ``iterations``: Number of iterations (data collection + policy update for A2C/PPO)
+- ``time_elapsed``: Time in seconds since the beginning of training
+- ``total_timesteps``: Total number of timesteps (steps in the environments)
+
+train/
+^^^^^^
+- ``actor_loss``: Current value of the actor loss for off-policy algorithms
+- ``approx_kl``: Approximate mean KL divergence between the old and new policy (for PPO); it is an estimate of how much the policy changed during the update
+- ``clip_fraction``: Mean fraction of the surrogate loss that was clipped (above the ``clip_range`` threshold) for PPO
+- ``clip_range``: Current value of the clipping factor for the surrogate loss of PPO
+- ``critic_loss``: Current value of the critic function loss for off-policy algorithms, usually the error between the value function output and the TD(0) (temporal difference) estimate
+- ``ent_coef``: Current value of the entropy coefficient (when using SAC)
+- ``ent_coef_loss``: Current value of the entropy coefficient loss (when using SAC)
+- ``entropy_loss``: Mean value of the entropy loss (negative of the average policy entropy)
+- ``explained_variance``: Fraction of the return variance explained by the value function, see https://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score
+  (ev=0 => might as well have predicted zero, ev=1 => perfect prediction, ev<0 => worse than just predicting zero; see the sketch after this list)
+- ``learning_rate``: Current learning rate value
+- ``loss``: Current total loss value
+- ``n_updates``: Number of gradient updates applied so far
+- ``policy_gradient_loss``: Current value of the policy gradient loss (its value does not have much meaning)
+- ``value_loss``: Current value of the value function loss for on-policy algorithms, usually the error between the value function output and a Monte-Carlo estimate (or TD(lambda) estimate)
+- ``std``: Current standard deviation of the noise when using generalized State-Dependent Exploration (gSDE)
+
+
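Editorial illustration (not part of this commit): the explained-variance fraction can be reproduced in a few lines of NumPy; this mirrors the definition linked above (SB3 also ships its own helper in ``stable_baselines3.common.utils``).

.. code-block:: python

    import numpy as np

    def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
        # ev = 1 - Var[y_true - y_pred] / Var[y_true]
        # ev=1 => perfect prediction, ev=0 => as good as predicting zero,
        # ev<0 => worse than predicting zero
        var_y = np.var(y_true)
        return np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y

    returns = np.array([1.0, 2.0, 3.0])
    values = np.array([0.9, 2.1, 2.9])
    print(explained_variance(values, returns))  # ~0.99, close to 1 => good fit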
 .. automodule:: stable_baselines3.common.logger
     :members:

docs/guide/quickstart.rst

+4
@@ -27,6 +27,10 @@ Here is a quick example of how to train and run A2C on a CartPole environment:
         if done:
             obs = env.reset()
 
+.. note::
+
+    You can find explanations about the logger output and names in the :ref:`Logger <logger>` section.
+
 
 Or just train a model with a one liner if
 `the environment is registered in Gym <https://github.com/openai/gym/wiki/Environments>`_ and if

docs/guide/tensorboard.rst

+6
@@ -36,6 +36,12 @@ Once the learn function is called, you can monitor the RL agent during or after
     tensorboard --logdir ./a2c_cartpole_tensorboard/
 
+
+.. note::
+
+    You can find explanations about the logger output and names in the :ref:`Logger <logger>` section.
+
+
 you can also add past logging folders:
 
 .. code-block:: bash

docs/misc/changelog.rst

+1, -1
@@ -47,7 +47,7 @@ Documentation:
 - Added furuta pendulum project to project list (@armandpl)
 - Fix indentation 2 spaces to 4 spaces in custom env documentation example (@Gautam-J)
 - Update MlpExtractor docstring (@gianlucadecola)
-
+- Added explanation of the logger output
 
 Release 1.4.0 (2022-01-18)
 ---------------------------
