extend documentation to address DLR-RM#64
and a few additional comments regarding hyperparameter defaults in general
cboettig authored Jan 8, 2021
1 parent 3afaef7 commit 26511d4
Showing 1 changed file: README.md (33 additions, 1 deletion)
@@ -85,10 +85,42 @@

```
python train.py --algo sac --env Pendulum-v0 --save-replay-buffer
```
It will be automatically loaded if present when continuing training.

## Default agent hyperparameters

Note that the default hyperparameters used in the zoo when tuning or training are not always the same as the defaults provided in [stable-baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/base.html). Consult the latest source code to be sure of these settings. For example:

- PPO tuning assumes a network architecture with `ortho_init = False`, though it is `True` by [default](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#ppo-policies).

- PPO and A2C will use separate architectures for the policy and value networks, e.g. `net_arch=[dict(pi=[64, 64], vf=[64, 64])]`, while the default is a shared network, `net_arch=[64, 64]` (a yaml sketch covering this and the previous point follows this list).

- Non-episodic rollout in TD3 and DDPG assumes `gradient_steps = train_freq` and so tunes only `train_freq`.

- By default, the training environment is always wrapped in [VecNormalize](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L13). [Normalization uses](https://github.com/DLR-RM/rl-baselines3-zoo/issues/64) the default parameters of `VecNormalize`, with the exception of `gamma`, which is set to match that of the agent. This can be [overridden](https://github.com/DLR-RM/rl-baselines3-zoo/blob/v0.10.0/hyperparams/sac.yml#L239) using the appropriate `hyperparams/algo_name.yml`, e.g.

```
normalize: "{'norm_obs': True, 'norm_reward': False}"
```
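
To illustrate the first two points above, such settings can be passed through `policy_kwargs` in the zoo's yaml files. A minimal sketch of a PPO entry; the environment name and values here are placeholders for illustration, not settings taken from the repository:

```
# Hypothetical entry in hyperparams/ppo.yml (illustration only):
# disable orthogonal initialization and use separate policy/value networks.
CartPole-v1:
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  policy_kwargs: "dict(ortho_init=False,
                       net_arch=[dict(pi=[64, 64], vf=[64, 64])])"
```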

## Hyperparameter yaml syntax

The syntax used in `hyperparams/algo_name.yml` for setting hyperparameters (likewise the syntax to [overwrite hyperparameters](https://github.com/DLR-RM/rl-baselines3-zoo#overwrite-hyperparameters) on the CLI) is specialized when the argument is a function. See the examples in the `hyperparams/` directory. For example:

- Specify a linear schedule for the learning rate:

```
learning_rate: lin_0.012486195510232303
```

- Specify a different activation function for the network:

```
policy_kwargs: "dict(activation_fn=nn.ReLU)"
```
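
The same syntax can be used when overwriting hyperparameters on the command line. A sketch, assuming the `-params` option takes `key:value` pairs as elsewhere in the repository README (check `train.py --help` for the exact flag; the env and values are placeholders):

```
python train.py --algo ppo --env CartPole-v1 -params learning_rate:lin_0.001 policy_kwargs:"dict(activation_fn=nn.ReLU)"
```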

## Hyperparameter Tuning

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.
Not all hyperparameters are tuned, and tuning enforces certain default hyperparameter settings that may be different from the official defaults. See [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py) for the current settings for each agent.

Note: hyperparameter search is not implemented for DQN for now.
When using the SuccessiveHalvingPruner ("halving"), you must specify `--n-jobs > 1`; a sample tuning command is sketched below.
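
A minimal sketch of a tuning run, assuming the `-optimize`, `--n-trials`, `--sampler`, and `--pruner` options of `train.py` (the env and trial budget are placeholders):

```
python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 100 --n-jobs 2 \
  --sampler tpe --pruner median
```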
