Correct typos #614

Merged (5 commits, Dec 12, 2019)
4 changes: 2 additions & 2 deletions docs/common/schedules.rst
@@ -3,8 +3,8 @@
Schedules
=========

Schedules are used as hyperparameter for most of the algortihms,
in order to change value of a parameter over time (usuallly the learning rate).
Schedules are used as hyperparameter for most of the algorithms,
in order to change value of a parameter over time (usually the learning rate).


.. automodule:: stable_baselines.common.schedules
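As a rough illustration of what such a schedule does (a minimal sketch, assuming ``LinearSchedule`` keeps its documented ``schedule_timesteps`` / ``final_p`` / ``initial_p`` signature)::

    # Linearly anneal a value (e.g. exploration rate or learning rate)
    # from 1.0 down to 0.02 over the first 10000 steps, then keep it constant.
    from stable_baselines.common.schedules import LinearSchedule

    schedule = LinearSchedule(schedule_timesteps=10000, final_p=0.02, initial_p=1.0)
    for step in (0, 5000, 10000, 20000):
        print(step, schedule.value(step))  # 1.0, 0.51, 0.02, 0.02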
11 changes: 11 additions & 0 deletions docs/conf.py
@@ -16,6 +16,14 @@
import sys
from unittest.mock import MagicMock

# We CANNOT enable 'sphinxcontrib.spelling' because ReadTheDocs.org does not support
# PyEnchant.
try:
    import sphinxcontrib.spelling
    enable_spell_check = True
except ImportError:
    enable_spell_check = False

# source code directory, relative to this file, for sphinx-autobuild
sys.path.insert(0, os.path.abspath('..'))

@@ -69,6 +77,9 @@ def __getattr__(cls, name):
'sphinx.ext.viewcode',
]

if enable_spell_check:
    extensions.append('sphinxcontrib.spelling')

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
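For context, one possible way to drive that optional extension (a sketch only; the ``run_spell_check`` helper and the output path are hypothetical, and it assumes Sphinx's ``build_main`` entry point)::

    # Hypothetical helper mirroring the guard above: run the 'spelling' builder
    # only when sphinxcontrib-spelling (and its PyEnchant backend) is importable.
    from sphinx.cmd.build import build_main

    def run_spell_check(source_dir='docs', build_dir='docs/_build/spelling'):
        try:
            import sphinxcontrib.spelling  # noqa: F401
        except ImportError:
            print('sphinxcontrib-spelling not installed, skipping spell check')
            return 0
        # '-b spelling' selects the builder registered by the extension
        return build_main(['-b', 'spelling', source_dir, build_dir])

    if __name__ == '__main__':
        raise SystemExit(run_spell_check())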

2 changes: 1 addition & 1 deletion docs/guide/install.rst
@@ -169,7 +169,7 @@ Explanation of the docker command:
- ``--ipc=host`` Use the host system’s IPC namespace. IPC (POSIX/SysV IPC) namespace provides
separation of named shared memory segments, semaphores and message
queues.
- ``--name test`` give explicitely the name ``test`` to the container,
- ``--name test`` give explicitly the name ``test`` to the container,
otherwise it will be assigned a random name
- ``--mount src=...`` give access of the local directory (``pwd``
command) to the container (it will be map to ``/root/code/stable-baselines``), so
2 changes: 1 addition & 1 deletion docs/guide/pretrain.rst
@@ -80,7 +80,7 @@ The idea is that this callable can be a PID controller, asking a human player, .
return env.action_space.sample()
# Data will be saved in a numpy archive named `expert_cartpole.npz`
# when using something different than an RL expert,
# you must pass the environment object explicitely
# you must pass the environment object explicitly
generate_expert_traj(dummy_expert, 'dummy_expert_cartpole', env, n_episodes=10)
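Pieced together, the fragment above corresponds roughly to the following self-contained sketch (assuming a Gym CartPole environment; the "expert" is just a random policy)::

    import gym
    from stable_baselines.gail import generate_expert_traj

    env = gym.make('CartPole-v1')

    def dummy_expert(_obs):
        # Ignore the observation and sample a random action from the env
        return env.action_space.sample()

    # The expert is a plain callable (not an RL model), so the env is passed
    # explicitly; trajectories are saved to `dummy_expert_cartpole.npz`
    generate_expert_traj(dummy_expert, 'dummy_expert_cartpole', env, n_episodes=10)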


6 changes: 3 additions & 3 deletions docs/guide/rl_tips.rst
@@ -33,7 +33,7 @@ bad trajectories.
This factor, among others, explains that results in RL may vary from one run to another (i.e., when only the seed of the pseudo-random generator changes).
For this reason, you should always do several runs to have quantitative results.

Good results in RL are generally dependent on finding appropriate hyperparameters. Recent alogrithms (PPO, SAC, TD3) normally require little hyperparameter tuning,
Good results in RL are generally dependent on finding appropriate hyperparameters. Recent algorithms (PPO, SAC, TD3) normally require little hyperparameter tuning,
however, *don't expect the default ones to work* on any environment.

Therefore, we *highly recommend you* to take a look at the `RL zoo <https://github.com/araffin/rl-baselines-zoo>`_ (or the original papers) for tuned hyperparameters.
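To make the "several runs" advice concrete, here is a minimal sketch (assuming PPO2 on CartPole and a model constructor that accepts a ``seed`` argument): train with a few seeds and report the mean and spread of the episodic return::

    import gym
    import numpy as np
    from stable_baselines import PPO2

    def mean_episode_reward(model, env, n_episodes=10):
        returns = []
        for _ in range(n_episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                action, _states = model.predict(obs, deterministic=True)
                obs, reward, done, _info = env.step(action)
                total += reward
            returns.append(total)
        return np.mean(returns)

    scores = []
    for seed in (0, 1, 2):
        env = gym.make('CartPole-v1')
        model = PPO2('MlpPolicy', env, seed=seed, verbose=0).learn(50000)
        scores.append(mean_episode_reward(model, env))
    print('reward: %.1f +/- %.1f across seeds' % (np.mean(scores), np.std(scores)))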
@@ -93,7 +93,7 @@ or continuous actions (ex: go to a certain speed)?
Some algorithms are only tailored for one or the other domain: `DQN` only supports discrete actions, where `SAC` is restricted to continuous actions.

The second difference that will help you choose is whether you can parallelize your training or not, and how you can do it (with or without MPI?).
If what matters is the wall clock training time, then you should lean towards `À2C` and its derivates (PPO, ACER, ACKTR, ...).
If what matters is the wall clock training time, then you should lean towards `A2C` and its derivatives (PPO, ACER, ACKTR, ...).
Take a look at the `Vectorized Environments <vec_envs.html>`_ to learn more about training with multiple workers.
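As a small sketch of that parallel setup (assuming ``make_vec_env`` exposes a ``vec_env_cls`` argument), A2C can be trained on several environments stepped in subprocesses::

    from stable_baselines import A2C
    from stable_baselines.common.cmd_util import make_vec_env
    from stable_baselines.common.vec_env import SubprocVecEnv

    if __name__ == '__main__':
        # 4 copies of CartPole stepped in parallel worker processes
        env = make_vec_env('CartPole-v1', n_envs=4, seed=0, vec_env_cls=SubprocVecEnv)
        model = A2C('MlpPolicy', env, verbose=1)
        model.learn(total_timesteps=25000)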

To sum it up:
@@ -146,7 +146,7 @@ If you can use MPI, then you can choose between PPO1, TRPO and DDPG.
Goal Environment
-----------------

If your environment follows the `GoalEnv` interface (cf `HER <her.html>`_), then you should use
If your environment follows the `GoalEnv` interface (cf `HER <../modules/her.html>`_), then you should use
HER + (SAC/TD3/DDPG/DQN) depending on the action space.
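A brief sketch of that recommendation (assuming the ``BitFlippingEnv`` toy ``GoalEnv`` and the ``HER(policy, env, model_class, ...)`` wrapper keep their documented signatures)::

    from stable_baselines import HER, SAC
    from stable_baselines.common.bit_flipping_env import BitFlippingEnv

    # Toy GoalEnv with a continuous action space, so SAC is a valid choice
    env = BitFlippingEnv(n_bits=15, continuous=True, max_steps=15)

    model = HER('MlpPolicy', env, SAC, n_sampled_goal=4,
                goal_selection_strategy='future', verbose=1)
    model.learn(total_timesteps=10000)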


4 changes: 3 additions & 1 deletion docs/misc/changelog.rst
@@ -73,6 +73,8 @@ Documentation:
- Update custom env documentation to reflect new gym API for the `close()` method (@justinkterry)
- Update custom env documentation to clarify what step and reset return (@justinkterry)
- Add RL tips and tricks for doing RL experiments
- Corrected lots of typos
- Add spell check to documentation if available


Release 2.8.0 (2019-09-29)
@@ -388,7 +390,7 @@ Release 2.1.1 (2018-10-20)
--------------------------

- fixed MpiAdam synchronization issue in PPO1 (thanks to @brendenpetersen) issue #50
- fixed dependency issues (new mujoco-py requires a mujoco licence + gym broke MultiDiscrete space shape)
- fixed dependency issues (new mujoco-py requires a mujoco license + gym broke MultiDiscrete space shape)


Release 2.1.0 (2018-10-2)
2 changes: 1 addition & 1 deletion docs/modules/her.rst
@@ -93,7 +93,7 @@ Goal Selection Strategies
:undoc-members:


Gaol Env Wrapper
Goal Env Wrapper
----------------

.. autoclass:: HERGoalEnvWrapper
103 changes: 103 additions & 0 deletions docs/spelling_wordlist.txt
@@ -0,0 +1,103 @@
py
env
atari
argparse
Argparse
TensorFlow
feedforward
envs
VecEnv
pretrain
petrained
tf
np
mujoco
cpu
ndarray
ndarrays
timestep
timesteps
stepsize
dataset
adam
fn
normalisation
Kullback
Leibler
boolean
deserialized
pretrained
minibatch
subprocesses
ArgumentParser
Tensorflow
Gaussian
approximator
minibatches
hyperparameters
hyperparameter
vectorized
rl
colab
dataloader
npz
datasets
vf
logits
num
Utils
backpropagate
prepend
NaN
preprocessing
Cloudpickle
async
multiprocess
tensorflow
mlp
cnn
neglogp
tanh
coef
repo
Huber
params
ppo
arxiv
Arxiv
func
DQN
Uhlenbeck
Ornstein
multithread
cancelled
Tensorboard
parallelize
customising
serializable
Multiprocessed
cartpole
toolset
lstm
rescale
ffmpeg
avconv
unnormalized
Github
pre
preprocess
backend
attr
preprocess
Antonin
Raffin
araffin
Homebrew
Numpy
Theano
rollout
kfac
Piecewise
csv
nvidia
visdom
6 changes: 3 additions & 3 deletions stable_baselines/acer/acer_simple.py
@@ -75,7 +75,7 @@ class ACER(ActorCriticRLModel):
Use `n_cpu_tf_sess` instead.

:param q_coef: (float) The weight for the loss on the Q value
:param ent_coef: (float) The weight for the entropic loss
:param ent_coef: (float) The weight for the entropy loss
:param max_grad_norm: (float) The clipping value for the maximum gradient
:param learning_rate: (float) The initial learning rate for the RMS prop optimizer
:param lr_schedule: (str) The type of scheduler for the learning rate update ('linear', 'constant',
@@ -390,13 +390,13 @@ def custom_getter(getter, name, *args, **kwargs):
tf.summary.scalar('rewards', tf.reduce_mean(self.reward_ph))
tf.summary.scalar('learning_rate', tf.reduce_mean(self.learning_rate))
tf.summary.scalar('advantage', tf.reduce_mean(adv))
tf.summary.scalar('action_probabilty', tf.reduce_mean(self.mu_ph))
tf.summary.scalar('action_probability', tf.reduce_mean(self.mu_ph))

if self.full_tensorboard_log:
tf.summary.histogram('rewards', self.reward_ph)
tf.summary.histogram('learning_rate', self.learning_rate)
tf.summary.histogram('advantage', adv)
tf.summary.histogram('action_probabilty', self.mu_ph)
tf.summary.histogram('action_probability', self.mu_ph)
if tf_util.is_image(self.observation_space):
tf.summary.image('observation', train_model.obs_ph)
else:
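To tie the ``q_coef`` / ``ent_coef`` descriptions and the ``action_probability`` summaries above to an actual call, a hedged sketch (the environment and hyperparameter values are illustrative)::

    from stable_baselines import ACER
    from stable_baselines.common.cmd_util import make_vec_env

    # q_coef and ent_coef weight the Q-value and entropy terms of the loss;
    # tensorboard_log enables the scalar summaries shown in the diff above.
    env = make_vec_env('CartPole-v1', n_envs=4, seed=0)
    model = ACER('MlpPolicy', env, q_coef=0.5, ent_coef=0.01,
                 tensorboard_log='./acer_tensorboard/')
    model.learn(total_timesteps=25000)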
2 changes: 1 addition & 1 deletion stable_baselines/acktr/acktr.py
@@ -30,7 +30,7 @@ class ACKTR(ActorCriticRLModel):
Use `n_cpu_tf_sess` instead.

:param n_steps: (int) The number of steps to run for each environment
:param ent_coef: (float) The weight for the entropic loss
:param ent_coef: (float) The weight for the entropy loss
:param vf_coef: (float) The weight for the loss on the value function
:param vf_fisher_coef: (float) The weight for the fisher loss on the value function
:param learning_rate: (float) The initial learning rate for the RMS prop optimizer
2 changes: 1 addition & 1 deletion stable_baselines/acktr/kfac.py
@@ -25,7 +25,7 @@ def __init__(self, learning_rate=0.01, momentum=0.9, clip_kl=0.01, kfac_update=2
:param clip_kl: (float) gradient clipping for Kullback-Leibler
:param kfac_update: (int) update kfac after kfac_update steps
:param stats_accum_iter: (int) how may steps to accumulate stats
:param full_stats_init: (bool) whether or not to fully initalize stats
:param full_stats_init: (bool) whether or not to fully initialize stats
:param cold_iter: (int) Cold start learning rate for how many steps
:param cold_lr: (float) Cold start learning rate
:param async_eigen_decomp: (bool) Use async eigen decomposition
2 changes: 1 addition & 1 deletion stable_baselines/common/atari_wrappers.py
@@ -276,7 +276,7 @@ def __getitem__(self, i):

def make_atari(env_id):
"""
Create a wrapped atari envrionment
Create a wrapped atari Environment

:param env_id: (str) the environment ID
:return: (Gym Environment) the wrapped atari environment
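A short usage sketch for this helper (assuming the Atari dependencies are installed; ``wrap_deepmind`` adds the standard preprocessing on top)::

    from stable_baselines.common.atari_wrappers import make_atari, wrap_deepmind

    # Raw NoFrameskip env with the noop-reset and frame-skip wrappers applied
    env = make_atari('BreakoutNoFrameskip-v4')
    # Standard DeepMind preprocessing: grayscale, resize, reward clipping, ...
    env = wrap_deepmind(env, frame_stack=True)
    obs = env.reset()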
10 changes: 5 additions & 5 deletions stable_baselines/common/base_class.py
@@ -238,9 +238,9 @@ def _get_pretrain_placeholders(self):
"""
Return the placeholders needed for the pretraining:
- obs_ph: observation placeholder
- actions_ph will be population with an action from the environement
- actions_ph will be population with an action from the environment
(from the expert dataset)
- deterministic_actions_ph: e.g., in the case of a gaussian policy,
- deterministic_actions_ph: e.g., in the case of a Gaussian policy,
the mean.

:return: ((tf.placeholder)) (obs_ph, actions_ph, deterministic_actions_ph)
@@ -474,7 +474,7 @@ def load(cls, load_path, env=None, custom_objects=None, **kwargs):
Load the model from file

:param load_path: (str or file-like) the saved parameter location
:param env: (Gym Envrionment) the new environment to run the loaded model on
:param env: (Gym Environment) the new environment to run the loaded model on
(can be None if you only need prediction from a trained model)
:param custom_objects: (dict) Dictionary of objects to replace
upon loading. If a variable is present in this dictionary as a
@@ -862,7 +862,7 @@ def load(cls, load_path, env=None, custom_objects=None, **kwargs):
Load the model from file

:param load_path: (str or file-like) the saved parameter location
:param env: (Gym Envrionment) the new environment to run the loaded model on
:param env: (Gym Environment) the new environment to run the loaded model on
(can be None if you only need prediction from a trained model)
:param custom_objects: (dict) Dictionary of objects to replace
upon loading. If a variable is present in this dictionary as a
@@ -945,7 +945,7 @@ def load(cls, load_path, env=None, custom_objects=None, **kwargs):
Load the model from file

:param load_path: (str or file-like) the saved parameter location
:param env: (Gym Envrionment) the new environment to run the loaded model on
:param env: (Gym Environment) the new environment to run the loaded model on
(can be None if you only need prediction from a trained model)
:param custom_objects: (dict) Dictionary of objects to replace
upon loading. If a variable is present in this dictionary as a
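In practice, the ``load`` signature documented above is used roughly like this sketch (PPO2 on CartPole assumed; ``custom_objects`` would only be needed to override stored attributes)::

    import gym
    from stable_baselines import PPO2

    env = gym.make('CartPole-v1')
    model = PPO2('MlpPolicy', env).learn(total_timesteps=10000)
    model.save('ppo2_cartpole')  # saved as a ppo2_cartpole.zip archive

    # env may be None if you only need predict(); here the same env is reattached
    loaded = PPO2.load('ppo2_cartpole', env=env)
    action, _states = loaded.predict(env.reset())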
8 changes: 4 additions & 4 deletions stable_baselines/common/cmd_util.py
@@ -25,7 +25,7 @@ def make_vec_env(env_id, n_envs=1, seed=None, start_index=0,

:param env_id: (str or Type[gym.Env]) the environment ID or the environment class
:param n_envs: (int) the number of environments you wish to have in parallel
:param seed: (int) the inital seed for the random number generator
:param seed: (int) the initial seed for the random number generator
:param start_index: (int) start rank index
:param monitor_dir: (str) Path to a folder where the monitor files will be saved.
If None, no file will be written, however, the env will still be wrapped
@@ -80,7 +80,7 @@ def make_atari_env(env_id, num_env, seed, wrapper_kwargs=None,

:param env_id: (str) the environment ID
:param num_env: (int) the number of environment you wish to have in subprocesses
:param seed: (int) the inital seed for RNG
:param seed: (int) the initial seed for RNG
:param wrapper_kwargs: (dict) the parameters for wrap_deepmind function
:param start_index: (int) start rank index
:param allow_early_resets: (bool) allows early reset of the environment
@@ -116,7 +116,7 @@ def make_mujoco_env(env_id, seed, allow_early_resets=True):
Create a wrapped, monitored gym.Env for MuJoCo.

:param env_id: (str) the environment ID
:param seed: (int) the inital seed for RNG
:param seed: (int) the initial seed for RNG
:param allow_early_resets: (bool) allows early reset of the environment
:return: (Gym Environment) The mujoco environment
"""
@@ -132,7 +132,7 @@ def make_robotics_env(env_id, seed, rank=0, allow_early_resets=True):
Create a wrapped, monitored gym.Env for MuJoCo.

:param env_id: (str) the environment ID
:param seed: (int) the inital seed for RNG
:param seed: (int) the initial seed for RNG
:param rank: (int) the rank of the environment (for logging)
:param allow_early_resets: (bool) allows early reset of the environment
:return: (Gym Environment) The robotic environment
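A combined usage sketch for these helpers (illustrative environment id and seed; ``VecFrameStack`` adds the usual 4-frame stacking)::

    from stable_baselines.common.cmd_util import make_atari_env
    from stable_baselines.common.vec_env import VecFrameStack

    # 4 seeded, monitored copies of Pong in a vectorized env
    env = make_atari_env('PongNoFrameskip-v4', num_env=4, seed=0)
    env = VecFrameStack(env, n_stack=4)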