Hindsight Experience Replay (HER) - Reloaded | get/load parameters
Breaking Changes:
- breaking change: removed `stable_baselines.ddpg.memory` in favor of `stable_baselines.deepq.replay_buffer` (see fix below)
Breaking Change: the DDPG replay buffer was unified with the DQN/SAC replay buffer. As a result, loading a DDPG model trained with stable_baselines<2.6.0 raises an import error. You can fix that using:
```python
import sys
import pkg_resources

import stable_baselines

# Fix for breaking change for DDPG buffer in v2.6.0
# (parse_version is used so that e.g. "2.10.0" compares correctly against "2.6.0")
installed = pkg_resources.parse_version(pkg_resources.get_distribution("stable_baselines").version)
if installed >= pkg_resources.parse_version("2.6.0"):
    sys.modules['stable_baselines.ddpg.memory'] = stable_baselines.deepq.replay_buffer
    stable_baselines.deepq.replay_buffer.Memory = stable_baselines.deepq.replay_buffer.ReplayBuffer
```
We recommend saving the model again afterward, so the fix won't be needed the next time the trained agent is loaded.
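For instance, a one-off re-save pass after applying the shim (the model path is hypothetical):

```python
from stable_baselines import DDPG

# Loading works thanks to the shim above; saving again writes the
# model in the new format, so the shim is no longer needed afterward
model = DDPG.load("pre_260_ddpg_model")  # hypothetical path
model.save("pre_260_ddpg_model")
```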
New Features:
- revamped HER implementation: clean re-implementation from scratch, now supports DQN, SAC and DDPG (see the usage sketch after this list)
- added `action_noise` param for SAC; it helps exploration for problems with deceptive rewards (see the noise sketch after this list)
- the parameter `filter_size` of the function `conv` in A2C utils now supports passing a list/tuple of two integers (height and width), in order to have a non-square kernel matrix (@yutingsz) (see the custom CNN sketch after this list)
- added `random_exploration` parameter for DDPG and SAC; it may be useful when using HER + DDPG/SAC (it is included in the HER sketch below). This hack was present in the original OpenAI Baselines DDPG + HER implementation.
- added `load_parameters` and `get_parameters` to the base RL class. With these methods, users can load and get parameters to/from an existing model without touching tensorflow (@Miffyli) (see the parameter sketch after this list)
- added a specific hyperparameter for PPO2 to clip the value function, `cliprange_vf` (see the PPO2 sketch after this list)
- added `VecCheckNan` wrapper (see the wrapper sketch after this list)
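As a rough usage sketch of the revamped HER API (the `BitFlippingEnv` toy environment and all hyperparameter values here are illustrative choices, not recommendations):

```python
from stable_baselines import HER, DDPG
from stable_baselines.common.bit_flipping_env import BitFlippingEnv

# Goal-based toy environment, with continuous actions for DDPG
env = BitFlippingEnv(n_bits=15, continuous=True, max_steps=15)

# HER wraps an off-policy model class (DQN, SAC or DDPG).
# random_exploration adds epsilon-greedy style exploration on top,
# the hack from the original OpenAI Baselines DDPG + HER implementation.
model = HER('MlpPolicy', env, DDPG, n_sampled_goal=4,
            goal_selection_strategy='future',
            random_exploration=0.3, verbose=1)
model.learn(total_timesteps=5000)
```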
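A minimal sketch of the new SAC `action_noise` param, assuming the noise classes shipped with DDPG (`stable_baselines.ddpg.noise`); the environment and noise scale are arbitrary:

```python
import gym
import numpy as np

from stable_baselines import SAC
from stable_baselines.ddpg.noise import NormalActionNoise

env = gym.make('Pendulum-v0')
n_actions = env.action_space.shape[-1]

# Gaussian noise (mean, sigma) added to the actions during training,
# which can help exploration on problems with deceptive rewards
action_noise = NormalActionNoise(np.zeros(n_actions), 0.1 * np.ones(n_actions))

model = SAC('MlpPolicy', env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=10000)
```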
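A sketch of a non-square kernel inside a custom CNN extractor; the architecture itself is made up for illustration:

```python
import numpy as np
import tensorflow as tf

from stable_baselines.a2c.utils import conv, conv_to_fc, linear

def custom_cnn(scaled_images, **kwargs):
    """Feature extractor using a non-square (height, width) kernel."""
    activ = tf.nn.relu
    # filter_size now accepts a (height, width) tuple instead of a single int
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32,
                         filter_size=(8, 4), stride=4,
                         init_scale=np.sqrt(2), **kwargs))
    layer_1 = conv_to_fc(layer_1)
    return activ(linear(layer_1, 'fc1', n_hidden=256, init_scale=np.sqrt(2)))
```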
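A sketch of the new parameter accessors; the NumPy perturbation is only there to show that no tensorflow code is involved:

```python
from stable_baselines import A2C

model = A2C('MlpPolicy', 'CartPole-v1')

# get_parameters returns a dict mapping tensorflow Variable names
# to NumPy arrays (see the parameter storage change under "Others")
params = model.get_parameters()

# Modify the values in plain NumPy, then load them back
params = {name: value * 0.9 for name, value in params.items()}
model.load_parameters(params)
```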
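A sketch of the new PPO2 hyperparameter; the values are illustrative:

```python
from stable_baselines import PPO2

# Clip the value function updates like the policy (here to +/- 0.2)
model = PPO2('MlpPolicy', 'CartPole-v1', cliprange=0.2, cliprange_vf=0.2)

# A negative value deactivates value function clipping
model_no_vf_clip = PPO2('MlpPolicy', 'CartPole-v1', cliprange_vf=-1)
```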
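A sketch of the `VecCheckNan` wrapper, assuming its `raise_exception` option; it checks observations, rewards and actions for `NaN`/`inf` values:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = DummyVecEnv([lambda: gym.make('Pendulum-v0')])
# Raise an error as soon as a NaN or inf appears,
# instead of failing silently later in training
env = VecCheckNan(env, raise_exception=True)

model = PPO2('MlpPolicy', env)
model.learn(total_timesteps=1000)
```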
Bug Fixes:
- bugfix for `VecEnvWrapper.__getattr__`, which enables access to class attributes inherited from parent classes
- fixed path splitting in `TensorboardWriter._get_latest_run_id()` on Windows machines (@PatrickWalter214)
- fixed a bug where the initial learning rate was logged instead of its placeholder in `A2C.setup_model` (@sc420)
- fixed a bug where the number of timesteps was incorrectly updated and logged in `A2C.learn` and `A2C._train_step` (@sc420)
- fixed the `num_timesteps` (total_timesteps) variable in PPO2 that was wrongly computed
- fixed a bug in DDPG/DQN/SAC that occurred when the number of samples in the replay buffer was smaller than the batch size (thanks to @dwiel for spotting the bug)
- removed `a2c.utils.find_trainable_params`; please use `common.tf_util.get_trainable_vars` instead (see the migration sketch after this list). `find_trainable_params` was returning all trainable variables, discarding the scope argument. This bug was causing the model to save duplicated parameters (for DDPG and SAC) but did not affect the performance.
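A migration sketch for the removed helper; the scope name is arbitrary:

```python
from stable_baselines.common.tf_util import get_trainable_vars

# Before (removed): from stable_baselines.a2c.utils import find_trainable_params
# find_trainable_params('model') ignored the scope and returned everything.

# After: only the trainable variables under the given scope are returned
train_vars = get_trainable_vars('model')
```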
Deprecations:
- deprecated `memory_limit` and `memory_policy` in DDPG, please use `buffer_size` instead (see the sketch after this list; will be removed in v3.x.x)
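The migration is a keyword rename; the buffer size shown is arbitrary:

```python
from stable_baselines import DDPG

# Deprecated (scheduled for removal in v3.x.x):
# model = DDPG('MlpPolicy', 'Pendulum-v0', memory_limit=50000)

# Use buffer_size instead:
model = DDPG('MlpPolicy', 'Pendulum-v0', buffer_size=50000)
```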
Others:
- important change: switched to using dictionaries rather than lists when storing parameters, with tensorflow Variable names being the keys (@Miffyli)
- removed unused dependencies (tqdm, dill, progressbar2, seaborn, glob2, click)
- removed `get_available_gpus` function, which hadn't been used anywhere (@Pastafarianist)
Documentation:
- added guide for managing `NaN` and `inf`
- updated vec_env doc
- misc doc updates