
Hindsight Experience Replay (HER) - Reloaded | get/load parameters

@araffin araffin released this 13 Jun 10:50
· 106 commits to master since this release

Breaking Changes:

  • removed stable_baselines.ddpg.memory in favor of stable_baselines.deepq.replay_buffer (see fix below)

Breaking Change: the DDPG replay buffer was unified with the DQN/SAC replay buffer. As a result, loading a DDPG model trained with stable_baselines < 2.6.0 raises an import error. You can fix that using:

import sys
import pkg_resources

import stable_baselines

# Fix for breaking change for DDPG buffer in v2.6.0
# Note: compare parsed versions, not raw strings (string comparison
# would mis-order e.g. "2.10.0" vs "2.6.0")
if pkg_resources.parse_version(pkg_resources.get_distribution("stable_baselines").version) >= pkg_resources.parse_version("2.6.0"):
    sys.modules['stable_baselines.ddpg.memory'] = stable_baselines.deepq.replay_buffer
    stable_baselines.deepq.replay_buffer.Memory = stable_baselines.deepq.replay_buffer.ReplayBuffer

We recommend saving the model again afterward, so the fix won't be needed the next time the trained agent is loaded.
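One pitfall when adapting the snippet above: comparing raw version strings with >= is lexicographic, so multi-digit components sort incorrectly. A minimal sketch of the difference, using pkg_resources (part of setuptools):

```python
import pkg_resources

# String comparison is lexicographic: '1' < '6', so "2.10.0" sorts
# before "2.6.0" even though 2.10.0 is the newer release
assert ("2.10.0" >= "2.6.0") is False

# parse_version compares release components numerically
v_new = pkg_resources.parse_version("2.10.0")
v_old = pkg_resources.parse_version("2.6.0")
assert v_new >= v_old
```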

New Features:

  • revamped HER implementation: a clean re-implementation from scratch that now supports DQN, SAC and DDPG
  • added action_noise param for SAC; it helps exploration for problems with deceptive rewards
  • the filter_size parameter of the conv function in A2C utils now supports passing a list/tuple of two integers (height and width), allowing non-square kernels (@yutingsz)
  • added random_exploration parameter for DDPG and SAC; it may be useful when combining HER with DDPG/SAC. This hack was present in the original OpenAI Baselines DDPG + HER implementation.
  • added load_parameters and get_parameters to the base RL class. With these methods, users can get parameters from and load parameters into an existing model without touching TensorFlow. (@Miffyli)
  • added a PPO2-specific hyperparameter to clip the value function (cliprange_vf)
  • added VecCheckNan wrapper

Bug Fixes:

  • bugfix for VecEnvWrapper.__getattr__ which enables access to class attributes inherited from parent classes.
  • fixed path splitting in TensorboardWriter._get_latest_run_id() on Windows machines (@PatrickWalter214)
  • fixed a bug where initial learning rate is logged instead of its placeholder in A2C.setup_model (@sc420)
  • fixed a bug where number of timesteps is incorrectly updated and logged in A2C.learn and A2C._train_step (@sc420)
  • fixed the num_timesteps (total_timesteps) variable in PPO2, which was wrongly computed
  • fixed a bug in DDPG/DQN/SAC that occurred when the number of samples in the replay buffer was smaller than the batch size (thanks to @dwiel for spotting the bug)
  • removed a2c.utils.find_trainable_params; please use common.tf_util.get_trainable_vars instead. find_trainable_params returned all trainable variables, ignoring the scope argument. This bug caused the model to save duplicated parameters (for DDPG and SAC) but did not affect performance.

Deprecations:

  • deprecated memory_limit and memory_policy in DDPG, please use buffer_size instead. (will be removed in v3.x.x)

Others:

  • important change: switched to using dictionaries rather than lists when storing parameters, with TensorFlow variable names as the keys. (@Miffyli)
  • removed unused dependencies (tdqm, dill, progressbar2, seaborn, glob2, click)
  • removed the get_available_gpus function, which wasn't used anywhere (@Pastafarianist)

Documentation:

  • added a guide for managing NaN and inf
  • updated vec_env documentation
  • misc doc updates