
sb3-contrib v1.6.0: RecurrentPPO (aka PPO LSTM) and better defaults for learning from pixels with off-policy algos

Released by @araffin · 12 Jul 21:14 · 087951d

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 1.6.0
  • Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...): the former
    register_policy helper and the policy_base parameter were removed in favor of a policy_aliases
    static attribute (@Gregwar); see the first sketch after this list
  • Renamed rollout/exploration rate key to rollout/exploration_rate for QRDQN (to be consistent with SB3 DQN)
  • Upgraded to Python 3.7+ syntax using pyupgrade
  • SB3 now requires PyTorch >= 1.11
  • Changed the default network architecture when using CnnPolicy or MultiInputPolicy with TQC:
    share_features_extractor is now set to False by default and net_arch defaults to [256, 256]
    (instead of the previous net_arch=[]); see the second sketch after this list
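
With register_policy gone, custom algorithms declare their string aliases as a class-level
dictionary. A minimal sketch of the new mechanism, where MyTinyPolicy and MyPPO are
hypothetical names invented only for illustration:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy


class MyTinyPolicy(ActorCriticPolicy):
    # Hypothetical custom policy, used here only to illustrate aliasing.
    pass


class MyPPO(PPO):
    # SB3 >= 1.6 resolves string policy names through this static attribute;
    # the old register_policy() helper no longer exists.
    policy_aliases = {**PPO.policy_aliases, "TinyPolicy": MyTinyPolicy}


model = MyPPO("TinyPolicy", "CartPole-v1")
```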
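
If you relied on the previous TQC defaults for image observations, they can be restored through
policy_kwargs. A minimal sketch, assuming an image-based environment (the Atari env id below is
just a placeholder):

```python
from sb3_contrib import TQC

model = TQC(
    "CnnPolicy",
    "PongNoFrameskip-v4",  # placeholder image-based env
    buffer_size=10_000,  # keep the replay buffer small for image observations
    # Restore the pre-1.6.0 defaults: shared features extractor
    # and no extra fully-connected layers after it.
    policy_kwargs=dict(share_features_extractor=True, net_arch=[]),
)
```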

New Features:

  • Added RecurrentPPO (aka PPO LSTM); see the usage sketch below
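
A minimal usage sketch: RecurrentPPO takes LSTM policy names ("MlpLstmPolicy",
"CnnLstmPolicy", ...), and model.predict() is passed the LSTM states and an
episode-start mask explicitly so the states can be reset between episodes:

```python
import numpy as np

from sb3_contrib import RecurrentPPO

model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(5_000)

env = model.get_env()
obs = env.reset()
lstm_states = None  # hidden and cell states of the LSTM
episode_starts = np.ones((env.num_envs,), dtype=bool)  # resets the LSTM states
for _ in range(1_000):
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_starts, deterministic=True
    )
    obs, rewards, dones, infos = env.step(action)
    episode_starts = dones
```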

Bug Fixes:

  • Fixed a bug in RecurrentPPO when calculating the masked loss functions (@rnederstigt)
  • Fixed a bug in TRPO where the KL divergence was not implemented for MultiDiscrete action spaces