
sb3-contrib v1.6.0: RecurrentPPO (aka PPO LSTM) and better defaults for learning from pixels with off-policy algos

Released by @araffin · 12 Jul 21:14 · 087951d

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 1.6.0
  • Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...): the former
    register_policy helper and the policy_base parameter were removed in favor of a policy_aliases
    static attribute (@Gregwar); see the first sketch after this list
  • Renamed rollout/exploration rate key to rollout/exploration_rate for QRDQN (to be consistent with SB3 DQN)
  • Upgraded to Python 3.7+ syntax using pyupgrade
  • SB3 now requires PyTorch >= 1.11
  • Changed the default network architecture when using CnnPolicy or MultiInputPolicy with TQC:
    share_features_extractor is now set to False by default and net_arch defaults to [256, 256]
    (instead of the previous net_arch=[]); see the second sketch after this list
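
With register_policy gone, custom algorithms declare their string aliases as a class-level
dictionary. A minimal sketch of the new mechanism, where MyTinyPolicy and MyPPO are
hypothetical names invented only for illustration:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy


class MyTinyPolicy(ActorCriticPolicy):
    # Hypothetical custom policy, used here only to illustrate aliasing.
    pass


class MyPPO(PPO):
    # SB3 >= 1.6 resolves string policy names through this static attribute;
    # the old register_policy() helper no longer exists.
    policy_aliases = {**PPO.policy_aliases, "TinyPolicy": MyTinyPolicy}


model = MyPPO("TinyPolicy", "CartPole-v1")
```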
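
If you relied on the previous TQC defaults for image observations, they can be restored through
policy_kwargs. A minimal sketch, assuming an image-based environment (the Atari env id below is
just a placeholder):

```python
from sb3_contrib import TQC

model = TQC(
    "CnnPolicy",
    "PongNoFrameskip-v4",  # placeholder image-based env
    buffer_size=10_000,  # keep the replay buffer small for image observations
    # Restore the pre-1.6.0 defaults: shared features extractor
    # and no extra fully-connected layers after it.
    policy_kwargs=dict(share_features_extractor=True, net_arch=[]),
)
```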

New Features:

  • Added RecurrentPPO (aka PPO LSTM); see the usage sketch below
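
A minimal usage sketch: RecurrentPPO takes LSTM policy names ("MlpLstmPolicy",
"CnnLstmPolicy", ...), and model.predict() is passed the LSTM states and an
episode-start mask explicitly so the states can be reset between episodes:

```python
import numpy as np

from sb3_contrib import RecurrentPPO

model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(5_000)

env = model.get_env()
obs = env.reset()
lstm_states = None  # hidden and cell states of the LSTM
episode_starts = np.ones((env.num_envs,), dtype=bool)  # resets the LSTM states
for _ in range(1_000):
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_starts, deterministic=True
    )
    obs, rewards, dones, infos = env.step(action)
    episode_starts = dones
```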

Bug Fixes:

  • Fixed a bug in RecurrentPPO when calculating the masked loss functions (@rnederstigt)
  • Fixed a bug in TRPO where the KL divergence was not implemented for MultiDiscrete action spaces