Breaking changes:
Upgraded to Stable-Baselines3 >= 1.6.0
Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...): removed the former register_policy helper and the policy_base parameter, and now use policy_aliases static attributes instead (@Gregwar)
Renamed rollout/exploration rate key to rollout/exploration_rate for QRDQN (to be consistent with SB3 DQN)
Upgraded to Python 3.7+ syntax using pyupgrade
SB3 now requires PyTorch >= 1.11
Changed the default network architecture when using CnnPolicy or MultiInputPolicy with TQC: share_features_extractor is now set to False by default, and the default net_arch is now [256, 256] (previously net_arch=[])
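To illustrate the policy alias change listed above, here is a minimal pure-Python sketch of the pattern: each algorithm class carries its own policy_aliases mapping, and string names are resolved through it instead of through a global registry. The classes below (BasePolicy, BaseAlgorithm, MyAlgorithm) and the _get_policy_from_name method are hypothetical stand-ins written for this example, not the actual SB3 code.

```python
from typing import Dict, Type, Union


class BasePolicy:
    """Hypothetical stand-in for a policy base class."""


class MlpPolicy(BasePolicy):
    """Hypothetical MLP policy."""


class CnnPolicy(BasePolicy):
    """Hypothetical CNN policy."""


class BaseAlgorithm:
    # Each algorithm declares its alias table as a class-level attribute,
    # so no global registration step is needed.
    policy_aliases: Dict[str, Type[BasePolicy]] = {}

    def __init__(self, policy: Union[str, Type[BasePolicy]]):
        # Strings are resolved through the alias table; classes pass through.
        if isinstance(policy, str):
            self.policy_class = self._get_policy_from_name(policy)
        else:
            self.policy_class = policy

    def _get_policy_from_name(self, policy_name: str) -> Type[BasePolicy]:
        if policy_name in self.policy_aliases:
            return self.policy_aliases[policy_name]
        raise ValueError(f"Policy {policy_name} unknown")


class MyAlgorithm(BaseAlgorithm):
    # Assumed alias table for this illustrative algorithm.
    policy_aliases = {"MlpPolicy": MlpPolicy, "CnnPolicy": CnnPolicy}
```

With this layout, MyAlgorithm("MlpPolicy") and MyAlgorithm(MlpPolicy) select the same class, and a custom algorithm only needs to define its own policy_aliases dictionary.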
New Features:
Added RecurrentPPO (aka PPO LSTM)
Bug Fixes:
Fixed a bug in RecurrentPPO when calculating the masked loss functions (@rnederstigt)
Fixed a bug in TRPO where KL divergence was not implemented for MultiDiscrete spaces
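As background for the RecurrentPPO fix above, the sketch below shows in general terms what a masked loss is. It assumes the mask marks which timesteps are real rather than padding, and it is not the library's implementation; the release note does not describe the exact bug. The function name masked_mean and the sample values are made up for this example.

```python
from typing import Sequence


def masked_mean(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Average only the entries whose mask is True (e.g. non-padded steps)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    kept = [v for v, m in zip(values, mask) if m]
    if not kept:
        raise ValueError("mask selects no elements")
    return sum(kept) / len(kept)


# Two real timesteps followed by two padded ones (assumed example data).
losses = [1.0, 3.0, 0.0, 0.0]
mask = [True, True, False, False]

naive = sum(losses) / len(losses)   # padding dilutes the average
masked = masked_mean(losses, mask)  # average over real timesteps only
```

Here the naive mean is 1.0 while the masked mean is 2.0, which is why a loss over padded sequences needs to apply the mask before averaging.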