SB3-Contrib v1.7.0 : Bug fixes for PPO LSTM and quality of life improvements
Warning
Shared layers in MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0, after which net_arch=[64, 64] will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
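To make the change concrete, here is a minimal sketch of how a legacy specification could be rewritten so that the actor and the critic no longer share layers. The helper name split_net_arch is hypothetical and not part of SB3. It assumes the legacy format is a list of shared layer sizes, optionally followed by a dict with "pi" and "vf" entries.

```python
from typing import Dict, List, Union

# Legacy format (assumption): shared layer sizes, optionally followed by a
# dict holding the layers specific to the actor ("pi") and critic ("vf").
LegacyNetArch = List[Union[int, Dict[str, List[int]]]]


def split_net_arch(net_arch: LegacyNetArch) -> Dict[str, List[int]]:
    """Hypothetical helper: rewrite a legacy spec without shared layers."""
    shared: List[int] = []
    pi_only: List[int] = []
    vf_only: List[int] = []
    for item in net_arch:
        if isinstance(item, int):
            # Leading integers were the shared layers in the legacy format.
            shared.append(item)
        else:
            # The dict lists the layers specific to each head and ends the spec.
            pi_only = list(item.get("pi", []))
            vf_only = list(item.get("vf", []))
            break
    # Duplicate the formerly shared sizes into both networks.
    return {"pi": shared + pi_only, "vf": shared + vf_only}


print(split_net_arch([64, 64]))
print(split_net_arch([128, {"pi": [64], "vf": [32]}]))
```

Whether the resulting dict must be wrapped in a list when passed as net_arch depends on the installed SB3 version, so check the documentation of the version you use.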
Note
TRPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again.
You can find more info in issue #1233.
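Saving the model again can be sketched as a small load-then-save helper. The function name resave_model and the file name in the usage comment are hypothetical. The helper only relies on the load() class method and the save() method that SB3 algorithms provide.

```python
def resave_model(algo_cls, path):
    """Load a saved model and write it back to the same path."""
    # Loading an old TRPO file may emit the missing-keys warning once.
    model = algo_cls.load(path)
    # Saving again writes the file with the currently installed version.
    model.save(path)
    return model


# Usage, assuming sb3_contrib is installed and the file exists:
#   from sb3_contrib import TRPO
#   resave_model(TRPO, "trpo_model.zip")  # hypothetical path
```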
Breaking Changes:
- Removed deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters, please use an EvalCallback instead
- Removed deprecated sde_net_arch parameter
- Upgraded to Stable-Baselines3 >= 1.7.0
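For code that still passes the removed evaluation keywords to learn(), one possible migration is sketched below. The helpers split_eval_kwargs and learn_with_eval are hypothetical names. The mapping assumes that eval_log_path corresponds to the log_path argument of EvalCallback and that the other keywords keep their names. The create_eval_env constructor parameter is not handled here, so the evaluation environment must be created explicitly.

```python
from typing import Any, Dict, Tuple

# Removed learn() keyword -> EvalCallback keyword (assumed mapping).
_EVAL_KEY_MAP = {
    "eval_env": "eval_env",
    "n_eval_episodes": "n_eval_episodes",
    "eval_freq": "eval_freq",
    "eval_log_path": "log_path",
}


def split_eval_kwargs(
    learn_kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the removed evaluation keywords from the other learn() keywords."""
    remaining: Dict[str, Any] = {}
    callback_kwargs: Dict[str, Any] = {}
    for key, value in learn_kwargs.items():
        if key in _EVAL_KEY_MAP:
            callback_kwargs[_EVAL_KEY_MAP[key]] = value
        else:
            remaining[key] = value
    return remaining, callback_kwargs


def learn_with_eval(model, **learn_kwargs):
    """Call model.learn() with an EvalCallback built from the old keywords."""
    # Imported lazily so split_eval_kwargs works without SB3 installed.
    from stable_baselines3.common.callbacks import EvalCallback

    remaining, callback_kwargs = split_eval_kwargs(learn_kwargs)
    if "eval_env" in callback_kwargs:
        eval_callback = EvalCallback(**callback_kwargs)
        existing = remaining.get("callback")
        if existing is None:
            remaining["callback"] = eval_callback
        else:
            # Assumes existing callbacks are BaseCallback instances.
            callbacks = list(existing) if isinstance(existing, list) else [existing]
            remaining["callback"] = callbacks + [eval_callback]
    return model.learn(**remaining)
```

A call such as learn_with_eval(model, total_timesteps=10_000, eval_env=eval_env, eval_freq=1_000) then behaves like learn() with an EvalCallback attached.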
New Features:
- Introduced mypy type checking
- Added support for Python 3.10
- Added with_bias parameter to ARSPolicy
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing normalize_images=False
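The new options would typically be set through policy_kwargs. The sketch below assumes that the non-shared features extractor option is exposed as the share_features_extractor policy keyword, a name not given in these notes. The factory functions are hypothetical and import sb3_contrib lazily.

```python
# Keywords for an on-policy algorithm's policy (e.g. TRPO).
on_policy_kwargs = dict(
    # Assumed keyword name for the non-shared features extractor option.
    share_features_extractor=False,
    # Only matters for image-like observations: skip the built-in scaling.
    normalize_images=False,
)

# Keyword for ARSPolicy, the policy behind ARS "MlpPolicy".
ars_kwargs = dict(with_bias=False)


def make_trpo(env):
    """Build a TRPO model with separate actor and critic features extractors."""
    from sb3_contrib import TRPO  # lazy import, requires sb3_contrib >= 1.7.0

    return TRPO("MlpPolicy", env, policy_kwargs=on_policy_kwargs)


def make_ars(env):
    """Build an ARS model whose MLP policy has no bias terms."""
    from sb3_contrib import ARS  # lazy import, requires sb3_contrib >= 1.7.0

    return ARS("MlpPolicy", env, policy_kwargs=ars_kwargs)
```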
Bug Fixes:
- Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
- Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1
Deprecations:
- You should now explicitly pass a features_extractor parameter when calling extract_features()
- Deprecated shared layers in MlpExtractor (@AlexPasqua)
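The calling convention for extract_features() can be illustrated with a toy stand-in. ToyPolicy below is not SB3's implementation. It only mimics the pattern of warning when the extractor is omitted, and the warning text is invented for the example.

```python
import warnings


class ToyPolicy:
    """Toy stand-in for a policy class, for illustration only."""

    def __init__(self, features_extractor):
        self.features_extractor = features_extractor

    def extract_features(self, obs, features_extractor=None):
        if features_extractor is None:
            # Deprecated path: fall back to the policy's own extractor.
            warnings.warn(
                "Pass a features_extractor explicitly to extract_features().",
                DeprecationWarning,
            )
            features_extractor = self.features_extractor
        return features_extractor(obs)


policy = ToyPolicy(features_extractor=lambda obs: [2 * x for x in obs])

# Preferred: name the extractor explicitly at the call site.
features = policy.extract_features([1, 2, 3], policy.features_extractor)
print(features)
```

With real SB3 policies the equivalent call passes the policy's own extractor module as the second argument.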
Others:
- Fixed flake8 config
- Fixed sb3_contrib/common/utils.py type hint
- Fixed sb3_contrib/common/recurrent/type_aliases.py type hint
- Fixed sb3_contrib/ars/policies.py type hint
- Exposed modules in __init__.py with __all__ attribute (@ZikangXiong)
- Removed ignores on Flake8 F401 (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device
- Standardized the use of from gym import spaces