Release Bug fixes release · hill-a/stable-baselines

Breaking Changes:

The render() method of VecEnvs now accepts only one argument: mode
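
A minimal sketch of what this change means for calling code. ToyVecEnv is a hypothetical stand-in written for illustration, not the library's VecEnv class; the only thing taken from the note above is that render() takes a single argument, mode. The extra close argument in the failing call is likewise just an example of an argument that would now be rejected.

```python
class ToyVecEnv:
    """Hypothetical stand-in for a vectorized environment; not the library class."""

    def __init__(self, frames):
        self.frames = frames  # one placeholder "frame" per sub-environment

    def render(self, mode="human"):
        # Only `mode` is accepted, mirroring the signature described above.
        if mode == "rgb_array":
            return list(self.frames)
        if mode == "human":
            print("rendering", len(self.frames), "environments")
            return None
        raise NotImplementedError("unsupported render mode: " + mode)


env = ToyVecEnv(frames=["frame-0", "frame-1"])
images = env.render(mode="rgb_array")  # a call that passes only `mode` keeps working

try:
    env.render("human", close=True)  # any extra argument is now rejected
except TypeError as err:
    extra_arg_error = err
else:
    extra_arg_error = None
```

Code that forwarded extra positional or keyword arguments to render() would need to drop them and pass only mode.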

New Features:

Added momentum parameter to A2C for the embedded RMSPropOptimizer (@kantneel)
ActionNoise is now an abstract base class and implements __call__; NormalActionNoise and OrnsteinUhlenbeckActionNoise now declare return types (@partiallytyped)
HER now passes info dictionary to compute_reward, allowing for the computation of rewards that are independent of the goal (@tirafesi)
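
To illustrate the ActionNoise change above, here is a minimal sketch of the abstract-base-class pattern with a callable noise object. It uses only the standard library and is not the library's implementation: GaussianNoise, its constructor arguments, and the list return type are assumptions made for this example.

```python
from abc import ABC, abstractmethod
import random
from typing import List, Optional, Sequence


class ActionNoise(ABC):
    """Sketch of an abstract noise base class; subclasses must define __call__."""

    def reset(self) -> None:
        # Stateless noise has nothing to reset; stateful subclasses can override.
        pass

    @abstractmethod
    def __call__(self) -> List[float]:
        raise NotImplementedError


class GaussianNoise(ActionNoise):
    """Hypothetical subclass: independent Gaussian noise per action dimension."""

    def __init__(self, mean: Sequence[float], sigma: Sequence[float],
                 seed: Optional[int] = None) -> None:
        self._mean = list(mean)
        self._sigma = list(sigma)
        self._rng = random.Random(seed)

    def __call__(self) -> List[float]:
        # Draw one sample per action dimension.
        return [self._rng.gauss(m, s) for m, s in zip(self._mean, self._sigma)]


noise = GaussianNoise(mean=[0.0, 0.0], sigma=[0.1, 0.1], seed=0)
sample = noise()  # calling the object returns one noise vector
```

Because __call__ is abstract, instantiating ActionNoise directly raises a TypeError, so every concrete noise class is guaranteed to be callable.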

Bug Fixes:

Fixed DDPG sampling empty replay buffer when combined with HER (@tirafesi)
Fixed a bug in HindsightExperienceReplayWrapper, where the openai-gym signature for compute_reward was not matched correctly (@johannes-dornheim)
Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@partiallytyped)
Added **kwargs pass-through for the reset method in atari_wrappers.FrameStack (@partiallytyped)
Fixed consistency in setup_model() for SAC: target_entropy now uses self.action_space instead of self.env.action_space (@partiallytyped)
Fixed reward threshold in test_identity.py
Partially fixed tensorboard indexing for PPO2 (@Enderdead)
Fixed potential bug in DummyVecEnv where copy() was used instead of deepcopy()
Fixed a bug in GAIL where the dataloader was not available after saving, causing an error when using CheckpointCallback
Fixed a bug in SAC where any convolutional layers were not included in the target network parameters.
Fixed render() method for VecEnvs
Fixed seed() method for SubprocVecEnv
Fixed a bug where callback.locals did not have the correct values (@partiallytyped)
Fixed a bug in the close() method of SubprocVecEnv, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
Fixed a bug in the generate_expert_traj() method in record_expert.py when using a non-image vectorized environment (@jbarsce)
Fixed a bug in CloudPickleWrapper's (used by VecEnvs) __setstate__ where loading was incorrectly using pickle.loads (@shwang).
Fixed a bug in SAC and TD3 where the logged timesteps were not correct (@YangRui2015)
Fixed a bug where the environment was reset twice when using evaluate_policy
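
The DummyVecEnv entry above concerns copy() versus deepcopy(). The sketch below shows, with the standard library only, why the distinction matters when nested data is later modified in place. The infos buffer and its contents are hypothetical and are not taken from the library's code.

```python
import copy

# Hypothetical internal buffer: one info dict per sub-environment.
infos = [{"episode": {"r": 1.0}}, {"episode": {"r": 2.0}}]

shallow = copy.copy(infos)      # new outer list, but the same inner dicts
deep = copy.deepcopy(infos)     # fully independent copy

# The environment later overwrites its buffer in place.
infos[0]["episode"]["r"] = 5.0

shallow_value = shallow[0]["episode"]["r"]  # follows the in-place change
deep_value = deep[0]["episode"]["r"]        # keeps the value from copy time
```

A shallow copy handed back to the caller can therefore change after the fact, while a deep copy stays as it was when it was returned.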

Others:

Added version.txt to manage version number in an easier way
Added .readthedocs.yml to install requirements with Read the Docs
Added a test for seeding SubprocVecEnv and rendering

Documentation:

Fix typos (@caburu)
Fix typos in PPO2 (@kvenkman)
Removed stable_baselines\deepq\experiments\custom_cartpole.py (@aakash94)
Added Google's motion imitation project
Added documentation page for monitor
Fixed typos and updated the VecNormalize example to show normalization at test time
Fixed train_mountaincar description
Added imitation baselines project
Updated install instructions
Added Slime Volleyball project (@hardmaru)
Added a table of the variables accessible from the on_step function of the callbacks for each algorithm (@partiallytyped)
Fix typo in README.md (@ColinLeongUDRI)
