New algorithm: CrossQ, and better defaults for SAC/TQC on Swimmer-v4 env
- Updated default hyperparameters for TQC/SAC for Swimmer-v4 (decreased gamma for more consistent results) (@JacobHA), see the W&B report
- Upgraded to SB3 >= 2.4.0
- Added `CrossQ` hyperparameters for SB3-contrib (@danielpalen)
- Replaced deprecated `huggingface_hub.Repository` when pushing to Hugging Face Hub by the recommended `HfApi` (see https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http) (@cochaviz)
- Updated PyTorch version to 2.4.1 in the CI
- Switched to uv to download packages faster on GitHub CI
- Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Upgraded to SB3 >= 2.3.0
- Added test dependencies to `setup.py` (@power-edge)
- Simplify dependencies of `requirements.txt` (remove duplicates from `setup.py`)
- Removed `gym` dependency, the package is still required for some pretrained agents.
- Upgraded to SB3 >= 2.2.1
- Upgraded to Huggingface-SB3 >= 3.0
- Upgraded to pytablewriter >= 1.0
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to `hyperparams_opt.py` (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
- Updated docker image, removed support for X server
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)` (see the sketch after this list)
- Switched to ruff for sorting imports
- Updated tests to use `shlex.split()`
- Fixed `rl_zoo3/hyperparams_opt.py` type hints
- Fixed `rl_zoo3/exp_manager.py` type hints
- Dropped python 3.7 support
- SB3 now requires PyTorch 1.13+
- Upgraded to SB3 >= 2.1.0
- Upgraded to Huggingface-SB3 >= 2.3
- Upgraded to Optuna >= 3.0
- Upgraded to cloudpickle >= 2.2.1
- Added python 3.11 support
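For reference, here is a minimal sketch of the Optuna API change mentioned above; the hyperparameter name, range, and objective are made up for illustration:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Deprecated: trial.suggest_uniform("tau", 0.001, 0.1)
    # Recommended replacement:
    tau = trial.suggest_float("tau", low=0.001, high=0.1)
    return tau  # placeholder objective, only to make the sketch runnable

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=5)
```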
Gymnasium support
Warning: Stable-Baselines3 (SB3) v2.0.0 will be the last one supporting python 3.7
- Fixed bug in HistoryWrapper, now returns the correct obs space limits
- Upgraded to SB3 >= 2.0.0
- Upgraded to Huggingface-SB3 >= 2.2.5
- Upgraded to Gym API 0.26+, RL Zoo3 no longer works with Gym 0.21 (see the API sketch after this list)
- Added Gymnasium support
- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
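As a reminder of what the Gym 0.26+/Gymnasium API upgrade above implies, here is a minimal sketch (the env id is just an example):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
# Gymnasium / Gym 0.26+: reset() returns (obs, info) and accepts a seed
obs, info = env.reset(seed=0)
# step() returns five values: terminated and truncated are now separate
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated
env.close()
```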
New Documentation, Multi-Env HerReplayBuffer
Warning: Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
- Upgraded to SB3 >= 1.8.0
- Upgraded to new `HerReplayBuffer` implementation that supports multiple envs
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts.
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Added hyperparameters and pre-trained agents for PPO on 11 MiniGrid envs
- Set `highway-env` version to 1.5 and `setuptools` to v65.5 for the CI
- Removed `use_auth_token`
for push to hub util - Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Added support for `ruff` (fast alternative to flake8) in the Makefile
- Removed Gitlab CI file
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer` (see the sketch after this list)
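A minimal sketch of `HerReplayBuffer` usage without the removed arguments, as mentioned above; the env id and kwargs are illustrative and assume a goal-conditioned env (e.g. from panda-gym) is installed:

```python
from stable_baselines3 import SAC, HerReplayBuffer

# "PandaReach-v3" is only an example of a goal-conditioned env
model = SAC(
    "MultiInputPolicy",
    "PandaReach-v3",
    replay_buffer_class=HerReplayBuffer,
    # online_sampling and max_episode_length are no longer passed here
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=1_000)
```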
SB3 v1.7.0, added support for python config files
- `--yaml-file` argument was renamed to `-conf` (`--conf-file`) as python files are now supported too
- Upgraded to SB3 >= 1.7.0 (changed `net_arch=[dict(pi=.., vf=..)]` to `net_arch=dict(pi=.., vf=..)`, see the sketch after this list)
- Specifying custom policies in yaml file is now supported (@Rick-v-E)
- Added `monitor_kwargs` parameter
- Handle the `env_kwargs` of `render:True` under the hood for panda-gym v1 envs in `enjoy` replay to match visualization behavior of other envs
- Added support for python config file
- Tuned hyperparameters for PPO on Swimmer
- Added `-tags/--wandb-tags` argument to `train.py` to add tags to the wandb run
- Added an sb3 version tag to the wandb run
- Allow `python -m rl_zoo3.cli` to be called directly
- Fixed a bug where custom environments were not found despite passing `--gym-package` when using subprocesses
- Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv
- `scripts/plot_train.py` plots models such that newer models appear on top of older ones.
- Added additional type checking using mypy
- Standardized the use of `from gym import spaces`
- `python3 -m rl_zoo3.train` now works as expected
- Added instructions and examples on passing arguments in an interactive session (@richter43)
- Used issue forms instead of issue templates
- RL Zoo is now a python package
- Low-pass filter was removed
- Upgraded to Stable-Baselines3 (SB3) >= 1.6.2
- Upgraded to sb3-contrib >= 1.6.2
- Now uses built-in SB3 `ProgressBarCallback` instead of `TQDMCallback`
- RL Zoo cli: `rl_zoo3 train` and `rl_zoo3 enjoy`
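A minimal sketch of the `net_arch` change noted in the SB3 1.7.0 entry above (layer sizes are illustrative):

```python
from stable_baselines3 import PPO

# Before SB3 1.7.0: policy_kwargs = dict(net_arch=[dict(pi=[64, 64], vf=[64, 64])])
# From SB3 1.7.0 on, the outer list is dropped:
policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))

model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1_000)
```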
Progress bar and custom yaml file
- Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
- Upgraded to sb3-contrib >= 1.6.1
- Added `--yaml-file` argument option for `train.py` to read hyperparameters from custom yaml files (@JohannesUl)
- Added `custom_object` parameter on `record_video.py` (@Affonso-Gui)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on `record_video.py` (@Affonso-Gui)
- In `ExperimentManager`, `_maybe_normalize` sets `training` to `False` for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in `EvalCallback`) (@pchalasani), the same idea is sketched after this list
- Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
- Added progress bar via the `-P` argument using tqdm and rich
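The eval-env normalization fix above follows a general SB3 pattern: eval envs wrapped with `VecNormalize` should have `training=False` so running statistics are frozen. A minimal sketch (env id is illustrative, and this is not the zoo's exact code):

```python
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

eval_env = DummyVecEnv([lambda: gym.make("Pendulum-v1")])
# training=False freezes the normalization statistics,
# norm_reward=False keeps eval rewards on their original scale
eval_env = VecNormalize(eval_env, training=False, norm_reward=False)
```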
RecurrentPPO (ppo_lstm) and Huggingface integration
- Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
- Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
- Updated default --eval-freq from 10k to 25k steps
- Update default horizon to 2 for the `HistoryWrapper`
- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
- Upgrade to sb3-contrib >= 1.6.0
- Support setting PyTorch's device with the `--device` flag (@gregwar)
- Add `--max-total-trials` parameter to help with distributed optimization. (@ernestum)
- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
- Added Huggingface hub integration
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Added autodownload for "official" sb3 models from the hub
- Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
- Added MsPacman models
- Fix `Reacher-v3` name in PPO hyperparameter file
- Pinned ale-py==0.7.4 until new SB3 version is released
- Fix enjoy / record videos with LSTM policy
- Fix bug with environments that have a slash in their name (@ernestum)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate `handle_timeout_termination` in the `replay_buffer_kwargs` (see the sketch after this list)
- When pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec)
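A minimal sketch of the memory-saving combination mentioned in the `optimize_memory_usage` entry above; in the zoo these settings live in the per-env yaml files, and the env id below is only illustrative:

```python
from stable_baselines3 import DQN

# optimize_memory_usage=True saves replay-buffer RAM, but it cannot be
# combined with handle_timeout_termination=True, hence the explicit kwarg.
model = DQN(
    "MlpPolicy",
    "CartPole-v1",
    buffer_size=50_000,
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
model.learn(total_timesteps=1_000)
```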
Support for Weights and Biases experiment tracking
- Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
- Upgrade to sb3-contrib >= 1.5.0
- Upgraded to gym 0.21
- Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
- Support experiment tracking with Weights and Biases via the `--track` flag (@vwxyzjn), see the sketch after this list
- Support tracking raw episodic stats via `RawStatisticsCallback` (@vwxyzjn, see #216)
- Policies saved during optimization with distributed Optuna now load on new systems (@jkterry)
- Fixed script for recording video that was not up to date with the enjoy script
- Dropped python 3.6 support
- Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
- Upgrade to sb3-contrib >= 1.4.0
- Added mujoco hyperparameters
- Added MuJoCo pre-trained agents
- Added script to parse best hyperparameters of an optuna study
- Added TRPO support
- Added ARS support and pre-trained agents
- Replace front image
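Under the hood, the `--track` flag mentioned above wires the run to Weights & Biases roughly like the following sketch (not the zoo's exact code; project name and env are illustrative):

```python
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

run = wandb.init(project="sb3-example", sync_tensorboard=True)
model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000, callback=WandbCallback())
run.finish()
```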
rliable plots and bug fixes
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
- Upgrade to panda-gym 1.1.1
- Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
- Upgrade to sb3-contrib >= 1.3.0
- Added support for using rliable for performance comparison
- Fix training with Dict obs and channel last images
- Updated docker image
- constrained gym version: gym>=0.17,<0.20
- Better hyperparameters for A2C/PPO on Pendulum
- Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
- Upgrade to sb3-contrib >= 1.2.0
- Added support for Python 3.10
- Fix `--load-last-checkpoint` (@SammyRamone)
- Fix `TypeError` for `gym.Env` class entry points in `ExperimentManager` (@schuderer)
- Fix usage of callbacks during hyperparameter optimization (@SammyRamone)
- Added python 3.9 to Github CI
- Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)
- Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
- Upgrade to sb3-contrib >= 1.1.0
- Add timeout handling (cf SB3 doc)
- `HER` is now a replay buffer class and no longer an algorithm
- Removed `PlotNoiseRatioCallback`
- Removed `PlotActionWrapper`
- Changed `'lr'` key in Optuna param dict to `'learning_rate'` so the dict can be directly passed to SB3 methods (@jkterry), see the sketch at the end of this changelog
- Add support for recording videos of best models and checkpoints (@mcres)
- Add support for recording videos of training experiments (@mcres)
- Add support for dictionary observations
- Added experimental parallel training (with `utils.callbacks.ParallelTrainCallback`)
- Added support for using multiple envs for evaluation
- Added `--load-last-checkpoint` option for the enjoy script
- Save Optuna study object at the end of hyperparameter optimization and plot the results (`plotly` package required)
- Allow passing multiple folders to `scripts/plot_train.py`
- Fixed video rendering for PyBullet envs on Linux
- Fixed `get_latest_run_id()` so it works on Windows too (@NicolasHaeffner)
- Fixed video record when using `HER` replay buffer
- Updated README (dict obs are now supported)
- Added `is_bullet()` to `ExperimentManager`
- Simplify `close()` for the enjoy script
- Updated docker image to include latest black version
- Updated TD3 Walker2D model (thanks @modanesh)
- Fixed typo in plot title (@scottemmons)
- Minimum cloudpickle version added to `requirements.txt` (@amy12xx)
- Fixed atari-py version (ROM missing in newest release)
- Updated `SAC` and `TD3` search spaces
- Cleanup eval_freq documentation and variable name changes (@jkterry)
- Add clarifying print statement when printing saved hyperparameters during optimization (@jkterry)
- Clarify n_evaluations help text (@jkterry)
- Simplified hyperparameters files making use of defaults
- Added new TQC+HER agents
- Add `panda-gym` environments (@qgallouedec)
- Upgrade to SB3 >= 1.0
- Upgrade to sb3-contrib >= 1.0
- Added 100+ trained agents + benchmark file
- Add support for loading saved model under python 3.8+ (no retraining possible)
- Added Robotics pre-trained agents (@sgillen)
- Bug fixes for `HER` handling action noise
- Fixed double reset bug with `HER` and enjoy script
- Added doc about plotting scripts
- Updated `HER` hyperparameters
- Removed `LinearNormalActionNoise`
- Evaluation is now deterministic by default, except for Atari games
- `sb3_contrib` is now required
- `TimeFeatureWrapper` was moved to the contrib repo
- Replaced old `plot_train.py` script with updated `plot_training_success.py`
- Renamed `n_episodes_rollout` to `train_freq` tuple to match latest version of SB3
- Added option to choose which `VecEnv` class to use for multiprocessing
- Added hyperparameter optimization support for `TQC`
- Added support for `QR-DQN` from SB3 contrib
- Improved detection of Atari games
- Fix potential bug in plotting script when there are not enough timesteps
- Fixed a bug when using HER + DQN/TQC for hyperparam optimization
- Improved documentation (@cboettig)
- Refactored train script, now uses an `ExperimentManager` class
- Replaced `make_env` with SB3 built-in `make_vec_env`
- Add more type hints (`utils/utils.py` done)
- Use f-strings when possible
- Changed `PPO` atari hyperparameters (removed vf clipping)
- Changed `A2C` atari hyperparameters (eps value of the optimizer)
- Updated benchmark script
- Updated hyperparameter optim search space (commented gSDE for A2C/PPO)
- Updated `DQN` hyperparameters for CartPole
- Do not wrap channel-first image env (now natively supported by SB3)
- Removed hack to log success rate
- Simplify plot script
- Added support for `HER`
- Added low-pass filter wrappers in `utils/wrappers.py`
- Added `TQC` support, implementation from sb3-contrib
- Fixed `TimeFeatureWrapper` inferring max timesteps
- Fixed `flatten_dict_observations` in `utils/utils.py` for recent Gym versions (@ManifoldFR)
- `VecNormalize` now takes `gamma` hyperparameter into account
- Fix loading of `VecNormalize` when continuing training or using trained agent
- Added tests for the wrappers
- Updated plotting script
- Distributed optimization (@SammyRamone)
- Added `--load-checkpoints` to load particular checkpoints
- Added `--num-threads` to enjoy script
- Added DQN support
- Added saving of command line args (@SammyRamone)
- Added DDPG support
- Added version
- Added `RMSpropTFLike` support
- Fixed optuna warning (@SammyRamone)
- Fixed `--save-freq` which was not taking parallel env into account
- Set `buffer_size` to 1 when testing an Off-Policy model (e.g. SAC/DQN) to avoid memory allocation issue
- Fixed seed at load time for `enjoy.py`
- Non-deterministic eval when doing hyperparameter optimization on atari games
- Use 'maximize' for hyperparameter optimization (@SammyRamone)
- Fixed a bug where rewards were not normalized when doing hyperparameter optimization (@caburu)
- Removed `nminibatches` from `ppo.yml` for `MountainCar-v0` and `Acrobot-v1` (@blurLake)
- Fixed `--save-replay-buffer` to be compatible with latest SB3 version
- Close environment at the end of training
- Updated DQN hyperparameters on simpler gym env (due to an update in the implementation)
- Reformat `enjoy.py`, `test_enjoy.py`, `test_hyperparams_opt.py`, `test_train.py`, `train.py`, `callbacks.py`, `hyperparams_opt.py`, `utils.py`, `wrappers.py` (@salmannotkhan)
- Reformat `record_video.py` (@salmannotkhan)
- Added codestyle check `make lint` using flake8
- Reformat `benchmark.py` (@salmannotkhan)
- Added github ci
- Fixes most linter warnings
- Now using black and isort for auto-formatting
- Updated plots
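Finally, a minimal sketch of why the `'lr'` → `'learning_rate'` rename mentioned earlier matters: the sampled dict can be unpacked directly into an SB3 constructor (sampler, ranges, and objective are made up for illustration):

```python
import optuna
from stable_baselines3 import PPO

def objective(trial: optuna.Trial) -> float:
    # Using the SB3 argument name as the key avoids any remapping step
    hyperparams = {"learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)}
    model = PPO("MlpPolicy", "CartPole-v1", **hyperparams)
    model.learn(total_timesteps=1_000)
    return 0.0  # placeholder score, a real objective would evaluate the agent

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=2)
```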