Author's implementation of the paper:
Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
In this paper, we propose the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) that achieves sublinear regret and zero constraint violation.
In the tabular case, we evaluated our algorithm on a grid-world environment.
Train Triple-Q on this environment by running the notebook Triple_Q_tabular.ipynb on Google Colab.
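For orientation, the sketch below illustrates the core idea behind Triple-Q's action selection in the tabular setting: a reward Q-table and a constraint Q-table are combined through a scalar virtual queue. This is a minimal illustration, not the notebook's exact code; the names q_r, q_c, z, and eta are assumptions for the example.

```python
import numpy as np

# Minimal sketch of Triple-Q's greedy action selection (illustrative, not the
# notebook's exact code). Triple-Q maintains a reward Q-table, a constraint
# Q-table, and a scalar virtual queue z; actions are greedy with respect to
# the queue-weighted combination of the two tables.
def select_action(q_r, q_c, z, eta, state):
    """Pick the action maximizing the pseudo-Q value Q_r + (z / eta) * Q_c."""
    pseudo_q = q_r[state] + (z / eta) * q_c[state]  # combine the two Q-tables
    return int(np.argmax(pseudo_q))                 # greedy w.r.t. the pseudo-Q
```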
Triple-Q can also be implemented with neural network approximations and the actor-critic method.
The code for Deep-Triple-Q is adapted from Safety Starter Agents and WCSAC.
Train Deep-Triple-Q on the Dynamic Gym benchmark (DynamicEnv; Yang et al., 2021) by running:
python ./deep_tripleq/sac/triple_q.py --env 'DynamicEnv-v0' -s 1234 --cost_lim 15 --logger_kwargs_str '{"output_dir":"./temp"}'
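As a rough illustration of the third "Q" that Deep-Triple-Q maintains alongside the reward and cost critics, the sketch below shows one plausible episodic virtual-queue update. The function and argument names, and the cost-budget sign convention (rather than the paper's utility-constraint form), are assumptions for the example, not the repository's exact code.

```python
# Hypothetical sketch of a virtual-queue update (not the repo's exact code).
# The queue grows when the episode's cost exceeds the (slightly tightened)
# budget, shrinks otherwise, and never drops below zero.
def update_virtual_queue(z, episode_cost, cost_limit, slack=0.0):
    return max(z + episode_cost - (cost_limit - slack), 0.0)

# Example: with cost_limit=15 (matching --cost_lim above) and an episode cost
# of 20, the queue increases by 5, raising the weight placed on the cost
# critic in subsequent policy updates.
z = update_virtual_queue(z=0.0, episode_cost=20.0, cost_limit=15.0)
```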
Warning: If you want to use the Triple-Q algorithm in Safety Gym, make sure to install Safety Gym according to the instructions on the Safety Gym repo.
Deep-Triple-Q on safe RL with hard constraints
Train Deep-Triple-Q on the Pendulum environment with hard safety constraints (details can be found in Cheng et al.) by running:
python ./saferl/main_triple_q.py --seed 1234