Author's implementation of the paper:
Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
In this paper, we propose the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) that achieves sublinear regret and zero constraint violation.
In the tabular case, we evaluated our algorithm on a grid-world environment.
Train Triple-Q on this environment by running the notebook Triple_Q_tabular.ipynb on Google Colab.
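For orientation, the sketch below illustrates the core idea behind Triple-Q's action selection in the tabular setting: a reward Q-table and a constraint Q-table are combined through a scalar virtual queue. This is a minimal illustration, not the notebook's exact code; the names q_r, q_c, z, and eta are assumptions for the example.

```python
import numpy as np

# Minimal sketch of Triple-Q's greedy action selection (illustrative, not the
# notebook's exact code). Triple-Q maintains a reward Q-table, a constraint
# Q-table, and a scalar virtual queue z; actions are greedy with respect to
# the queue-weighted combination of the two tables.
def select_action(q_r, q_c, z, eta, state):
    """Pick the action maximizing the pseudo-Q value Q_r + (z / eta) * Q_c."""
    pseudo_q = q_r[state] + (z / eta) * q_c[state]  # combine the two Q-tables
    return int(np.argmax(pseudo_q))                 # greedy w.r.t. the pseudo-Q
```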
Triple-Q can also be implemented with neural network approximations and the actor-critic method.
The code for Deep-Triple-Q is adapted from Safety Starter Agents and WCSAC.
Train Deep-Triple-Q on the Dynamic Gym benchmark (DynamicEnv; Yang et al., 2021) by running:
python ./deep_tripleq/sac/triple_q.py --env 'DynamicEnv-v0' -s 1234 --cost_lim 15 --logger_kwargs_str '{"output_dir":"./temp"}'
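As a rough illustration of the third "Q" that Deep-Triple-Q maintains alongside the reward and cost critics, the sketch below shows one plausible episodic virtual-queue update. The function and argument names, and the cost-budget sign convention (rather than the paper's utility-constraint form), are assumptions for the example, not the repository's exact code.

```python
# Hypothetical sketch of a virtual-queue update (not the repo's exact code).
# The queue grows when the episode's cost exceeds the (slightly tightened)
# budget, shrinks otherwise, and never drops below zero.
def update_virtual_queue(z, episode_cost, cost_limit, slack=0.0):
    return max(z + episode_cost - (cost_limit - slack), 0.0)

# Example: with cost_limit=15 (matching --cost_lim above) and an episode cost
# of 20, the queue increases by 5, raising the weight placed on the cost
# critic in subsequent policy updates.
z = update_virtual_queue(z=0.0, episode_cost=20.0, cost_limit=15.0)
```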
Warning: If you want to use the Triple-Q algorithm in Safety Gym, make sure to install Safety Gym according to the instructions on the Safety Gym repo.
Deep-Triple-Q on safe RL with hard constraints
Train Deep-Triple-Q on the Pendulum environment with hard safety constraints (details can be found in Cheng et al.) by running:
python ./saferl/main_triple_q.py --seed 1234