This repository provides an implementation of autoregressive policies (ARPs) for continuous control deep reinforcement learning, together with learning examples based on the OpenAI Baselines PPO and TRPO algorithms. Examples are provided for OpenAI Gym MuJoCo environments and for the Square sparse-reward environment discussed in the paper.
TensorFlow >= 1.12, OpenAI Baselines, and OpenAI Gym are required to run the learning examples. Only NumPy is required to build and plot stationary AR processes.
- To generate and plot noise trajectories based on AR processes with different orders and smoothing parameter values:
python ./examples/make_noise.py
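As a rough illustration of the kind of noise trajectories involved, the sketch below generates a stationary first-order AR process with unit marginal variance, where a smoothing parameter `alpha` controls temporal correlation. This is a simplified AR(1) example for intuition only; the `ar1_noise` helper is hypothetical, and the repository's `make_noise.py` constructs general order-p processes as described in the paper.

```python
import numpy as np

def ar1_noise(n_steps, alpha=0.8, seed=None):
    """Sample a stationary AR(1) noise trajectory with unit marginal variance.

    z[t] = alpha * z[t-1] + sqrt(1 - alpha**2) * eps[t],  eps ~ N(0, 1).
    The sqrt(1 - alpha**2) scaling keeps Var(z[t]) = 1 for any |alpha| < 1,
    so alpha only changes the smoothness of the trajectory, not its scale.
    """
    rng = np.random.default_rng(seed)
    z = np.empty(n_steps)
    z[0] = rng.standard_normal()          # start from the stationary marginal
    scale = np.sqrt(1.0 - alpha ** 2)
    for t in range(1, n_steps):
        z[t] = alpha * z[t - 1] + scale * rng.standard_normal()
    return z

# Larger alpha -> smoother, more correlated noise; alpha = 0 recovers white noise.
trajectory = ar1_noise(10000, alpha=0.8, seed=0)
```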
- To run ARP with OpenAI Baselines PPO on the Square environment:
python ./examples/run_square_ppo.py --dt 0.1 --p 3 --alpha 0.8 --num-timesteps=500000
- To run ARP with OpenAI Baselines PPO on a MuJoCo environment:
python ./examples/run_mujoco_ppo.py --env Reacher-v2 --p 3 --alpha 0.5 --num-timesteps=1000000
- To run ARP with OpenAI Baselines TRPO on a MuJoCo environment:
python ./examples/run_mujoco_trpo.py --env Reacher-v2 --p 3 --alpha 0.5 --num-timesteps=1000000
Autoregressive Policies for Continuous Control Deep Reinforcement Learning.
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra. arXiv preprint, 2019.
paper | video