The purpose of this repository is to implement the Deep Deterministic Policy Gradient (DDPG) algorithm in a distributed fashion, as proposed here.
I will start by evaluating the performance of DDPG on simple tasks, and then compare it with the performance obtained when the training process is distributed among several "workers".
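As a point of reference for what the algorithm does, DDPG maintains target networks that slowly track the online networks via a soft update, θ' ← τθ + (1−τ)θ'. The sketch below illustrates only this update rule with a hypothetical helper on plain Python lists; it is not code from this repository:

```python
# Illustrative sketch of DDPG's soft target-network update
# (hypothetical helper, not taken from this repository).
# After each gradient step, the target weights theta' track the
# online weights theta: theta' <- tau*theta + (1 - tau)*theta'.

def soft_update(target_weights, online_weights, tau=0.001):
    """Return the updated target weights as a new list."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(target, online, tau=0.5)
# target is now [0.5, 1.0]
```

A small τ (e.g. 0.001 in the original DDPG paper) keeps the target networks nearly fixed between updates, which stabilizes the critic's TD targets.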
I evaluated the performance of the standard DDPG approach on the MountainCarContinuous task. The figure below shows the training curves until the problem is considered solved.
The results above were obtained with a single worker. To replicate them, run the following commands in two separate consoles:
```bash
# Parameter server
python ddpg.py --job_name="ps" --task_index=0

# First worker
python ddpg.py --job_name="worker" --task_index=0
```
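The two commands above assign each process a role through the `--job_name` and `--task_index` flags. A minimal sketch of how such flags could be parsed with `argparse` is shown below; the actual parsing inside `ddpg.py` may differ:

```python
import argparse

# Minimal sketch of how ddpg.py might interpret its role flags;
# the actual argument handling in the script may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--job_name", choices=["ps", "worker"],
                    help="'ps' starts a parameter server; 'worker' trains")
parser.add_argument("--task_index", type=int, default=0,
                    help="index of this task within its job")

# e.g. the first worker from the commands above:
args = parser.parse_args(["--job_name=worker", "--task_index=0"])
```

With several workers, each one would be launched with the same `--job_name="worker"` but a different `--task_index`.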
To visualize the training process using TensorBoard:
```bash
tensorboard --logdir=results/tboard_ddpg/
```