| A random agent | A trained agent |
| --- | --- |
| ![]() | ![]() |
Examples of a random agent and an agent trained to position its arm within the (green) target region.
This project implements Deep Deterministic Policy Gradient (DDPG) [1] to solve the "Reacher" Unity ML-Agents environment. The goal is to train an agent that maneuvers a robotic arm so that its hand stays within a target region (the green sphere in the GIFs above). This project was completed as part of the (Unity-sponsored) Udacity course on Deep Reinforcement Learning.
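For orientation, the core DDPG learning step can be sketched as follows. This is a minimal illustration, not the exact code in this repo; `actor`, `critic`, their target copies, and the optimizers are assumed to be ordinary PyTorch objects, and the replay batch is assumed to hold float tensors of shape `(B, ...)`.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, gamma=0.99, tau=1e-3):
    """One DDPG learning step on a replay batch (illustrative sketch)."""
    states, actions, rewards, next_states, dones = batch

    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + gamma * (1 - dones) * critic_target(next_states, next_actions)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, pi(s)).
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for target, online in ((actor_target, actor), (critic_target, critic)):
        for t_param, param in zip(target.parameters(), online.parameters()):
            t_param.data.mul_(1 - tau).add_(tau * param.data)
```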
- **Observation space:** 33 real-valued variables describing the position, rotation, velocity, and angular velocity of the arm.
- **Action space:** a float vector of size 4, with every entry bounded to [-1, +1]. The arm has 2 joints, and each joint is driven by 2 torque values.
- **Rewards:** +0.1 for every time step the hand is within the target region.
- **Solved:** the environment is considered solved when the agent attains an average reward of +30 or more over a window of 100 episodes (see the sketch below).
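The solve criterion amounts to a rolling mean over the last 100 episode scores. A minimal sketch of that check, with illustrative names that are not from this repo:

```python
from collections import deque

import numpy as np

# Keep only the most recent 100 episode scores.
scores_window = deque(maxlen=100)

def record_episode(score: float) -> bool:
    """Append an episode score; return True once the environment counts as solved."""
    scores_window.append(score)
    return len(scores_window) == 100 and np.mean(scores_window) >= 30.0
```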
Following the Udacity setup (see here), the required dependencies can be installed with Conda or virtualenv. For example:
```bash
virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements.txt
```
The environment executable can be downloaded for different platforms.
The repository is organized as follows:

- A document detailing the implementation and ideas for future work.
- `main.py`: a CLI used for training and visualizing the model.
- The main module, containing:
  - definitions of the Policy-Net/Q-Net (Actor/Critic) models (sketched below);
  - functions for training and visualizing agents;
  - various utilities for managing the environment and training loop.
- `models/`: pretrained models, namely:
  - `models/policy_net.pth`: a pre-trained Policy Net (Actor);
  - a pre-trained Q-Net (Critic).
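For reference, the Actor/Critic shapes implied by the environment (33-dim observations, 4-dim actions in [-1, +1]) can be sketched as follows. The layer sizes here are illustrative assumptions, not the ones used in this repo.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state to a deterministic action; tanh bounds each torque to [-1, +1]."""
    def __init__(self, state_size=33, action_size=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value estimate."""
    def __init__(self, state_size=33, action_size=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size + action_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```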
The `main.py` script can be used to train agents and visualize them.
To train:

```bash
python3 main.py train
```

To visualize:

```bash
python3 main.py visualize models/policy_net.pth
```
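Outside the CLI, the pre-trained actor could be loaded along these lines. This is a hypothetical sketch reusing the illustrative `Actor` class above; it assumes the checkpoint stores a `state_dict`, which may not match how this repo saves models.

```python
import torch

# Assumption: models/policy_net.pth holds a state_dict for an Actor-shaped network.
policy_net = Actor(state_size=33, action_size=4)
policy_net.load_state_dict(torch.load("models/policy_net.pth", map_location="cpu"))
policy_net.eval()

state = torch.zeros(1, 33)        # placeholder observation from the environment
with torch.no_grad():
    action = policy_net(state)    # 4 torque values in [-1, +1]
```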