To create an autonomous AI agent that can play the Obstacle Tower Challenge game and climb to the highest level possible.
- Install Python 3.8.0 on your machine using pyenv
- Fork the repository from here.
- Clone the repository from your GitHub profile:
git clone https://github.com/<YOUR_USERNAME>/obstacle-tower-challenge.git
- Run the following commands:
cd obstacle-tower-challenge/
# Set python version for the local folder
pyenv local 3.8.0
# Install pyenv-virtualenv
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
source ~/.bashrc
mkdir venv
cd venv/
pyenv virtualenv 3.8.0 venv
cd ..
# activate virtual environment
pyenv activate venv
# confirm python version
python -V
# Install dependencies
python3 -m pip install --upgrade pip
pip install -r requirements.txt
- Set up Jupyter to work with the virtual environment, as shown below.
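One common way to do this (assuming the venv environment created above is active) is to register it as a Jupyter kernel:
# register the virtual environment as a Jupyter kernel
pip install ipykernel
python -m ipykernel install --user --name venv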
- By default, the game binary is downloaded automatically when the Obstacle Tower Gym environment is first instantiated. The following line in the Jupyter notebook instantiates the environment:
env = ObstacleTowerEnv(retro=False, realtime_mode=False)
- The binaries for each platform can also be downloaded separately at the following links; using these binaries, you can play the game yourself.
You can use Docker for a quick setup on a virtual machine. The base image is the official Ubuntu image. The following libraries and packages are installed as part of the Docker quickstart (a build/run sketch follows the list):
- GCC compiler toolset
- Python 3.8 and PIP
- Git
- All other dependencies for this game here
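As a rough sketch, assuming a Dockerfile at the repository root (the image name below is just an example):
# build the image and start a container
docker build -t obstacle-tower-challenge .
docker run -it obstacle-tower-challenge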
The environment provides a MultiDiscrete action space (a list of valid actions), MultiDiscrete([3 3 2 3]), where the 4 dimensions are:
- Movement (No-Op/Forward/Back)
- Camera Rotation (No-Op/Counter-Clockwise/Clockwise)
- Jump (No-Op/Jump)
- Movement (No-Op/Right/Left)
The observation space includes a 168x168 image (the camera view from the simulation), the number of keys held by the agent (0-5), and the amount of time remaining.
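As an illustration, a short random rollout over these spaces might look like the following (a sketch assuming the standard Gym API exposed by the obstacle_tower_env package):
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv(retro=False, realtime_mode=False)
obs = env.reset()  # with retro=False, obs contains the 168x168 image, keys held, and time remaining
done = False
while not done:
    action = env.action_space.sample()  # one value per MultiDiscrete dimension, e.g. [1, 0, 0, 0] = move forward
    obs, reward, done, info = env.step(action)
env.close()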
- Random Agent
usage: train.py random [-h] [--max-eps MAX_EPS] [--save-dir SAVE_DIR]
optional arguments:
-h, --help show this help message and exit
--max-eps MAX_EPS Maximum number of episodes (games) to run.
--save-dir SAVE_DIR Directory in which you desire to save the model.
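For example (flag values are illustrative):
python src/train.py --env <PATH_TO_OTC_GAME> random --max-eps 10 --save-dir ./model_files/random/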
- A3C Agent
usage: train.py a3c [-h] [--lr LR] [--max-eps MAX_EPS] [--update-freq UPDATE_FREQ] [--gamma GAMMA] [--num-workers NUM_WORKERS] [--save-dir SAVE_DIR]
optional arguments:
-h, --help show this help message and exit
--lr LR Learning rate for the shared optimizer.
--max-eps MAX_EPS Maximum number of episodes (games) to run.
--update-freq UPDATE_FREQ
How often to update the global model.
--gamma GAMMA Discount factor of rewards.
--num-workers NUM_WORKERS
Number of workers for asynchronous learning.
--save-dir SAVE_DIR Directory in which you desire to save the model.
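An example invocation, with illustrative flag values:
python src/train.py --env <PATH_TO_OTC_GAME> a3c --lr 0.0001 --max-eps 100 --num-workers 4 --save-dir ./model_files/a3c/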
- PPO Agent
usage: train.py ppo [-h] [--lr LR] [--max-eps MAX_EPS]
[--update-freq UPDATE_FREQ] [--timesteps TIMESTEPS]
[--batch-size BATCH_SIZE] [--gamma GAMMA]
[--num-workers NUM_WORKERS] [--save-dir SAVE_DIR]
[--plot PLOT]
optional arguments:
-h, --help show this help message and exit
--lr LR Learning rate for the shared optimizer.
--max-eps MAX_EPS Maximum number of episodes (games) to run.
--update-freq UPDATE_FREQ
How often to update the global model.
--timesteps TIMESTEPS
Number of timesteps to run.
--batch-size BATCH_SIZE
Batch size used for each model update.
--gamma GAMMA Discount factor of rewards.
--num-workers NUM_WORKERS
Number of workers for asynchronous learning.
--save-dir SAVE_DIR Directory in which you desire to save the model.
--plot PLOT Plot model results (rewards, loss, etc)
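An example invocation, with illustrative flag values:
python src/train.py --env <PATH_TO_OTC_GAME> ppo --lr 0.0001 --timesteps 10000 --batch-size 64 --gamma 0.99 --save-dir ./model_files/ppo/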
- Curiosity Agent
usage: train.py curiosity [-h] [--lr LR] [--timesteps TIMESTEPS] [--batch-size BATCH_SIZE] [--gamma GAMMA] [--save-dir SAVE_DIR]
optional arguments:
-h, --help show this help message and exit
--lr LR Learning rate for the shared optimizer.
--timesteps TIMESTEPS
Number of timesteps to run.
--batch-size BATCH_SIZE
Batch size used for each model update.
--gamma GAMMA Discount factor of rewards.
--save-dir SAVE_DIR Directory in which you desire to save the model.
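An example invocation, with illustrative flag values:
python src/train.py --env <PATH_TO_OTC_GAME> curiosity --lr 0.0001 --timesteps 10000 --batch-size 64 --save-dir ./model_files/curiosity/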
- Stable A2C Agent
usage: train.py stable_a2c [-h] [--timesteps TIMESTEPS] [--policy-name POLICY_NAME] [--save-dir SAVE_DIR] [--continue-training]
optional arguments:
-h, --help show this help message and exit
--timesteps TIMESTEPS
Number of timesteps to train the A2C agent for.
--policy-name POLICY_NAME
Policy to train for the A2C agent.
--save-dir SAVE_DIR Directory in which you desire to save the model.
--continue-training Continue training the previously trained model.
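An example invocation (replace the policy placeholder with a policy supported by the repository):
python src/train.py --env <PATH_TO_OTC_GAME> stable_a2c --timesteps 100000 --policy-name <POLICY_NAME> --save-dir ./model_files/stable_a2c/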
- Stable PPO Agent
usage: train.py stable_ppo [-h] [--timesteps TIMESTEPS] [--policy-name POLICY_NAME] [--save-dir SAVE_DIR] [--continue-training] [--reduced-action]
optional arguments:
-h, --help show this help message and exit
--timesteps TIMESTEPS
Number of timesteps to train the PPO agent for.
--policy-name POLICY_NAME
Policy to train for the PPO agent.
--save-dir SAVE_DIR Directory in which you desire to save the model.
--continue-training Continue training the previously trained model.
--reduced-action Use a reduced set of actions for training
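An example invocation (replace the policy placeholder with a policy supported by the repository):
python src/train.py --env <PATH_TO_OTC_GAME> stable_ppo --timesteps 100000 --policy-name <POLICY_NAME> --reduced-action --save-dir ./model_files/stable_ppo/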
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
We used tf.distribute.MirroredStrategy to explore TensorFlow's distributed training support, and found that it only pays off when many GPUs are available. Our future work will focus on cloud training, along with experimentation with the following strategies (a minimal usage sketch follows the list):
- tf.distribute.TPUStrategy
- tf.distribute.MultiWorkerMirroredStrategy
- tf.distribute.experimental.ParameterServerStrategy
- tf.distribute.experimental.CentralStorageStrategy
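For reference, a minimal sketch of how a model can be built under tf.distribute.MirroredStrategy (not the exact model used in this project; the network below is only an example):
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates the model across all visible GPUs
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # model and optimizer must be created inside the strategy scope
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(168, 168, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")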
To train the agent:
python src/train.py --env <PATH_TO_OTC_GAME> <AGENT_NAME> [<ARGS>]
View training logs on Tensorboard:
# to view graphs in tensorboard
tensorboard --logdir logs/
To play a game with a trained agent:
# play an episode of the game using a given policy (random or a3c)
python play.py --env <PATH_TO_OTC_GAME> --algorithm random
# evaluate a given agent
python play.py --env <PATH_TO_OTC_GAME> --algorithm random --evaluate