Adapted from my A2C/PPO experiments for Pommerman, this codebase was the basis for some experiments with actor-critic policy-gradient (PG) algorithms for the Obstacle Tower Challenge (https://github.com/Unity-Technologies/obstacle-tower-challenge).
The reinforcement learning code is based upon Ilya Kostrikov's awesome work (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).
In short, the changes were:
- Added Noisy Networks in place of entropy regularization for exploration (a sketch of a noisy linear layer follows this list)
- Replaced the GRU recurrent policy with an LSTM
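For reference, here is a minimal sketch of a factorised-Gaussian noisy linear layer in the style of the Kaixhin/Rainbow implementation referenced in the next paragraph. It is illustrative rather than this repo's exact code; the class name and std_init default are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (Fortunato et al., 2017).

    Learned noise on the weights drives exploration, which is why the
    entropy bonus can be dropped (--entropy-coef 0.0 in the command below).
    """
    def __init__(self, in_features, out_features, std_init=0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.std_init = std_init
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.register_buffer('weight_epsilon', torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer('bias_epsilon', torch.empty(out_features))
        self.reset_parameters()
        self.reset_noise()

    def reset_parameters(self):
        mu_range = 1.0 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.in_features))
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.out_features))

    @staticmethod
    def _scale_noise(size):
        x = torch.randn(size)
        return x.sign().mul_(x.abs().sqrt_())

    def reset_noise(self):
        # Factorised noise: outer product of per-input and per-output noise vectors
        eps_in = self._scale_noise(self.in_features)
        eps_out = self._scale_noise(self.out_features)
        self.weight_epsilon.copy_(eps_out.ger(eps_in))
        self.bias_epsilon.copy_(eps_out)

    def forward(self, x):
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        else:
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)
```

The usual pattern is to swap the final linear layers of the actor and critic heads for layers like this and resample the noise (reset_noise) once per update.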
My initial attempts with A2C and PPO went nowhere: after training for days, the average floor remained at 1, with little progress made by either algorithm. Seeing the success of my experiments with Rainbow (https://github.com/rwightman/obstacle-towers-pytorch-rainbow), I decided to bring in Noisy Networks from https://github.com/Kaixhin/Rainbow to improve exploration. Additionally, I replaced the GRU with an LSTM.
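One detail worth noting with the GRU-to-LSTM swap: an LSTM carries a cell state in addition to the hidden state, so the recurrent state handed around by the rollout code doubles in size. The sketch below shows one way to handle that in an ikostrikov-style recurrent base; the class and argument names are illustrative, not this repo's exact code.

```python
import torch
import torch.nn as nn

class LSTMBase(nn.Module):
    """Minimal recurrent policy core using an LSTM cell.

    The hidden and cell states are packed into a single tensor so the
    surrounding rollout/storage code can keep treating the recurrent
    state as one blob, as it did with the GRU.
    """
    def __init__(self, input_size, hidden_size=512):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTMCell(input_size, hidden_size)

    def forward(self, x, rnn_state, masks):
        # rnn_state: [batch, 2 * hidden_size]; masks: [batch, 1], 0 at episode boundaries
        h, c = rnn_state.split(self.hidden_size, dim=1)
        h, c = h * masks, c * masks  # reset recurrent state where an episode ended
        h, c = self.lstm(x, (h, c))
        return h, torch.cat([h, c], dim=1)
```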
This resulted in some progress. An average floor of around 6-7 was reached, not quite as good as my Rainbow experiments. A top floor of 9-10 was hit on occasion, but the same policy also frequently failed to move past floor 2.
Minimal time was spent searching the hyperparameter space; I'm sure much better results could be achieved.
- Set up a new Conda Python 3.6 environment (do not use 3.7! There are compatibility issues with Unity's support modules)
- Install a recent (1.x) version of PyTorch
- Set up the environment download and engine as per https://github.com/Unity-Technologies/obstacle-tower-challenge#local-setup-for-training, but use this repo in place of that clone and do it within the same Conda env (a quick load check is sketched after this list)
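Assuming the obstacle_tower_env package that the linked setup installs, and an environment binary downloaded to ./ObstacleTower/obstacletower, a quick sanity check that the environment loads inside the Conda env might look like the following. The path and constructor arguments are assumptions and can differ between obstacle-tower-env versions, so treat this as a sketch.

```python
# Quick check that the Obstacle Tower binary and Python bindings work together.
from obstacle_tower_env import ObstacleTowerEnv

# Path and keyword arguments are assumptions; adjust to your local setup/version.
env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=True, realtime_mode=False)
obs = env.reset()
print('observation shape:', obs.shape)
obs, reward, done, info = env.step(env.action_space.sample())
print('reward:', reward, 'done:', done)
env.close()
```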
- Run the following command and wait (the GAE computation enabled by --use-gae is sketched below):
python main.py --env-name obt --algo ppo --use-gae --recurrent-policy --num-processes 32 --num-mini-batch 8 --num-steps 120 --entropy-coef 0.0 --lr 1e-4 --clip-param 0.1
- enjoy.py can be used to watch the trained policy in realtime
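The --use-gae flag enables generalized advantage estimation for the PPO updates. Below is a short, self-contained sketch of that computation; the compute_gae function and its defaults are illustrative, not this repo's exact implementation.

```python
import torch

def compute_gae(rewards, values, masks, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards: [T] rewards
    values:  [T + 1] value estimates (bootstrap value appended)
    masks:   [T] 0.0 where an episode terminated at that step, else 1.0
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # Exponentially decayed sum of residuals
        gae = delta + gamma * gae_lambda * masks[t] * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```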