Overview

Reinforcement learning agents for playing the Flappy Bird game.

PPO (Proximal Policy Optimization)

The agent is trained with deep reinforcement learning, using a PPO model written in TensorFlow.
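For readers new to PPO, the core of the method is its clipped surrogate objective. The sketch below shows that arithmetic in plain Python rather than TensorFlow, so it is not the code from dppo.ipynb. The function name `ppo_clipped_loss` and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negative PPO clipped surrogate objective (a loss to minimize).

    All three inputs are equal-length sequences of floats, one entry per
    sampled action. `clip_eps` is an assumed default, not a value taken
    from this repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

The `min` is what limits how far a single update can move the policy: once the ratio leaves the clipping range in the direction that would increase the objective, the term stops growing.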
The Flappy Bird Gym environment was mastered in 9 hours of training on an 8-core CPU and a GPU. Training and evaluation code is available in dppo.ipynb.
The trained agent scores 264.0 in every one of 200 consecutive trials.
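A check over consecutive trials like the one reported here can be sketched as a plain evaluation loop. This is an illustrative sketch, not the evaluation code from dppo.ipynb. The names `evaluate` and `policy` are hypothetical, and the loop assumes the older Gym interface (as in gym 0.10.x), where `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`.

```python
def evaluate(env, policy, episodes=200):
    """Run `episodes` full episodes and return the list of episode returns.

    `env` is any object with the older Gym-style reset()/step() interface;
    `policy` is a callable mapping an observation to an action.
    """
    returns = []
    for _ in range(episodes):
        obs = env.reset()  # older Gym API: reset() returns only the observation
        done = False
        total = 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            total += reward
        returns.append(total)
    return returns


# Hypothetical usage, assuming gym_ple is installed and registers the
# "FlappyBird-v0" environment id, and `trained_policy` is your own callable:
#   import gym, gym_ple
#   env = gym.make("FlappyBird-v0")
#   scores = evaluate(env, trained_policy, episodes=200)
#   print(min(scores), max(scores))
```

Comparing the minimum and maximum of the returned list is a quick way to confirm that every trial reached the same score.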

Requirements

gym 0.10.5
ple - https://github.com/ntasfi/PyGame-Learning-Environment
gym_ple - https://github.com/lusob/gym-ple
tensorflow 1.12