Solving the OpenAI Gym CarRacing-v0 environment using Proximal Policy Optimization.
Read the full report.
See the full video demo on YouTube.
After 5000 training steps, the agent achieves a mean score of 909.48 ± 10.30 over 100 episodes. To reproduce the results, run the following commands:
```sh
mkdir logs
python demo.py --ckpt extra/final_weights.pt --delay_ms 0
```
Results from each episode will be saved to `logs/episode_rewards.csv`.
- A convolutional neural network jointly approximates the value function and the policy (see the actor-critic sketch after this list).
- Optimization is performed with Proximal Policy Optimization (PPO); the clipped surrogate loss is sketched below.
- The policy head outputs the parameters of a Beta distribution, which suits bounded continuous action spaces better than an unbounded Gaussian.
- Advantages are estimated with Generalized Advantage Estimation (GAE), also sketched below.
- The network input is a stack of 4 consecutive frames, with frame skipping optionally applied (see the wrapper sketch at the end).
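The following is a minimal sketch of the shared actor-critic network with a Beta policy head, in PyTorch. The layer sizes are assumptions for illustration, not the exact architecture from the report; the input is taken to be 4 stacked 96x96 grayscale frames and the action space 3-dimensional, as in CarRacing-v0.

```python
# Sketch of a shared actor-critic network with a Beta policy head.
# Layer sizes are illustrative, not the exact architecture from the report.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, n_actions: int = 3):
        super().__init__()
        # Shared convolutional trunk over the 4-frame input stack (4x96x96).
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 10 * 10, 256), nn.ReLU(),
        )
        # Policy head: one (alpha, beta) pair per action dimension.
        self.alpha_head = nn.Linear(256, n_actions)
        self.beta_head = nn.Linear(256, n_actions)
        # Value head: scalar state-value estimate.
        self.value_head = nn.Linear(256, 1)

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        # softplus(.) + 1 keeps both parameters > 1, giving a unimodal Beta.
        alpha = F.softplus(self.alpha_head(h)) + 1.0
        beta = F.softplus(self.beta_head(h)) + 1.0
        value = self.value_head(h)
        return alpha, beta, value
```

Samples from a Beta distribution lie in [0, 1], so at action-selection time each dimension is rescaled linearly to the environment's bounds (for example, steering to [-1, 1]).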
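Below is a sketch of the PPO objective in its standard clipped-surrogate form, assuming log-probabilities stored at rollout time and precomputed advantages and returns. `clip_eps` and `value_coef` are illustrative hyperparameters, and an entropy bonus is omitted for brevity.

```python
# Sketch of the PPO clipped surrogate loss plus a value-regression term.
import torch
import torch.nn.functional as F

def ppo_loss(new_logp, old_logp, advantages, values, returns,
             clip_eps: float = 0.1, value_coef: float = 0.5):
    # Probability ratio between the current policy and the rollout policy.
    ratio = torch.exp(new_logp - old_logp)
    # Clipped surrogate: take the pessimistic minimum of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value loss regresses the value head toward the empirical returns.
    value_loss = F.mse_loss(values, returns)
    return policy_loss + value_coef * value_loss
```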
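The advantage estimates above come from GAE. A minimal sketch over a single rollout follows; it assumes `values` carries one extra bootstrap entry V(s_T) at the end, and `gamma` and `lam` are illustrative values.

```python
# Sketch of Generalized Advantage Estimation over one rollout of length T.
import numpy as np

def compute_gae(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    # values has length T + 1: V(s_0), ..., V(s_T) including the bootstrap.
    values = np.asarray(values, dtype=np.float32)
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - float(dones[t])
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future residuals.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    # Returns used as regression targets for the value head.
    returns = advantages + values[:T]
    return advantages, returns
```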
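Finally, a sketch of the 4-frame stacking with optional frame skipping, written as a gym-style wrapper around the old 4-tuple `step` API. The class name and the `skip` default are assumptions, and grayscale conversion is taken to happen elsewhere.

```python
# Sketch of a frame-stacking wrapper with optional action repeat (frame skip).
from collections import deque
import numpy as np

class FrameStack:
    def __init__(self, env, n_frames: int = 4, skip: int = 1):
        self.env = env
        self.skip = skip
        self.frames = deque(maxlen=n_frames)

    def reset(self):
        obs = self.env.reset()
        # Fill the stack by repeating the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return np.stack(self.frames, axis=0)

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        # Repeat the action for `skip` frames, accumulating the reward;
        # only the last observed frame enters the stack.
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        self.frames.append(obs)
        return np.stack(self.frames, axis=0), total_reward, done, info
```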