OpenAI Gym solutions and other environments

Cartpole V0

I solved this problem with a DQN algorithm that uses two neural networks to compute the Q-values, combined with prioritized experience replay.
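As a rough illustration of the two ingredients mentioned above, here is a minimal sketch of a proportional prioritized replay buffer together with a bootstrapped TD target. It assumes the second network acts as a target network, which the description does not confirm. The names `PrioritizedReplayBuffer`, `td_target`, `alpha`, and `eps` are hypothetical and are not taken from `replay_buffer.py` or `agents.py`. Importance-sampling weights are left out to keep it short.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical sketch, not the repo's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are likely to be replayed.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idxs, [self.data[i] for i in idxs]

    def update(self, idxs, td_errors):
        # After a training step, set each priority to the new absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


def td_target(reward, next_q_target, done, gamma=0.99):
    """Target value, assuming next_q_target holds the target network's Q-values for the next state."""
    return reward if done else reward + gamma * max(next_q_target)
```

In a training loop, the absolute difference between `td_target(...)` and the online network's Q-value for the chosen action would be passed to `update` as the TD error.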

The agent solves the problem in about 20-25 episodes, meaning it makes no mistakes after that. However, the formal definition of solved is an average reward of 195 over the last 100 episodes, which the agent reaches around episode 116.
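The formal criterion can be checked with a rolling window over the per-episode rewards. The following is a minimal sketch; the function name `first_solved_episode` and its parameters are hypothetical and not taken from the repository.

```python
from collections import deque


def first_solved_episode(episode_rewards, threshold=195.0, window=100):
    """Return the 1-based episode at which the mean reward over the last
    `window` episodes first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # automatically drops rewards older than `window` episodes
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # Only evaluate once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None
```

Because a full window of 100 episodes is required, the earliest possible result is episode 100, which is why an agent that stops failing after 20-25 episodes is still only counted as solved later.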

Files:

agents.py
cartpole.py
lunarlander_v2.py
models.py
mountain_car_v0.py
readme.md
replay_buffer.py
taxi_v2.py
utilities.py

Roaldb86/Reinforcement_learning
