Roaldb86/Reinforcement_learning

OpenAI Gym solutions and other environments

CartPole-v0

I solved this problem with a DQN algorithm that uses two neural networks to compute the Q-values, combined with prioritized experience replay.
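The README does not show the code, so the following is only a minimal, stdlib-only sketch of the two ingredients it names. It assumes the second network plays the usual DQN "target network" role (the README does not say which role each network has) and uses proportional prioritization with the common `alpha`/`beta` exponents. `PrioritizedReplayBuffer` and `td_targets` are hypothetical names, not taken from the repository.

```python
import random
from typing import List, Sequence


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (simple O(n) sampling)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps every priority strictly positive
        self.data: List[object] = []
        self.priorities: List[float] = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition: object) -> None:
        # New transitions get the current max priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # dividing by the largest weight keeps them in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        # Priority is the magnitude of the latest TD error for that transition.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


def td_targets(rewards, dones, next_q_target, gamma: float = 0.99) -> List[float]:
    """DQN targets: r + gamma * max_a Q_target(s', a), or just r at episode end.

    next_q_target[k] holds the (assumed) target network's Q-values for next state k.
    """
    return [r if d else r + gamma * max(q)
            for r, d, q in zip(rewards, dones, next_q_target)]
```

In a training step, the online network's Q-value for the taken action would be regressed toward these targets (scaled by the importance weights), and the resulting TD errors fed back through `update_priorities`.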

The agent solves the problem in about 20-25 episodes, meaning it makes no mistakes after that point. However, the formal definition of "solved" is an average reward of 195 over the last 100 consecutive episodes. This happens around episode 116.
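The formal criterion can be checked with a short rolling-window function. This is a sketch, assuming per-episode rewards are collected in a list; `first_solved_episode` is a hypothetical helper, and it requires a full 100-episode window before the average counts.

```python
from collections import deque
from typing import Iterable, Optional


def first_solved_episode(rewards: Iterable[float],
                         window: int = 100,
                         threshold: float = 195.0) -> Optional[int]:
    """Return the 1-based episode at which the mean of the last `window`
    episode rewards first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # automatically drops the oldest reward
    for episode, reward in enumerate(rewards, start=1):
        recent.append(reward)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None
```

Because no average can be computed before 100 episodes exist, the formal "solved" episode necessarily lags well behind the point where the agent first starts playing perfectly.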
