This project contains the code for an extended version of the Deep Q-Learning algorithm, which I wrote during my Deep Reinforcement Learning Nanodegree @ Udacity. The code is inspired by the vanilla DQN implementation provided by Udacity.
Deep Q-Learning for Multilayer Perceptron
+ Fixed Q-Targets
+ Experience Replay
+ Gradient Clipping
+ Double Deep Q-Learning
+ Dueling Networks
For more information on the implemented features, refer to Extended_Deep_Q_Learning_for_Multilayer_Perceptron.ipynb. The notebook summarizes all essential concepts used in the code and contains three examples in which the algorithm solves OpenAI Gym environments. The sketch below illustrates how the dueling head and the Double DQN update fit together.
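The following is a minimal sketch, not the repository's exact code, of the listed extensions: a dueling MLP head, a Double DQN target, gradient clipping, and a soft update of the fixed Q-target network. Layer names, sizes, and the choice of norm-based clipping are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingMLP(nn.Module):
    """Dueling architecture: shared fc1, then separate advantage and value streams."""
    def __init__(self, state_size, action_size, fc1_nodes=256, fc2_adv=256, fc2_val=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_nodes)
        self.fc2_adv = nn.Linear(fc1_nodes, fc2_adv)   # advantage stream
        self.fc2_val = nn.Linear(fc1_nodes, fc2_val)   # state-value stream
        self.adv = nn.Linear(fc2_adv, action_size)
        self.val = nn.Linear(fc2_val, 1)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        adv = self.adv(F.relu(self.fc2_adv(x)))
        val = self.val(F.relu(self.fc2_val(x)))
        # Combine streams; subtracting the mean advantage keeps Q-values identifiable.
        return val + adv - adv.mean(dim=1, keepdim=True)

def double_dqn_step(local_net, target_net, optimizer, batch, gamma=0.99, tau=1e-3):
    # batch tensors assumed shapes: states [B, S], actions [B, 1] (long),
    # rewards [B, 1], next_states [B, S], dones [B, 1] (float 0/1).
    states, actions, rewards, next_states, dones = batch

    # Double DQN: the local net selects the greedy action, the target net evaluates it.
    best_actions = local_net(next_states).argmax(dim=1, keepdim=True)
    q_next = target_net(next_states).gather(1, best_actions).detach()
    q_targets = rewards + gamma * q_next * (1 - dones)
    q_expected = local_net(states).gather(1, actions)

    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(local_net.parameters(), 1.0)  # gradient clipping (norm-based here)
    optimizer.step()

    # Soft update of the fixed Q-target network.
    for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```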
- Create (and activate) a new environment with Python 3.6.
conda create --name env_name python=3.6
source activate env_name
- Install OpenAI Gym
git clone https://github.com/openai/gym.git
cd gym
pip install -e .
pip install -e '.[box2d]'
pip install -e '.[classic_control]'
sudo apt-get install ffmpeg
- Install source code dependencies
conda install -c rpi matplotlib
conda install -c pytorch pytorch
conda install -c anaconda numpy
You can run the project via Extended_Deep_Q_Learning_for_Multilayer_Perceptron.ipynb or by running main.py from the console.
Open a console and run: python main.py -c "your_config_file".json
Optional arguments:
-h, --help
- show the help message
-c, --config
- config file name; the file must be available as .json in ./configs
Example: python main.py -c "Lunar_Lander_v2".json
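A minimal sketch of how the -c/--config option could be parsed and the JSON config loaded from ./configs; the actual argument handling in main.py may differ.

```python
import argparse
import json
import os

parser = argparse.ArgumentParser(description="Extended Deep Q-Learning")
parser.add_argument("-c", "--config", required=True,
                    help="Config file name - file must be available as .json in ./configs")
args = parser.parse_args()

# Load the JSON config, e.g. "Lunar_Lander_v2.json" from ./configs.
with open(os.path.join("configs", args.config)) as f:
    config = json.load(f)

print(config["general"]["env_name"])  # e.g. "LunarLander-v2"
```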
"general" :
"env_name" : "LunarLander-v2", # The gym environment name you want to run
"monitor_dir" : ["monitor"], # monitor file direction
"checkpoint_path": ["checkpoints"], # checkpoint file direction
"seed": 0, # random seed for numpy, gym and pytorch
"state_size" : 8, # number of states
"action_size" : 4, # number of actions
"average_score_for_solving" : 200.0 # border value for solving the task
"train" :
"nb_episodes": 2000, # max number of episodes
"episode_length": 1000, # max length of one episode
"batch_size" : 256, # memory batch size
"epsilon_high": 1.0, # epsilon start point
"epsilon_low": 0.01, # min epsilon value
"epsilon_decay": 0.995, # epsilon decay
"run_training" : true # do you want to train? Otherwise run a test session
"agent" :
"learning_rate": 0.0005, # model learning rate
"gamma" : 0.99, # reward weight
"tau" : 0.001, # soft update factor
"update_rate" : 4 # interval in which a learning step is done
"buffer" :
"size" : 100000 # experience replay buffer size
"model" :
"fc1_nodes" : 256, # number of fc1 output nodes
"fc2_adv" : 256, # number of fc2_adv output nodes
"fc2_val" : 128 # number of fc2_val output nodes