Practice implementations of deep reinforcement learning algorithms, written by a beginner.
The test environments are Gym CartPole-v0 for discrete action spaces and Gym Pendulum-v0 for continuous action spaces.
Under active development.
Implemented algorithms: DQN, REINFORCE, baseline-REINFORCE, Actor-Critic, Double DQN, Dueling DQN, Sarsa, DDPG, DDPG for discrete action space, A2C, A3C, TD3, SAC, TRPO
1. DQN
2. REINFORCE
1. Experience replay
1. baseline-REINFORCE
2. Actor-Critic
Add CUDA support
1. Double DQN
2. Dueling DQN
1. Sarsa
1. DDPG
2. DDPG for discrete action space using Gumbel-softmax
1. A2C
1. A3C
1. TD3
2. SAC
1. TRPO (natural policy gradient)
Known issue (cause not yet identified): the Hessian matrix may not be positive definite at the beginning of training, although training usually still converges.
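
One common mitigation for a Hessian that is not positive definite, sketched here as an assumption rather than as this repository's actual code, is to add a small damping term when solving for the natural gradient direction with conjugate gradient. The function name `conjugate_gradient`, the `hvp` callback (a Hessian-vector product), and the default `damping` value below are all hypothetical and chosen only for illustration; the sketch uses NumPy so that it stays self-contained.

```python
import numpy as np


def conjugate_gradient(hvp, b, damping=0.1, iters=10, tol=1e-10):
    """Approximately solve (H + damping * I) x = b.

    hvp: hypothetical callback returning the Hessian-vector product H @ v.
    Only matrix-vector products are needed, so H is never built explicitly.
    """
    x = np.zeros_like(b)
    r = b.copy()   # residual b - A x, which equals b because x starts at zero
    p = r.copy()   # current search direction
    rs_old = r @ r
    for _ in range(iters):
        Ap = hvp(p) + damping * p   # damped product (H + damping * I) @ p
        curvature = p @ Ap
        if curvature <= 0:
            # Even the damped matrix has non-positive curvature along p,
            # so stop and return the best estimate found so far.
            break
        alpha = rs_old / curvature
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

Damping shifts every eigenvalue of the matrix up by `damping`, so slightly negative eigenvalues early in training become positive. Larger values make the solve more stable but move the step away from the true natural gradient direction, so the value would need tuning.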