This is my personal practice of implementing various RL algorithms from scratch.
Most of them will be in Jupyter notebooks; the ones involving multiprocessing will be in regular Python files.
The framework will always be PyTorch, also as a matter of personal practice.
Normally I use CartPole for the easy algorithms in this project and skip the visual-input part (which is fairly trivial to handle by adding a few conv layers; see the sketch below).
For the harder, vision-related algorithms I pick various Atari games as the environment.
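Below is a minimal sketch of what "a few conv layers" means here, assuming 84x84 grayscale Atari-style frames. The layer sizes follow the standard DQN-style convolutional stack and are illustrative choices of mine, not code from this repo:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Small convolutional front-end that turns stacked Atari frames into a
    flat feature vector that a CartPole-style policy/value head can consume.
    Channels and shapes here are illustrative assumptions, not repo code."""

    def __init__(self, in_channels=4, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 7x7x64 feature map after the three convs above
        self.fc = nn.Sequential(nn.Linear(64 * 7 * 7, feature_dim), nn.ReLU())

    def forward(self, x):
        # x: (batch, in_channels, 84, 84), pixel values scaled to [0, 1]
        return self.fc(self.conv(x))
```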
Due to time constraints, I will not provide a systematic analysis of any particular algorithm. Also be aware that these are for personal use, so bugs do appear frequently.
Once the project matures, I will accept open issues. For now, however, let me dive in. (I guess no one even reads this repo anyway.)
The project file structure will change continuously to match my needs.
- REINFORCE (a minimal sketch follows this list)
- Off-Policy REINFORCE
- Basic Actor Critic
- Advantage Actor Critic with Huber loss and entropy regularization
- A3C
- A2C
- DDPG
- D4PG
- MADDPG
- TRPO
- PPO
- ACER
- ACKTR
- SAC
- SAC with AAT (Automatically Adjusted Temperature)
- TD3
- SVPG
- IMPALA
- Dueling DDQN
- Dueling DDQN + PER
- Rainbow DQN
- Ape-X
- C51
- QR-DQN
- IQN
- Dopamine (DQN + C51 + IQN + Rainbow)
- Q-prop
- Stein Control Variates
- PCL
- Trust-PCL
- PGQL
- Reactor
- IPG
- VIME
- CTS-based Pseudocounts
- PixelCNN-based Pseudocounts
- Hash-based Counts
- EX2
- ICM
- RND
- VIC
- DIAYN
- VALOR
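Since everything in this repo runs on PyTorch and the easy algorithms run on CartPole, here is a minimal sketch of the first entry above (REINFORCE) under those assumptions. It targets the classic `gym` reset/step API; the network size, learning rate, and return normalization are illustrative choices of mine, not necessarily what the notebook uses:

```python
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Two-layer policy for CartPole: 4-dim observation -> 2 action logits.
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

env = gym.make("CartPole-v1")
for episode in range(500):
    obs = env.reset()  # classic gym API: reset() returns obs, step() a 4-tuple
    log_probs, rewards, done = [], [], False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns computed backwards, then normalized as a crude baseline.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE objective: log-probabilities weighted by the returns.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```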