Deep RL algorithms for OpenAI Gym environments
See Heerad's submissions here
Implemented:
- Actor-critic with per-step updates using eligibility traces
- Deep Q-learning (DQN) with experience replay to improve sample efficiency
- DDPG for continuous action spaces
- UCB exploration based on Hoeffding's inequality as an alternative to epsilon-greedy exploration for DQN
- Double Q-learning to reduce the maximization bias introduced by applying function approximators to Q-learning
- Prioritized experience replay for DQN
- Slowly-updating target network (used in computing TD error) for stability
- Gradient norm clipping for stability
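A few of the implemented ideas can be sketched compactly. The snippet below is a minimal NumPy-only illustration (not the repo's actual code): the double Q-learning TD target with a slowly-tracked target network, and a Hoeffding-style UCB exploration bonus. The function names (`double_q_target`, `soft_update`, `ucb_bonus`) and the `q_online`/`q_target` callables are hypothetical stand-ins for the real networks.

```python
import numpy as np

def double_q_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    # Double Q-learning: the online network SELECTS the greedy action,
    # the target network EVALUATES it, which reduces maximization bias.
    # q_online / q_target map a batch of states to [batch, n_actions] Q-values.
    a_star = np.argmax(q_online(next_states), axis=1)
    q_next = q_target(next_states)[np.arange(len(a_star)), a_star]
    # Bootstrapped TD target; (1 - dones) zeroes the tail at episode end.
    return rewards + gamma * (1.0 - dones) * q_next

def soft_update(target_params, online_params, tau=0.005):
    # Slowly-updating target network: theta_t <- tau * theta + (1 - tau) * theta_t.
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

def ucb_bonus(counts, t, c=2.0):
    # Hoeffding-based UCB bonus, c * sqrt(log t / N(a)): actions tried
    # rarely get a larger bonus, replacing epsilon-greedy exploration.
    return c * np.sqrt(np.log(t) / np.maximum(counts, 1))
```

In a training loop, `double_q_target` would feed the regression loss for the online network, `soft_update` would run after each gradient step, and `ucb_bonus` would be added to the online Q-values before the argmax when acting.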
TODO:
- Atari environments via convnets
- PPO