Releases: prabhatnagarajan/repro_dqn
V1 - Deterministic DQN
This release contains a high-quality deterministic implementation of Deep Q-learning. This implementation was used in the following works:
- The Impact of Nondeterminism on Reproducibility in Deep Reinforcement Learning, a workshop paper at ICML 2018
- Deterministic Implementations for Reproducibility in Deep Reinforcement Learning, an arXiv preprint, and
- Nondeterminism as a Reproducibility Challenge for Deep Reinforcement Learning, a master's thesis from UT.
The Impact of Nondeterminism on Reproducibility in Deep Reinforcement Learning and Nondeterminism as a Reproducibility Challenge for Deep Reinforcement Learning
For these works, we used the following hyperparameters and experimental conditions.
Hyperparameters
The default hyperparameters for the deterministic implementation are specified in constants.py. The exploration and initialization seeds used were 125, 127, 129, 131, and 133. The remaining hyperparameters, which are explained in the DQN paper, are listed below (an illustrative configuration sketch follows the list):
- Network architecture - the architecture from the NIPS DQN paper
- Timesteps - 10 million
- Minibatch size - 32
- Agent history length - 4
- Replay Memory Size - 1 million
- Target network update frequency - 10000
- Discount Factor - 0.99
- Action Repeat - 4
- Update Frequency - 2
- Learning rate - 0.0001
- Initial Exploration - 1.0
- Final Exploration - 0.1
- Final Exploration Frame - 1 million
- Replay Start Size - 50000
- No-op max - 30
- Death ends episode (during training, end episode with loss of life) - False
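For orientation, here is a minimal sketch of how these values might be laid out in a constants.py-style module. The names below are illustrative assumptions rather than the identifiers actually used in the repository; consult constants.py in the release for the authoritative values.

```python
# Illustrative hyperparameter module in the spirit of constants.py.
# All names are hypothetical; the repository's constants.py is authoritative.

# Seeds controlling exploration and network initialization (one run per seed)
SEEDS = [125, 127, 129, 131, 133]

# Training schedule
NUM_TIMESTEPS = 10000000          # total environment steps
MINIBATCH_SIZE = 32
AGENT_HISTORY_LENGTH = 4          # frames stacked to form a state
REPLAY_MEMORY_SIZE = 1000000
TARGET_NETWORK_UPDATE_FREQ = 10000
DISCOUNT_FACTOR = 0.99
ACTION_REPEAT = 4                 # each selected action is repeated for this many frames
UPDATE_FREQ = 2                   # one gradient update every N agent steps
LEARNING_RATE = 0.0001

# Epsilon-greedy exploration, annealed linearly
INITIAL_EXPLORATION = 1.0
FINAL_EXPLORATION = 0.1
FINAL_EXPLORATION_FRAME = 1000000
REPLAY_START_SIZE = 50000         # uniform-random actions collected before learning starts

# Episode handling
NO_OP_MAX = 30                    # random no-ops at episode start
DEATH_ENDS_EPISODE = False        # do not end training episodes on loss of life


def epsilon(t):
    """Linearly annealed exploration rate at timestep t (illustrative)."""
    if t >= FINAL_EXPLORATION_FRAME:
        return FINAL_EXPLORATION
    frac = float(t) / FINAL_EXPLORATION_FRAME
    return INITIAL_EXPLORATION + frac * (FINAL_EXPLORATION - INITIAL_EXPLORATION)
```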
Experimental Conditions
All of our experiments were performed under the same hardware and software conditions on AWS.
- AMI - Deep Learning Base AMI (Ubuntu) Version 4.0
- Instance Type - GPU Compute, p2.xlarge
- Software - Installed using the install script. All other software comes with the AMI.
Unfortunately, AWS continuously updates its software, and there is no guarantee that the AMI will always be available. As such, it may not be possible to replicate our results exactly. However, the deterministic implementation should function correctly under most AMIs available on AWS.
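As a rough guide to what controlling nondeterminism involves, the sketch below shows the kind of seeding and backend settings typically required for a deterministic DQN run. It assumes a PyTorch-based agent and uses an invented helper name (seed_everything); the repository's actual mechanisms, and their interaction with GPU-level nondeterminism, may differ.

```python
import random
import numpy as np
import torch

def seed_everything(seed):
    """Illustrative seeding routine; the repository's own setup may differ.

    Seeds the standard sources of randomness so that exploration, replay
    sampling, and network initialization are repeatable across runs.
    """
    random.seed(seed)                 # Python-level RNG (e.g., epsilon-greedy draws)
    np.random.seed(seed)              # NumPy RNG (e.g., minibatch sampling)
    torch.manual_seed(seed)           # CPU weight initialization
    torch.cuda.manual_seed_all(seed)  # GPU RNGs
    # Force cuDNN to use deterministic algorithms and disable autotuning,
    # which otherwise selects convolution kernels nondeterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# One training run per seed used in the papers: 125, 127, 129, 131, 133
seed_everything(125)
```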