Home
muupan edited this page May 10, 2016
·
3 revisions
I received a confirmation by e-mail from Dr. Mnih:
- On optimization
  - They use the exact RMSprop given by equations (8) and (9) of the paper
  - The RMSprop parameters they used are eta=7e-4, epsilon=0.1, alpha=0.99
  - They linearly decrease eta to zero over the course of training
  - They keep only a single RMSprop 'g' and sum the gradients of pi and V into it
  - They multiply the gradients of V by 0.5
  - They didn't clip losses
  - They ran it for 320 million frames (= 80 million non-skipped frames) for the one-day results and 1 billion frames for the four-day results
- On networks
  - Pi and V share the network except for the last layers
  - They initialized parameters with the default Torch initialization: https://github.com/torch/nn/blob/master/Linear.lua
- On Atari
  - They clipped rewards so that they are in [-1, 1]
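
The optimization notes above can be sketched as follows. This is only a minimal illustration, not the authors' code. It assumes equations (8) and (9) take the non-centered form g = alpha * g + (1 - alpha) * d^2 and theta = theta - eta * d / sqrt(g + epsilon), with epsilon inside the square root, which should be checked against the paper. The names `linear_eta` and `rmsprop_update` and the flat parameter list are hypothetical.

```python
import math

ETA0 = 7e-4      # initial learning rate, from the notes above
EPSILON = 0.1    # RMSprop epsilon, from the notes above
ALPHA = 0.99     # RMSprop decay, from the notes above


def linear_eta(step, total_steps, eta0=ETA0):
    """Learning rate decayed linearly from eta0 at step 0 to zero at total_steps."""
    return eta0 * max(0.0, 1.0 - step / total_steps)


def rmsprop_update(theta, g, grad_pi, grad_v, eta, alpha=ALPHA, epsilon=EPSILON):
    """Update theta and g in place from the policy and value gradients."""
    for i in range(len(theta)):
        # One combined gradient: the value gradient is scaled by 0.5 before summing.
        d = grad_pi[i] + 0.5 * grad_v[i]
        # A single running average g of squared gradients, shared by pi and V.
        g[i] = alpha * g[i] + (1.0 - alpha) * d * d
        # Assumed form of eq. (9): epsilon sits inside the square root.
        theta[i] -= eta * d / math.sqrt(g[i] + epsilon)
```

For example, `rmsprop_update(theta, g, grad_pi, grad_v, linear_eta(step, total_steps))` would be called once per update, with `g` initialized to zeros of the same length as `theta`.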
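
For the initialization note, here is a rough Python rendering of what the linked Linear.lua does by default, as far as I can tell from its `reset()` method: weights and biases are both drawn uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)]. The function name `torch_linear_init` and the list-of-lists layout are hypothetical, and this only covers fully connected layers.

```python
import math
import random


def torch_linear_init(fan_in, fan_out, rng=random):
    """Return (weight, bias) sampled like the default Torch nn.Linear reset."""
    # Assumed from Linear.lua: the bound depends only on the number of inputs.
    stdv = 1.0 / math.sqrt(fan_in)
    weight = [[rng.uniform(-stdv, stdv) for _ in range(fan_in)]
              for _ in range(fan_out)]
    bias = [rng.uniform(-stdv, stdv) for _ in range(fan_out)]
    return weight, bias
```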
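
The reward clipping in the last note amounts to something like the following. The helper name `clip_reward` is hypothetical.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw environment reward into the interval [low, high]."""
    return max(low, min(high, reward))
```

Note that this clamps the value rather than taking its sign, so a raw reward of 0.5 stays 0.5.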