GRAD_MOMENTUM not used by RMSProp in DQN #37
Conversation
I believe this is actually intentional, although very poorly documented. I'm a bit rusty on the context, but take a look at the original DQN source code here: https://github.com/google-deepmind/dqn/blob/master/dqn/NeuralQLearner.lua. They actually implement something like centred RMSProp without momentum. I believe in the Nature version of the DQN paper they have parameters referred to as "squared gradient momentum" and "gradient momentum", each set to 0.95. Cross-referencing this with the source code, it seems these refer to the decay factors on the gradient and the squared gradient, which are fixed to be the same in the centred version of the algorithm (see for example the PyTorch version here: https://pytorch.org/docs/stable/generated/torch.optim.RMSprop.html). Nice catch in any case!
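For concreteness, here's a minimal sketch of that reading in PyTorch (the network and the exact hyperparameter values are placeholders for illustration, not copied from the MinAtar code):

```python
import torch
import torch.nn as nn

# Placeholder network and hyperparameters, for illustration only.
net = nn.Linear(4, 2)
SQUARED_GRAD_MOMENTUM = 0.95  # "squared gradient momentum" in the Nature paper
MIN_SQUARED_GRAD = 0.01       # small constant added to the denominator for stability

# Under this reading, both "momentum" hyperparameters collapse into the single
# decay factor `alpha` of centred RMSProp, while PyTorch's `momentum` argument
# (a heavy-ball term on the parameter update) stays at its default of 0,
# matching the original Lua implementation.
optimizer = torch.optim.RMSprop(
    net.parameters(),
    lr=0.00025,
    alpha=SQUARED_GRAD_MOMENTUM,
    centered=True,
    eps=MIN_SQUARED_GRAD,
)
```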
Ah, I see what you mean! The "momentum" terms are referring to the first and second moment estimates in the denominator. I didn't check the original optimizer before opening the PR 😅 In light of this, I think we could either keep this PR's fix of passing GRAD_MOMENTUM into the optimizer, or simply delete the unused variable.
Sorry for the less-than-expedient reply. Deleting the unused variable makes sense to me! Feel free to amend the PR to do that and I can accept it.
@kenjyoung PR updated!
👍
@kenjyoung I don't have write access, would you be able to squash and merge for me?
For sure, I thought I did already haha.
I noticed that `GRAD_MOMENTUM` on the following line is never used: MinAtar/examples/dqn.py, line 45 at 1918a2f.
I think this is likely a bug, since the default momentum for PyTorch's RMSprop is 0, not 0.95 (see the docs).
This PR fixes it by passing the value into the optimizer constructor.
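Roughly, the change looks like the sketch below. The surrounding names (`policy_net`, the other hyperparameters, and the exact optimizer call) follow the MinAtar example but are assumptions here, not copied from the file:

```python
import torch
import torch.nn as nn

# Placeholder network and hyperparameter values for illustration.
policy_net = nn.Linear(4, 2)
STEP_SIZE = 0.00025
GRAD_MOMENTUM = 0.95
SQUARED_GRAD_MOMENTUM = 0.95
MIN_SQUARED_GRAD = 0.01

# Before: GRAD_MOMENTUM is defined but never passed, so PyTorch's default
# momentum of 0 is silently used.
# optimizer = torch.optim.RMSprop(policy_net.parameters(), lr=STEP_SIZE,
#                                 alpha=SQUARED_GRAD_MOMENTUM, centered=True,
#                                 eps=MIN_SQUARED_GRAD)

# After (the change proposed in this PR): pass the value explicitly.
optimizer = torch.optim.RMSprop(policy_net.parameters(), lr=STEP_SIZE,
                                alpha=SQUARED_GRAD_MOMENTUM, centered=True,
                                eps=MIN_SQUARED_GRAD, momentum=GRAD_MOMENTUM)
```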