Add amsgrad optimizer #382
Conversation
Hi @mkunesch, any chance you could review this? Thanks.
@merajhashemi thanks for the PR!
(Branch force-pushed from 655720b to d5d7260.)
@mtthss Done! Could you run the CI again?
optax/_src/alias.py (Outdated)
@@ -189,6 +189,7 @@ def adam(
     b2: float = 0.999,
     eps: float = 1e-8,
     eps_root: float = 0.0,
+    amsgrad: bool = False,
Let's introduce a separate alias amsgrad for this rather than adding a flag to Adam. My thinking is that there will be other improvements/modifications to Adam in the future, and we should avoid an accumulation of options in the simple adam setup. Furthermore, I think it would make it easier for users to find amsgrad in optax, and more obvious in the code that amsgrad is being used.
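A minimal sketch of what such a standalone amsgrad alias could look like, written in optax's GradientTransformation style. All names here (scale_by_amsgrad, ScaleByAmsgradState) are illustrative, not necessarily the API this PR ends up adding:

```python
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp
import optax


class ScaleByAmsgradState(NamedTuple):
  count: jnp.ndarray  # Step counter for bias correction.
  mu: Any             # First-moment estimate (same tree structure as params).
  nu: Any             # Second-moment estimate.
  nu_max: Any         # Running elementwise max of second-moment estimates.


def scale_by_amsgrad(b1=0.9, b2=0.999, eps=1e-8, eps_root=0.0):
  def init_fn(params):
    zeros = lambda: jax.tree_util.tree_map(jnp.zeros_like, params)
    return ScaleByAmsgradState(
        count=jnp.zeros([], jnp.int32), mu=zeros(), nu=zeros(), nu_max=zeros())

  def update_fn(updates, state, params=None):
    del params
    mu = jax.tree_util.tree_map(
        lambda g, m: b1 * m + (1 - b1) * g, updates, state.mu)
    nu = jax.tree_util.tree_map(
        lambda g, v: b2 * v + (1 - b2) * jnp.square(g), updates, state.nu)
    count = state.count + 1
    mu_hat = jax.tree_util.tree_map(lambda m: m / (1 - b1**count), mu)
    nu_hat = jax.tree_util.tree_map(lambda v: v / (1 - b2**count), nu)
    # The AMSGrad twist: never let the second-moment estimate decrease,
    # which is what restores the convergence guarantee Adam lacks.
    nu_max = jax.tree_util.tree_map(jnp.maximum, state.nu_max, nu_hat)
    new_updates = jax.tree_util.tree_map(
        lambda m, v: m / (jnp.sqrt(v + eps_root) + eps), mu_hat, nu_max)
    return new_updates, ScaleByAmsgradState(
        count=count, mu=mu, nu=nu, nu_max=nu_max)

  return optax.GradientTransformation(init_fn, update_fn)


def amsgrad(learning_rate, b1=0.9, b2=0.999, eps=1e-8, eps_root=0.0):
  # Mirrors the structure of optax.adam: moment transform + learning-rate scale.
  return optax.chain(
      scale_by_amsgrad(b1=b1, b2=b2, eps=eps, eps_root=eps_root),
      optax.scale(-learning_rate),
  )
```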
(sorry, I pressed send too early)
Thank you so much for this contribution! This is excellent!
I've just added very minor comments.
Could you add a test for this PR? The easiest thing might be to just add amsgrad to the optimizer list in alias_test.py so that it gets tested on a parabola. It might also be nice to test the nu_max behavior explicitly in transform_test.py, but the parabola test is the more important one. (A sketch of what such additions could look like follows this comment.)
Also, just to say that we will only be able to merge this after the ICLR deadline, but we can approve it before then so that we can merge immediately on the 29th of September.
Thanks a lot again for this excellent contribution!
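A hedged sketch of the test additions being suggested. The list name _OPTIMIZERS_UNDER_TEST and the entry format are assumptions about alias_test.py's structure, and the nu_max check is purely illustrative:

```python
import jax.numpy as jnp

# Hypothetical entry format for the optimizer list in alias_test.py; the
# actual list name and kwargs in the test file may differ.
_OPTIMIZERS_UNDER_TEST = (
    dict(opt_name='adam', opt_kwargs=dict(learning_rate=1e-1)),
    # New entry so amsgrad is exercised by the existing parabola-fitting test:
    dict(opt_name='amsgrad', opt_kwargs=dict(learning_rate=1e-1)),
)


def test_nu_max_is_monotone():
  # Illustrative explicit nu_max check for transform_test.py: the max
  # accumulator must never decrease between steps. Uses the transform
  # sketched earlier in this thread.
  tx = scale_by_amsgrad()
  params = {'w': jnp.zeros(3)}
  state = tx.init(params)
  prev = state.nu_max['w']
  for g in ([1.0, 2.0, 3.0], [0.1, 0.1, 0.1]):
    _, state = tx.update({'w': jnp.asarray(g)}, state, params)
    assert bool(jnp.all(state.nu_max['w'] >= prev))
    prev = state.nu_max['w']
```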
Hi @merajhashemi, thanks a lot for your contribution! Any chance you could address the comments made by @mkunesch, so we can merge this PR in the next version of Optax? Thanks!
Actually, the changes are quite minimal (introducing a separate alias + adding it to the test). I'd be happy to approve and make these changes upon merging if that's okay with you @merajhashemi?
Hi! Ah, thanks a lot for that context - that makes a lot of sense. I'm probably still leaning towards splitting, as we have generally tried to avoid boolean flags in optax for a while now, and we don't mirror PyTorch and TensorFlow in other optimizers either. @hbq1: was your 👍 a vote for leaving it as an argument or for splitting it? Thanks a lot!
I like the idea of splitting it into a separate optimiser for clarity 👍
Looks great to me! Thanks a lot for splitting the optimizer from adam.
(I'll make some very minor formatting edits and fix the conflicts with master as I merge the PR)
Thanks a lot for the contribution again!
(sorry for the 3 commits to your branch - I had to merge master before importing the PR and I messed up the merge in the GitHub editor)
Hi,
This PR implements AMSGrad (Reddi et al., 2018, "On the Convergence of Adam and Beyond"), an extension to Adam that improves its convergence properties.
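For reference, this is the standard statement of the AMSGrad update from the Reddi et al. paper (the paper omits bias correction; the $\epsilon$ term is the usual practical addition for numerical stability). It differs from Adam only in the max step over the second-moment estimate:

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
$$
$$
\hat v_t = \max(\hat v_{t-1},\, v_t), \qquad
\theta_{t+1} = \theta_t - \frac{\alpha_t\, m_t}{\sqrt{\hat v_t} + \epsilon}.
$$

Because $\hat v_t$ is non-decreasing, the effective per-coordinate step size can never grow between iterations, which is the property that fixes Adam's convergence counterexamples.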