fix lars optimizer bug #40892
✅ This PR's description meets the template requirements!
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
LGTM
LGTM
LGTM
LGTM
807adc3
PR types
Bug fixes
PR changes
OPs
Describe
Fix the LARS optimizer bug that occurs when lars_weight_decay = 0:
When lars_weight_decay = 0, the expected result is local_learning_rate = learning_rate * lars_coeff * ||param|| / ||gradient||, but in that case the formula in lars_momentum_op.cu reduces to local_learning_rate = learning_rate.
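The mismatch can be illustrated with a small pure-Python sketch. It is not the code from lars_momentum_op.cu. The function names, the `epsilon` argument, and the general formula with a `lars_weight_decay * ||param||` term in the denominator are assumptions based on the usual LARS formulation, and `described_buggy_local_lr` is only a hypothetical reconstruction of the behavior described above.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def expected_local_lr(param, grad, learning_rate, lars_coeff,
                      lars_weight_decay, epsilon=0.0):
    """Assumed LARS rule: scale the rate by ||param|| / ||gradient||.

    With lars_weight_decay == 0 and epsilon == 0 this reduces to
    learning_rate * lars_coeff * ||param|| / ||gradient||.
    """
    p_norm = l2_norm(param)
    g_norm = l2_norm(grad)
    if p_norm > 0.0 and g_norm > 0.0:
        denom = g_norm + lars_weight_decay * p_norm + epsilon
        return learning_rate * lars_coeff * p_norm / denom
    # Degenerate norms: fall back to the plain learning rate.
    return learning_rate


def described_buggy_local_lr(param, grad, learning_rate, lars_coeff,
                             lars_weight_decay, epsilon=0.0):
    """Hypothetical reconstruction of the reported behavior:
    the scaling is skipped whenever lars_weight_decay is 0."""
    if lars_weight_decay > 0.0:
        return expected_local_lr(param, grad, learning_rate, lars_coeff,
                                 lars_weight_decay, epsilon)
    return learning_rate


param = [6.0, 8.0]  # ||param|| = 10
grad = [3.0, 4.0]   # ||gradient|| = 5
fixed = expected_local_lr(param, grad, 0.1, 0.001, 0.0)
buggy = described_buggy_local_lr(param, grad, 0.1, 0.001, 0.0)
print(fixed, buggy)
```

With these inputs the two functions return different values. The expected rule applies the `lars_coeff * ||param|| / ||gradient||` scaling, while the reconstructed buggy path returns the unscaled `learning_rate`.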