This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Commit
Fix gradient tensor mutate in {adam/ftrl/rmsprop/rmspropalex}_update (#15768)

* update code to fix #15759
* add relevant test
* re-add the removed conditional dispatch
* fix grad mutate for ftrl_update
* add test for ftrl_update
* fix grad mutate for rmspropalex_update
* add test for rmspropalex_update
* use KERNEL_ASSIGN in RMSPropAlexUpdateKernel
* fix grad mutate for rmsprop_update
* add test for rmsprop_update
* add more optimizers for mutation test
* retrigger CI
* address comments
* refactor code
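The recurring fix described in the commit message, computing the rescaled gradient out of place instead of overwriting the caller's gradient tensor, can be sketched in NumPy. This is a minimal illustration of the bug class, not MXNet's actual C++ kernel; the function name and signature are simplified:

```python
import numpy as np

def adam_update(weight, grad, mean, var, lr, beta1=0.9, beta2=0.999,
                epsilon=1e-8, rescale_grad=1.0):
    """Return updated (weight, mean, var) without mutating any input.

    The bug class fixed in #15768: an in-place rescale such as
    `grad *= rescale_grad` silently mutates the caller's gradient
    tensor. Using an out-of-place expression keeps the input intact.
    """
    g = grad * rescale_grad  # out-of-place; `grad` is untouched
    mean_out = beta1 * mean + (1.0 - beta1) * g
    var_out = beta2 * var + (1.0 - beta2) * g * g
    weight_out = weight - lr * mean_out / (np.sqrt(var_out) + epsilon)
    return weight_out, mean_out, var_out
```

A mutation test in this style copies the gradient before the call and asserts it is unchanged afterwards.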
1 parent d0fa8c0, commit d60be31
Showing 2 changed files with 216 additions and 130 deletions.
So I tested the *_update ops, and it turns out that passing randn (which samples from a normal distribution) to a *_update op (e.g. adam_update) gives output that may contain NaNs.
The way you've tested checks whether the input and output are mutated after a *_update method is called. Does that take the NaNs into consideration?
@kshitij12345
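The NaN concern raised above can be addressed with a NaN-aware equality check, since a plain elementwise `==` reports a spurious difference wherever the tensor contains NaN (NaN != NaN). A minimal NumPy sketch; the helper name is hypothetical, not part of the test suite under discussion:

```python
import numpy as np

def assert_not_mutated(before, after):
    """NaN-safe check that a tensor was not modified in place.

    `equal_nan=True` treats NaN positions in `before` and `after` as
    equal, so the assertion only fails on a real in-place write, not
    merely because the op's output (or input) happens to contain NaNs.
    """
    assert np.array_equal(before, after, equal_nan=True), \
        "input tensor was mutated by the update op"
```

With this helper, a gradient snapshot taken before calling an update op can be compared against the live tensor afterwards even when NaNs are present.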