
Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update. #15768

Merged: 19 commits merged into apache:master from fix/optimizer/adam/grad_mutate on Sep 5, 2019

Conversation

kshitij12345 (Contributor) commented Aug 6, 2019

Description

The gradient-rescaling step was writing the rescaled values back into the data of the grad tensor passed as input, mutating the caller's gradient.
Detailed in #15759.
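
For context, a minimal sketch of the reported behaviour (the shapes and values here are illustrative, not taken from the issue):

```python
import mxnet as mx

weight = mx.nd.ones((5,))
grad = mx.nd.ones((5,))
mean = mx.nd.zeros((5,))
var = mx.nd.zeros((5,))

grad_before = grad.copy()
# rescale_grad exercises the rescaling path; before this fix the rescaled
# values could end up written back into grad itself.
mx.nd.adam_update(weight, grad, mean, var, lr=0.01, wd=0.1, rescale_grad=0.5)

# grad is an input and should be left untouched by the update op.
print((grad_before == grad).asnumpy().all())
```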

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Fix the bug.
  • Add a relevant test.

kshitij12345 changed the title from "Fix gradient tensor mutation in adam_update." to "Fix gradient tensor mutation in {adam/ftrl}_update." on Aug 8, 2019
kshitij12345 (Contributor, Author) commented:

@sxjscience @eric-haibin-lin @apeforest @larroy Could you please have a look?

All the original tests pass, and I have added a test to check that only the expected variables are mutated.
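
Roughly, the mutation check works along these lines (a simplified sketch, not the exact test code; the helper name here is made up):

```python
import mxnet as mx

def check_grad_not_mutated(op, weight, grad, *states, **kwargs):
    # Snapshot the gradient, run the update in place, and verify the
    # gradient buffer is unchanged afterwards.
    grad_before = grad.copy()
    op(weight, grad, *states, out=weight, **kwargs)
    assert (grad_before == grad).asnumpy().all(), "grad was mutated"

weight, grad = mx.nd.ones((5,)), mx.nd.ones((5,))
mean, var = mx.nd.zeros((5,)), mx.nd.zeros((5,))
check_grad_not_mutated(mx.nd.adam_update, weight, grad, mean, var,
                       lr=0.01, wd=0.1, rescale_grad=0.5)
```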

kshitij12345 changed the title from "Fix gradient tensor mutation in {adam/ftrl}_update." to "Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update." on Aug 10, 2019
apeforest (Contributor) left a comment:


LGTM. Excellent job!

using namespace mshadow_op;

const DType rescaled_grad = rescale_grad * grad_data[i] +
                            wd * weight_data[i];
sxjscience (Member) commented:

I find that we can actually simplify the code by adding an if statement here:

if(clip_gradient >= 0.0f) {
   rescaled_grad = clip::Map(rescaled_grad, clip_gradient);
}

sxjscience (Member) commented:

Change all appearances of the pattern and it should be good to merge.

kshitij12345 (Contributor, Author) commented Aug 30, 2019:


@sxjscience Done.

Thanks for the suggestions. They make the code more readable.

apeforest merged commit d60be31 into apache:master on Sep 5, 2019
kshitij12345 deleted the fix/optimizer/adam/grad_mutate branch on September 7, 2019 at 09:44
gyshi pushed a commit to gyshi/incubator-mxnet that referenced this pull request on Sep 7, 2019:
Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update. (apache#15768)

* update code to fix apache#15759

* add relevant test

* re-add the removed conditional dispatch

* fix grad mutate for ftrl_update

* add test for ftrl_update

* fix grad mutate for rmspropalex_update

* add test for rmspropalex_update

* use KERNEL_ASSIGN in RMSPropAlexUpdateKernel.

* fix grad mutate for rmsprop_update

* add test for rmsprop_update

* add more optimizers for mutation test

* retrigger CI

* retrigger CI

* retrigger CI

* retrigger CI

* address comments.

* refactor code.

* retrigger CI

* retrigger CI

* retrigger CI
Vikas-kum (Contributor) commented:

@kshitij12345 @apeforest The test for this change is failing:
export MXNET_TEST_SEED=412298777
nosetests -v tests/python/unittest/test_ndarray.py:test_update_ops_mutation

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/master/1039/pipeline

kshitij12345 (Contributor, Author) commented:

@Vikas-kum Thanks.

I have checked and found that the difference is -5.9604645e-08, which is smaller in magnitude than the default rtol of 1e-07 used by assert_mutate, so the change is not detected as a mutation.
The test passes as expected with a tighter tolerance.
I will send a PR that uses a more sensitive tolerance.
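
For illustration, here is why a change of about one float32 ulp slips under an rtol of 1e-07 (plain NumPy, illustrative only):

```python
import numpy as np

original = np.float32(1.0)
updated = np.float32(1.0) - np.float32(5.9604645e-08)  # roughly one float32 ulp below 1.0

# With rtol=1e-07 the two values compare as "close", so a check asserting
# that the array *was* mutated fails even though an update did happen.
print(np.allclose(original, updated, rtol=1e-07, atol=0))  # True  -> looks unchanged
print(np.allclose(original, updated, rtol=1e-08, atol=0))  # False -> change detected
```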

ChaiBapchya (Contributor) commented:

So I tested the *_update ops, and it turns out that passing randn (which samples from a normal distribution) to a *_update op (e.g. adam_update) gives output that may contain NaNs:

> mx.nd.adam_update(mx.nd.random.randn(10),mx.nd.random.randn(10),mx.nd.random.randn(10),mx.nd.random.randn(10),.01)

[        nan         nan  0.09059371         nan -0.1433481          nan
 -0.5170411   0.381852    0.62985003         nan]

Now, the way you've tested this is to check whether the inputs and outputs are mutated after a *_update method is called. Does that take the NaNs into consideration?
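
For reference, a quick sketch of why NaNs matter for equality-based mutation checks (plain NumPy, illustrative only):

```python
import numpy as np

a = np.array([1.0, np.nan], dtype=np.float32)
b = a.copy()

# NaN compares unequal to itself, so an unchanged array containing NaNs
# can look "mutated" to a naive element-wise equality check.
print((a == b).all())                        # False
print(np.array_equal(a, b, equal_nan=True))  # True (NaN-aware comparison)
```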

@kshitij12345

kshitij12345 (Contributor, Author) commented:

@ChaiBapchya
Thanks for checking. Interesting find. I hadn't considered this case, so there is no code handling it.
I will look into this.

ChaiBapchya (Contributor) commented:

Sure. I think it needs to be handled.
