
Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update. #15768

Merged: 19 commits merged into apache:master from fix/optimizer/adam/grad_mutate on Sep 5, 2019

Conversation

kshitij12345 (Contributor) commented Aug 6, 2019

Description

The gradient-rescaling step was writing the rescaled values back into the data of the grad tensor passed as input, mutating the caller's gradient.
Detailed in #15759.
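
For context, a minimal sketch of the reported behaviour (the shapes and values here are illustrative, not taken from the issue):

```python
import mxnet as mx

weight = mx.nd.ones((5,))
grad = mx.nd.ones((5,))
mean = mx.nd.zeros((5,))
var = mx.nd.zeros((5,))

grad_before = grad.copy()
# rescale_grad exercises the rescaling path; before this fix the rescaled
# values could end up written back into grad itself.
mx.nd.adam_update(weight, grad, mean, var, lr=0.01, wd=0.1, rescale_grad=0.5)

# grad is an input and should be left untouched by the update op.
print((grad_before == grad).asnumpy().all())
```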

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Fix the bug.
  • Add a relevant test.

kshitij12345 changed the title from "Fix gradient tensor mutation in adam_update." to "Fix gradient tensor mutation in {adam/ftrl}_update." on Aug 8, 2019
kshitij12345 (Contributor, Author) commented:

@sxjscience @eric-haibin-lin @apeforest @larroy Could you please have a look?

All the original tests pass, and I have added a test to check that only the expected variables are mutated.
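
Roughly, the mutation check works along these lines (a simplified sketch, not the exact test code; the helper name here is made up):

```python
import mxnet as mx

def check_grad_not_mutated(op, weight, grad, *states, **kwargs):
    # Snapshot the gradient, run the update in place, and verify the
    # gradient buffer is unchanged afterwards.
    grad_before = grad.copy()
    op(weight, grad, *states, out=weight, **kwargs)
    assert (grad_before == grad).asnumpy().all(), "grad was mutated"

weight, grad = mx.nd.ones((5,)), mx.nd.ones((5,))
mean, var = mx.nd.zeros((5,)), mx.nd.zeros((5,))
check_grad_not_mutated(mx.nd.adam_update, weight, grad, mean, var,
                       lr=0.01, wd=0.1, rescale_grad=0.5)
```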

kshitij12345 changed the title from "Fix gradient tensor mutation in {adam/ftrl}_update." to "Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update." on Aug 10, 2019
apeforest (Contributor) left a comment:


LGTM. Excellent job!

using namespace mshadow_op;

const DType rescaled_grad = rescale_grad * grad_data[i] +
                            wd * weight_data[i];
sxjscience (Member) commented:

I find that we can actually simplify the code by adding an if statement here:

if(clip_gradient >= 0.0f) {
   rescaled_grad = clip::Map(rescaled_grad, clip_gradient);
}

sxjscience (Member) commented:

Change all appearances of the pattern and it should be good to merge.

kshitij12345 (Contributor, Author) commented Aug 30, 2019:


@sxjscience Done.

Thanks for the suggestions. They make the code more readable.

apeforest merged commit d60be31 into apache:master on Sep 5, 2019
kshitij12345 deleted the fix/optimizer/adam/grad_mutate branch on September 7, 2019 at 09:44
gyshi pushed a commit to gyshi/incubator-mxnet that referenced this pull request on Sep 7, 2019:
Fix gradient tensor mutation in {adam/ftrl/rmsprop/rmspropalex}_update. (apache#15768)

* update code to fix apache#15759

* add relevant test

* re-add the removed conditional dispatch

* fix grad mutate for ftrl_update

* add test for ftrl_update

* fix grad mutate for rmspropalex_update

* add test for rmspropalex_update

* use KERNEL_ASSIGN in RMSPropAlexUpdateKernel.

* fix grad mutate for rmsprop_update

* add test for rmsprop_update

* add more optimizers for mutation test

* retrigger CI

* retrigger CI

* retrigger CI

* retrigger CI

* address comments.

* refactor code.

* retrigger CI

* retrigger CI

* retrigger CI
Vikas-kum (Contributor) commented:

@kshitij12345 @apeforest The test for this change is failing:
export MXNET_TEST_SEED=412298777
nosetests -v tests/python/unittest/test_ndarray.py:test_update_ops_mutation

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/master/1039/pipeline

kshitij12345 (Contributor, Author) commented:

@Vikas-kum Thanks.

I have checked and found that the difference is -5.9604645e-08, which is smaller in magnitude than the default rtol of 1e-07 used by assert_mutate, so the change is not detected as a mutation.
The test passes as expected with a tighter tolerance.
I will send a PR that uses a more sensitive tolerance.
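
For illustration, here is why a change of about one float32 ulp slips under an rtol of 1e-07 (plain NumPy, illustrative only):

```python
import numpy as np

original = np.float32(1.0)
updated = np.float32(1.0) - np.float32(5.9604645e-08)  # roughly one float32 ulp below 1.0

# With rtol=1e-07 the two values compare as "close", so a check asserting
# that the array *was* mutated fails even though an update did happen.
print(np.allclose(original, updated, rtol=1e-07, atol=0))  # True  -> looks unchanged
print(np.allclose(original, updated, rtol=1e-08, atol=0))  # False -> change detected
```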

ChaiBapchya (Contributor) commented:

So I tested the *_update ops, and it turns out that passing randn (which samples from a normal distribution) to a *_update op (e.g. adam_update) gives output that may contain NaNs:

> mx.nd.adam_update(mx.nd.random.randn(10),mx.nd.random.randn(10),mx.nd.random.randn(10),mx.nd.random.randn(10),.01)

[        nan         nan  0.09059371         nan -0.1433481          nan
 -0.5170411   0.381852    0.62985003         nan]

Now, the way you've tested this is to check whether the inputs and outputs are mutated after a *_update method is called. Does that take the NaNs into consideration?
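
For reference, a quick sketch of why NaNs matter for equality-based mutation checks (plain NumPy, illustrative only):

```python
import numpy as np

a = np.array([1.0, np.nan], dtype=np.float32)
b = a.copy()

# NaN compares unequal to itself, so an unchanged array containing NaNs
# can look "mutated" to a naive element-wise equality check.
print((a == b).all())                        # False
print(np.array_equal(a, b, equal_nan=True))  # True (NaN-aware comparison)
```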

@kshitij12345

kshitij12345 (Contributor, Author) commented:

@ChaiBapchya
Thanks for checking. Interesting find. I hadn't considered this case, so there is no code handling it.
I will look into this.

ChaiBapchya (Contributor) commented:

Sure. I think it needs to be handled.
