This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Aggregated adamw update #16398

Merged
eric-haibin-lin merged 15 commits into apache:master from drivanov:aggregated_adamw_update on Oct 19, 2019

Conversation

drivanov
Contributor

@drivanov drivanov commented Oct 8, 2019

Description

MXNet operator for the aggregated AdamW update.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • New operator that performs the AdamW update for multiple gradients in one kernel launch, with tests (and, when applicable, API docs); see the sketch after this list.
  • The test for the previously used "single" AdamW update was modified; it now also exercises the clip_gradient parameter and random variations of lr, eta, wd, and shape.
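
For context, here is a minimal NumPy sketch of the per-tensor math the aggregated operator applies to each (weight, grad, mean, var) tuple. The function name, argument layout, and defaults are illustrative only and do not reproduce the operator's actual signature; the point is that the fused kernel runs this loop body for all tensors in a single launch instead of launching one kernel per tensor.

```python
import numpy as np

def aggregated_adamw_update(weights, grads, means, variances,
                            lrs, etas, wds,
                            beta1=0.9, beta2=0.999, epsilon=1e-8,
                            rescale_grad=1.0, clip_gradient=-1.0):
    """Plain-Python reference loop for the aggregated AdamW update."""
    for w, g, m, v, lr, eta, wd in zip(weights, grads, means, variances,
                                       lrs, etas, wds):
        g = g * rescale_grad
        if clip_gradient >= 0:
            g = np.clip(g, -clip_gradient, clip_gradient)
        m[:] = beta1 * m + (1.0 - beta1) * g        # first moment
        v[:] = beta2 * v + (1.0 - beta2) * g * g    # second moment
        # Decoupled weight decay (AdamW): wd is applied to the weight
        # directly rather than being folded into the gradient.
        w[:] -= eta * (lr * m / (np.sqrt(v) + epsilon) + wd * w)
```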

@drivanov drivanov requested a review from szha as a code owner October 8, 2019 21:21
@ptrendx
Member

ptrendx commented Oct 8, 2019

@eric-haibin-lin FYI

Member

@ptrendx ptrendx left a comment

LGTM. @eric-haibin-lin do you have any more comments?

Member

@eric-haibin-lin eric-haibin-lin left a comment

The comment below is not specific to this PR:
It looks like a lot of code changes are needed to register operators for multi_xx_update ops. What do you think would help reduce the development cycle for such ops? As we add more ops like this, the C++ code becomes less readable. Maybe TVM can help generate multi-tensor kernels?

@ptrendx
Member

ptrendx commented Oct 16, 2019

Generally, I would opt for cleaning up the optimizers so that only the multi_ versions of the optimizer code exist (i.e., use the same code for both the multi- and single-tensor versions). The next step would be to factor out the common code for handling multiple tensors.
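
A minimal sketch of the consolidation being suggested, reusing the reference function from the sketch above (the entry-point names are hypothetical, not the codebase's actual registrations): the single-tensor op would simply forward to the multi-tensor code path with one-element lists, so only one implementation needs to be maintained.

```python
# Hypothetical wrapper: the "single" update delegates to the aggregated
# implementation with one-element lists, leaving a single code path.
def adamw_update(weight, grad, mean, var, lr, eta, wd, **kwargs):
    aggregated_adamw_update([weight], [grad], [mean], [var],
                            [lr], [eta], [wd], **kwargs)
```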

@eric-haibin-lin eric-haibin-lin merged commit ffec31f into apache:master Oct 19, 2019
@eric-haibin-lin
Member

eric-haibin-lin commented Oct 19, 2019

Thank you @drivanov @ptrendx

@drivanov drivanov deleted the aggregated_adamw_update branch October 21, 2019 14:53
apeforest pushed a commit that referenced this pull request Nov 6, 2019
* Trigger CI

* MxNet operator for aggregated Adam update

* Fixing problem with getRescaleGrad(...) call in Python2
and some minor changes requested by Przemek

* Fix a problem appearing in Python2

* Minor cleanup

* Changing function name

* Trigger CI

* Eliminating "asnumpy()" conversion

* Trigger CI