
fix bug in nag optimizer #13683

Merged
6 commits merged into apache:master on Jan 16, 2019

Conversation

solin319
Contributor

```
grad += wd * weight
mom[:] += grad
grad[:] += self.momentum * mom
weight[:] += -lr * grad
```

This subtracts wd * weight twice, but the formula

```
state = momentum * state + grad + wd * weight
weight = weight - lr * (grad + momentum * state)
```

subtracts it only once.
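To see the double decay concretely, here is a minimal NumPy sketch (illustrative only, not MXNet code; the values of `lr`, `wd`, and `momentum` are made up) comparing the update produced by the code above with the reference formula:

```python
# Minimal sketch of the bug; lr, wd, momentum and the arrays are illustrative.
import numpy as np

lr, wd, momentum = 0.1, 0.01, 0.9
weight = np.array([1.0, -2.0])
grad = np.array([0.5, 0.3])
mom = np.zeros_like(weight)

# Buggy path: grad is modified in place, so wd * weight is folded into mom
# and then applied a second time directly in the weight update.
g = grad + wd * weight
m = momentum * mom + g
w_buggy = weight - lr * (g + momentum * m)

# Reference formula: wd * weight enters only through the state (mom) update.
m_ref = momentum * mom + grad + wd * weight
w_ref = weight - lr * (grad + momentum * m_ref)

print(w_buggy - w_ref)  # equals -lr * wd * weight: the decay term applied an extra time
```

The difference is exactly `-lr * wd * weight`, i.e. the weight decay contribution is subtracted one extra time per step.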
@solin319 solin319 requested a review from szha as a code owner December 19, 2018 03:53
fix bug in nag test
```
@@ -974,8 +974,7 @@ def update(self, index, weight, grad, state):
         if state is not None:
             mom = state
             mom[:] *= self.momentum
-            grad += wd * weight
-            mom[:] += grad
+            mom[:] += grad + wd * weight
             grad[:] += self.momentum * mom
             weight[:] += -lr * grad
```
Member

Can you make this `weight[:] -= lr * grad`? It is clearer this way.

```
@@ -974,8 +974,7 @@ def update(self, index, weight, grad, state):
         if state is not None:
             mom = state
             mom[:] *= self.momentum
-            grad += wd * weight
-            mom[:] += grad
+            mom[:] += grad + wd * weight
```
Member

Can you replace this with `mom[:] = (self.momentum * mom[:]) + grad + wd * weight` and delete line 976? It will be more readable.

@Roshrini
Member

@mxnet-label-bot Add [pr-awaiting-response]

@marcoabreu marcoabreu added the pr-awaiting-response PR is reviewed and waiting for contributor to respond label Dec 19, 2018
@solin319
Contributor Author

We have rewritten NAG following anirudhacharya's suggestions.

@solin319
Contributor Author

@szha @Roshrini

@Roshrini
Member

Roshrini commented Jan 2, 2019

@anirudhacharya Can you take a look again?
@sandeep-krishnamurthy For review/merge

@anirudhacharya
Member

LGTM

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-response PR is reviewed and waiting for contributor to respond labels Jan 14, 2019
```
-            mom[:] *= self.momentum
-            grad += wd * weight
-            mom[:] += grad
+            mom[:] = self.momentum * mom[:] + grad + wd * weight
```
Member

Try doing all of these with in-place operators.

```
-            mom[:] *= self.momentum
-            grad32 += wd * weight32
-            mom[:] += grad32
+            mom[:] = self.momentum * mom[:] + grad32 + wd * weight32
```
Member

Try doing all of these with in-place operators.

@solin319
Contributor Author

solin319 commented Jan 15, 2019

Which operator is an in-place operator, @szha? `mx.nd.add`?

@szha
Member

szha commented Jan 15, 2019

Currently the rhs will result in allocating temporary space for the results of `self.momentum * mom[:]`, `self.momentum * mom[:] + grad32`, and `self.momentum * mom[:] + grad32 + wd * weight32`. They were written in the previous way to avoid such memory spikes.
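As an illustration of the allocation concern, a small NumPy sketch (an analogy, not the MXNet kernels; the array shapes and constants are made up) of how the same accumulation can be done step by step into `mom`'s own buffer:

```python
# Sketch: building the rhs in one expression vs. accumulating in place.
import numpy as np

momentum, wd = 0.9, 0.01
mom = np.zeros(4)
grad32 = np.full(4, 0.5)
weight32 = np.ones(4)

# One-expression form: the intermediate sums on the rhs are materialized as
# temporary arrays before the result is written back into mom.
# mom[:] = momentum * mom + grad32 + wd * weight32

# In-place form: mom's buffer is reused at every step; only wd * weight32
# needs a temporary, instead of one per intermediate result.
mom *= momentum
mom += grad32
mom += wd * weight32
```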

@solin319
Contributor Author

solin319 commented Jan 15, 2019

Shall we change the code back to the following?

```
mom[:] *= self.momentum
grad32 += wd * weight32
mom[:] += grad32
```

@stu1130
Contributor

stu1130 commented Jan 15, 2019

@mxnet-label-bot update [pr-work-in-progress]

@marcoabreu marcoabreu added pr-work-in-progress PR is still work in progress and removed pr-awaiting-merge Review and CI is complete. Ready to Merge labels Jan 15, 2019
@solin319
Contributor Author

The code has been changed to use in-place operators.
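For reference, a hedged NumPy-based reconstruction of the in-place update as discussed above (not a quote of the merged diff; the multi-precision `grad32`/`weight32` branch would be analogous):

```python
# Reconstruction of the fixed NAG step using only in-place operations.
import numpy as np

def nag_update(weight, grad, mom, lr, wd, momentum):
    # One NAG step applied in place; weight decay enters only through mom.
    mom *= momentum          # mom = momentum * mom
    mom += grad              # mom += grad
    mom += wd * weight       # mom += wd * weight
    grad += momentum * mom   # grad = grad + momentum * mom
    weight -= lr * grad      # weight = weight - lr * (grad + momentum * mom)

w = np.array([1.0, -2.0])
g = np.array([0.5, 0.3])
m = np.zeros_like(w)
nag_update(w, g, m, lr=0.1, wd=0.01, momentum=0.9)
```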

@szha szha merged commit 9314689 into apache:master Jan 16, 2019
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* fix bug in nag optimizer
* fix bug in nag test
* rewrite nag test
* rewrite nag
* fix nag with in-place operations
* fix nag with in-place operations
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
pr-work-in-progress PR is still work in progress
6 participants