Conversation
```
grad += wd * weight
mom[:] += grad
grad[:] += self.momentum * mom
weight[:] += -lr * grad
```
This subtracts `wd * weight` twice, whereas the reference formula
```
state = momentum * state + grad + wd * weight
weight = weight - lr * (grad + momentum * state)
```
only subtracts it once.
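To make the discrepancy concrete, here is a small NumPy sketch (illustrative values only, not MXNet code; NumPy stands in for `mxnet.nd`) that evaluates both update rules and isolates the extra `wd * weight` term produced by the old sequence:

```python
import numpy as np

lr, wd, momentum = 0.1, 0.01, 0.9
weight = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
mom = np.array([0.2, -0.1])

# Old (buggy) sequence: wd * weight is folded into grad up front, so it reaches
# the weight update both directly and again through the momentum buffer.
g = grad + wd * weight
mom_old = momentum * mom + g
step_old = g + momentum * mom_old

# Reference formula: the decay term enters only through the state update.
state = momentum * mom + grad + wd * weight
step_ref = grad + momentum * state

print(step_old - step_ref)  # equals wd * weight, i.e. the decay term counted twice
```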
fix bug in nag test
python/mxnet/optimizer/optimizer.py
Outdated
@@ -974,8 +974,7 @@ def update(self, index, weight, grad, state):
     if state is not None:
         mom = state
         mom[:] *= self.momentum
-        grad += wd * weight
-        mom[:] += grad
+        mom[:] += grad + wd * weight
         grad[:] += self.momentum * mom
         weight[:] += -lr * grad
can you make this `weight[:] -= lr * grad`? It is clearer this way.
python/mxnet/optimizer/optimizer.py
Outdated
@@ -974,8 +974,7 @@ def update(self, index, weight, grad, state):
     if state is not None:
         mom = state
         mom[:] *= self.momentum
-        grad += wd * weight
-        mom[:] += grad
+        mom[:] += grad + wd * weight
can you replace with `mom[:] = (self.momentum * mom[:]) + grad + wd * weight` and delete line 976? It will be more readable.
@mxnet-label-bot Add [pr-awaiting-response]
We have rewritten NAG following anirudhacharya's suggestions.
@anirudhacharya Can you take a look again?
LGTM @mxnet-label-bot update [pr-awaiting-merge]
mom[:] *= self.momentum
grad += wd * weight
mom[:] += grad
mom[:] = self.momentum * mom[:] + grad + wd * weight
try doing all these with in-place operators.
mom[:] *= self.momentum
grad32 += wd * weight32
mom[:] += grad32
mom[:] = self.momentum * mom[:] + grad32 + wd * weight32
try doing all these with in-place operators.
Which operator is an in-place operator? @szha
Currently the rhs will result in allocating temporary space for the intermediate results.
Shall we change the code back to the in-place form?
@mxnet-label-bot update [pr-work-in-progress]
The code has been changed to use in-place operators.
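A rough sketch of what the reworked in-place update might look like, based on the discussion above (not a verbatim copy of the merged code); NumPy stands in for `mxnet.nd`, and the function name and values are hypothetical. The weight-decay term enters the momentum state exactly once, and only the individual right-hand-side products are materialised as temporaries, rather than the whole right-hand side of a single assignment:

```python
import numpy as np

def nag_update(weight, grad, mom, lr, wd, momentum):
    """Hypothetical in-place NAG step, mirroring
    state  = momentum * state + grad + wd * weight
    weight = weight - lr * (grad + momentum * state)
    """
    mom[:] *= momentum         # state = momentum * state
    mom[:] += grad             #       + grad
    mom[:] += wd * weight      #       + wd * weight  (decay enters once)
    grad[:] += momentum * mom  # grad + momentum * state (grad reused as scratch)
    weight[:] -= lr * grad     # weight update

# Throwaway example values
w, g, m = np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.zeros(2)
nag_update(w, g, m, lr=0.1, wd=0.01, momentum=0.9)
print(w, m)
```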
* fix bug in nag optimizer

  ```
  grad += wd * weight
  mom[:] += grad
  grad[:] += self.momentum * mom
  weight[:] += -lr * grad
  ```

  This subtracts `wd * weight` twice, whereas the reference formula
  ```
  state = momentum * state + grad + wd * weight
  weight = weight - lr * (grad + momentum * state)
  ```
  only subtracts it once.

* fix bug in nag test
* rewrite nag test
* rewrite nag
* fix nag with in-place operations
* fix nag with in-place operations