
Update Adam optimizer documentation (#13754)
eric-haibin-lin committed Jan 4, 2019
1 parent e9a7aa4 commit 6a4bac6
Showing 1 changed file with 6 additions and 4 deletions.
python/mxnet/optimizer/optimizer.py (10 changes: 6 additions & 4 deletions)
@@ -1030,26 +1030,28 @@ class Adam(Optimizer):
     Stochastic Optimization*, available at http://arxiv.org/abs/1412.6980.
 
     If the storage types of grad is ``row_sparse``, and ``lazy_update`` is True, \
-    **lazy updates** are applied by::
+    **lazy updates** at step t are applied by::
 
         for row in grad.indices:
             rescaled_grad[row] = clip(grad[row] * rescale_grad + wd * weight[row], clip_gradient)
             m[row] = beta1 * m[row] + (1 - beta1) * rescaled_grad[row]
             v[row] = beta2 * v[row] + (1 - beta2) * (rescaled_grad[row]**2)
-            w[row] = w[row] - learning_rate * m[row] / (sqrt(v[row]) + epsilon)
+            lr = learning_rate * sqrt(1 - beta1**t) / (1 - beta2**t)
+            w[row] = w[row] - lr * m[row] / (sqrt(v[row]) + epsilon)
 
     The lazy update only updates the mean and var for the weights whose row_sparse
     gradient indices appear in the current batch, rather than updating it for all indices.
     Compared with the original update, it can provide large improvements in model training
     throughput for some applications. However, it provides slightly different semantics than
     the original update, and may lead to different empirical results.
 
-    Otherwise, **standard updates** are applied by::
+    Otherwise, **standard updates** at step t are applied by::
 
         rescaled_grad = clip(grad * rescale_grad + wd * weight, clip_gradient)
         m = beta1 * m + (1 - beta1) * rescaled_grad
         v = beta2 * v + (1 - beta2) * (rescaled_grad**2)
-        w = w - learning_rate * m / (sqrt(v) + epsilon)
+        lr = learning_rate * sqrt(1 - beta1**t) / (1 - beta2**t)
+        w = w - lr * m / (sqrt(v) + epsilon)
 
     This optimizer accepts the following parameters in addition to those accepted
     by :class:`.Optimizer`.
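To make the documented update rules concrete, here is a minimal NumPy sketch that mirrors the pseudocode above: adam_step applies the standard update and adam_lazy_step advances the state only for the rows listed in a row_sparse gradient. The function names, signatures, and defaults are illustrative only and are not part of MXNet's API; t is the 1-based update count for the weight.

import numpy as np

def adam_step(w, m, v, grad, t, learning_rate=0.001, beta1=0.9, beta2=0.999,
              epsilon=1e-8, wd=0.0, rescale_grad=1.0, clip_gradient=None):
    # Standard update, applied in place to w, m, and v, following the
    # docstring pseudocode line for line.
    rescaled_grad = grad * rescale_grad + wd * w
    if clip_gradient is not None:
        rescaled_grad = np.clip(rescaled_grad, -clip_gradient, clip_gradient)
    m[:] = beta1 * m + (1 - beta1) * rescaled_grad
    v[:] = beta2 * v + (1 - beta2) * rescaled_grad ** 2
    # Bias-corrected step size as written in the docstring; the Adam paper
    # (Kingma & Ba) writes this factor as sqrt(1 - beta2**t) / (1 - beta1**t).
    lr = learning_rate * np.sqrt(1 - beta1 ** t) / (1 - beta2 ** t)
    w[:] = w - lr * m / (np.sqrt(v) + epsilon)

def adam_lazy_step(w, m, v, rows, row_grads, t, **kwargs):
    # Lazy update: only rows whose indices appear in the sparse gradient have
    # their mean (m) and variance (v) state, and their weights, updated.
    for row, grad_row in zip(rows, row_grads):
        adam_step(w[row], m[row], v[row], grad_row, t, **kwargs)

# Example: one dense step and one lazy step on a small weight matrix.
w = np.zeros((4, 3)); m = np.zeros_like(w); v = np.zeros_like(w)
adam_step(w, m, v, grad=np.ones_like(w), t=1)
adam_lazy_step(w, m, v, rows=[0, 2], row_grads=np.ones((2, 3)), t=2)

In MXNet itself, the choice between the two paths is controlled by the Adam optimizer's lazy_update argument together with the gradient's storage type, as the docstring describes.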
