
[DOC] refine autograd docs #15109

Merged
merged 6 commits into from Aug 1, 2019
75 changes: 69 additions & 6 deletions docs/api/python/autograd/autograd.md
@@ -42,16 +42,28 @@ to allocate space for the gradient. Then, start a `with autograd.record()` block
and do some computation. Finally, call `backward()` on the result:

```python
>>> x = mx.nd.array([1,2,3,4])
>>> x.attach_grad()
>>> with mx.autograd.record():
... y = x * x + 1
>>> y.backward()
>>> print(x.grad)
import mxnet as mx
x = mx.nd.array([1,2,3,4])
x.attach_grad()
with mx.autograd.record():
y = x * x + 1
y.backward()
print(x.grad)
```

This outputs:

```
[ 2. 4. 6. 8.]
<NDArray 4 @cpu(0)>
```

Gradient recording is enabled within the scope of the `with mx.autograd.record():` statement and
disabled when we exit that scope.

Recording can also be enabled manually by calling `mx.autograd.set_recording(True)`, and turned off
with `mx.autograd.set_recording(False)` once we no longer want to record operations.
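
For example, a minimal sketch of the first example rewritten with manual recording control:

```python
import mxnet as mx

x = mx.nd.array([1, 2, 3, 4])
x.attach_grad()

mx.autograd.set_recording(True)   # start recording operations
y = x * x + 1
mx.autograd.set_recording(False)  # stop recording; backward() does not need it

y.backward()
print(x.grad)  # [2. 4. 6. 8.]
```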


## Train mode and Predict Mode

@@ -76,8 +88,59 @@ Detailed tutorials are available in Part 1 of
[the MXNet gluon book](http://gluon.mxnet.io/).


## Higher order gradient

Some operators support higher order gradients: some can be differentiated many times, others only
twice, and most just once.

For calculating higher order gradients, we can either use the `mx.autograd.grad` function while
recording and then call `backward()`, or call `mx.autograd.grad` twice. If we do the latter, it is
important that the first call uses `create_graph=True` and `retain_graph=True`, and that the second
call uses `create_graph=False`. Otherwise we will not get the results we want: if we were to
recreate the graph in the second call, we would end up with a graph of just the backward nodes, not
the full initial graph that includes the forward nodes.

The pattern to calculate higher order gradients is the following:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3])
x.attach_grad()

def f(x):
    # Any function which supports higher order gradients
    return nd.log(x)
```

If the operators used in `f` don't support higher order gradients, you will get an error like
`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
that the operator doesn't support getting the gradient of the gradient, that is, running backward
on the backward graph.

Using `mx.autograd.grad` multiple times:

```python
with ag.record():
    y = f(x)
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
    x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
```

Running backward on the backward graph:

```python
with ag.record():
    y = f(x)
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
x_grad.backward()
x_grad_grad = x.grad
```

Both methods are equivalent, except that in the second case `retain_graph` is set to `False` by
default when running backward. But both run a backward pass on the graph as usual to get the
gradient of the first gradient `x_grad` with respect to `x`, evaluated at the value of `x`.
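
As a sanity check, for `f(x) = nd.log(x)` as above, the first gradient is `1/x` and the second is
`-1/x**2`; a minimal sketch verifying this:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3])
x.attach_grad()

with ag.record():
    y = nd.log(x)
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
x_grad.backward()

print(x_grad)  # 1/x      -> [1.   0.5   0.33333334]
print(x.grad)  # -1/x**2  -> [-1.  -0.25  -0.11111111]
```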

For more examples, check the [higher order gradient unit tests](https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_higher_order_grad.py).


<script type="text/javascript" src='../../../_static/js/auto_module_index.js'></script>
3 changes: 3 additions & 0 deletions python/mxnet/autograd.py
@@ -197,6 +197,9 @@ def predict_mode():
def mark_variables(variables, gradients, grad_reqs='write'):
"""Mark NDArrays as variables to compute gradient for autograd.

This is equivalent to calling `.attach_grad()` on a variable, but with this
call we can set the gradient to any value.
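
A minimal sketch of the equivalence (assuming the usual `mx.nd` import):

```python
import mxnet as mx

x = mx.nd.array([1, 2, 3])
# Equivalent to x.attach_grad(), but the gradient buffer can start
# from any value instead of zeros.
grad_buf = mx.nd.array([10., 10., 10.])
mx.autograd.mark_variables([x], [grad_buf], grad_reqs='write')
```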

Parameters
----------
variables: NDArray or list of NDArray
2 changes: 2 additions & 0 deletions python/mxnet/ndarray/ndarray.py
@@ -2147,6 +2147,8 @@ def attach_grad(self, grad_req='write', stype=None):
"""Attach a gradient buffer to this NDArray, so that `backward`
can compute gradient with respect to it.

The gradient is initialized to zeros.
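
A minimal sketch (assuming the usual `mx.nd` import):

```python
import mxnet as mx

x = mx.nd.array([1, 2, 3])
x.attach_grad()  # allocate a gradient buffer for backward()
print(x.grad)    # [0. 0. 0.] -- initialized to zeros
```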

Parameters
----------
grad_req : {'write', 'add', 'null'}