
Fixing broken links #16500

Merged
11 commits merged on Oct 17, 2019
2 changes: 1 addition & 1 deletion docs/python_docs/python/tutorials/deploy/export/onnx.md
@@ -28,7 +28,7 @@ In this tutorial, we will learn how to use MXNet to ONNX exporter on pre-trained
## Prerequisites

To run the tutorial you will need to have installed the following python modules:

Suggested change
To run the tutorial you will need to have installed the following python modules:
To run the tutorial, install the following Python modules:

- [MXNet >= 1.3.0](http://mxnet.apache.org/install/index.html)
- [MXNet >= 1.3.0]()
- [onnx]( https://github.com/onnx/onnx#installation) v1.2.1 (follow the install guide)

*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.

Suggested change
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set, which comes with ONNX v1.2.1.
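
For orientation, the export step that this tutorial builds up to can be sketched roughly as follows. This is a minimal sketch, assuming a pre-trained model saved as a symbol/params file pair; the file names and input shape are placeholders, not the tutorial's exact values.

```python
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet

# Placeholder file names for a pre-trained model checkpoint
sym = './resnet-18-symbol.json'
params = './resnet-18-0000.params'
input_shape = (1, 3, 224, 224)  # one NCHW image batch

# Convert the MXNet model to an ONNX file (opset 7 with ONNX v1.2.1)
onnx_file = onnx_mxnet.export_model(sym, params, [input_shape],
                                    np.float32, 'resnet18.onnx')
print(onnx_file)
```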

@@ -42,7 +42,7 @@ The following tutorials will help you learn how to deploy MXNet on various AWS p

.. card::
:title: Training with Data from S3
:link: https://mxnet.apache.org/versions/master/faq/s3_integration.html
:link: s3_integration.html

How to train with data from Amazon S3 buckets.

@@ -77,7 +77,7 @@ from mxnet.gluon.data.vision import transforms
from mxnet.gluon.model_zoo.vision import resnet50_v2
```

Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](https://mxnet.apache.org/tutorials/gluon/learning_rate_schedules.html) to adjust learning rates during training.
Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](../packages/gluon/training/learning_rates/learning_rate_schedules.html) to adjust learning rates during training.
Here we set `epochs` to 1 for a quick demonstration; change it to 40 for actual training.
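
As a rough illustration of how a learning rate scheduler is wired into a Gluon `Trainer`, here is a minimal sketch; the schedule values and the tiny stand-in network are illustrative assumptions, not the tutorial's settings.

```python
import mxnet as mx
from mxnet import gluon

# Illustrative schedule: halve the learning rate every 20 updates
schedule = mx.lr_scheduler.FactorScheduler(step=20, factor=0.5)

net = gluon.nn.Dense(1)  # stand-in for the fine-tuned network
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01, 'lr_scheduler': schedule})
```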

```python
@@ -324,4 +324,4 @@ You can also find more ways to run inference and deploy your models here:
2. [Gluon book on fine-tuning](https://www.d2l.ai/chapter_computer-vision/fine-tuning.html)
3. [Gluon CV transfer learning tutorial](https://gluon-cv.mxnet.io/build/examples_classification/transfer_learning_minc.html)
4. [Gluon crash course](https://gluon-crash-course.mxnet.io/)
5. [Gluon CPP inference example](https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/)
5. [Gluon CPP inference example](https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/)
@@ -31,7 +31,7 @@ Comparison Guides

.. card::
:title: Caffe to MXNet
:link: https://mxnet.apache.org/versions/master/faq/caffe.html
:link: /api/faq/caffe.html

How to convert Caffe models to MXNet and how to call Caffe operators from MXNet.

10 changes: 5 additions & 5 deletions docs/python_docs/python/tutorials/packages/autograd/index.md
@@ -29,15 +29,15 @@ Gradients are fundamental to the process of training neural networks, and tell u

Under the hood, neural networks are composed of operators (e.g. sums, products, convolutions, etc) some of which use parameters (e.g. the weights in convolution kernels) for their computation, and it's our job to find the optimal values for these parameters. Gradients lead us to the solution!

Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.L2Loss.html?highlight=l2#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.SoftmaxCrossEntropyLoss.html).
Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.SoftmaxCrossEntropyLoss).
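
As a small, hedged illustration of these two losses on made-up values (not part of the tutorial):

```python
from mxnet import nd, gluon

l2 = gluon.loss.L2Loss()
ce = gluon.loss.SoftmaxCrossEntropyLoss()

# Regression: L2 loss is 0.5 * (prediction - target)^2
print(l2(nd.array([[1.5]]), nd.array([[1.0]])))        # -> 0.125

# Classification: cross entropy of the logits against class index 0
print(ce(nd.array([[2.0, 0.5, 0.1]]), nd.array([0])))
```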

Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](http://beta.mxnet.io/api/gluon-related/mxnet.optimizer.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](/api/python/docs/api/optimizer/index.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
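
Put together, a single gradient-descent update in Gluon looks roughly like the sketch below; the toy regressor and random data are assumptions purely for illustration.

```python
from mxnet import nd, autograd, gluon

net = gluon.nn.Dense(1)                     # toy single-layer regressor
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

x = nd.random.uniform(shape=(4, 2))         # a random mini-batch
y = nd.random.uniform(shape=(4, 1))

with autograd.record():                     # record the forward pass
    loss = loss_fn(net(x), y)
loss.backward()                             # gradients w.r.t. the parameters
trainer.step(batch_size=4)                  # move parameters against the gradient
```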

## How do we calculate gradients?

### Short Answer:

We differentiate. [MXNet Gluon](http://beta.mxnet.io/api/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.
We differentiate. [MXNet Gluon](/api/python/docs/tutorials/packages/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.

![forward-backward](/_static/autograd/autograd_forward_backward.png)
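
A minimal sketch of this forward/backward flow on a toy function (assumed for illustration; not from the tutorial):

```python
from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()                  # allocate storage for dL/dx
with autograd.record():          # trace the forward pass
    loss = (x ** 2).sum()        # L = sum(x^2)
loss.backward()                  # reverse-mode pass from L back to x
print(x.grad)                    # dL/dx = 2x -> [2. 4. 6.]
```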

@@ -159,7 +159,7 @@ print('is_training:', is_training, output)

We called `dropout` while `autograd` was recording this time, so our network was in training mode and we see dropout applied to the input. Since the probability of dropout was 50%, the output is automatically scaled by 1/0.5=2 to preserve the average activation.

We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=dropout#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](/api/python/ndarray/ndarray.html?highlight=dropout#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
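
A minimal sketch of that behaviour, run outside of any `autograd.record()` scope so the network would otherwise be in inference mode:

```python
from mxnet import nd

x = nd.ones((1, 6))
print(nd.Dropout(x, p=0.5))                  # inference mode: returns x unchanged
print(nd.Dropout(x, p=0.5, mode='always'))   # dropout applied anyway, survivors scaled by 2
```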

## Advanced: Skipping the calculation of parameter gradients

@@ -196,7 +196,7 @@ print(x.grad)

## Advanced: Using Python control flow

As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](https://mxnet.apache.org/versions/master/tutorials/control_flow/ControlFlowTutorial.html).
As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](/api/python/tutorials/extend/control_flow.html).

We'll write a function as a toy example of a dynamic network. We'll add an `if` condition and a loop with a variable number of iterations, both of which will depend on the input data. Although these can now be used in static graphs (with conditional operators), it's still much more natural to use native control flow.
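
One possible toy function of that shape, sketched here under the assumption of a small random input vector:

```python
from mxnet import nd, autograd

def toy_dynamic_net(a):
    b = a * 2
    while nd.norm(b).asscalar() < 100:   # loop length depends on the data
        b = b * 2
    if nd.sum(b).asscalar() > 0:         # branch depends on the data
        c = b
    else:
        c = 10 * b
    return c

a = nd.random.uniform(-1, 1, shape=(3,))
a.attach_grad()
with autograd.record():
    c = toy_dynamic_net(a)
c.backward()
print(a.grad)   # gradient of whichever path was actually executed
```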

@@ -30,7 +30,7 @@ The only instance method needed to be implemented is [forward(self, x)](https://
In the example below, we define a new layer and implement the `forward()` method to normalize input data by fitting it into a range of [0, 1].

```python
# Do some initial imports used throughout this tutorial
# Do some initial imports used throughout this tutorial
from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
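
# A hedged sketch of the layer described above (names here are illustrative,
# not necessarily the tutorial's exact code): a Block whose forward() rescales
# its input into the [0, 1] range.
class NormalizationLayer(gluon.Block):
    def forward(self, x):
        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))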
@@ -53,7 +53,7 @@ The rest of methods of the `Block` class are already implemented, and majority o

Looking into the implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that a block more often inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428) instead of directly inheriting from `Block`.

The reason is that `HybridBlock` allows you to write custom layers that can be used in both imperative and symbolic programming. Supporting both is convenient, because imperative programming eases debugging while symbolic programming provides faster execution. You can learn more about the difference between symbolic and imperative programming from [this article](https://mxnet.apache.org/versions/master/architecture/program_model.html).
The reason is that `HybridBlock` allows you to write custom layers that can be used in both imperative and symbolic programming. Supporting both is convenient, because imperative programming eases debugging while symbolic programming provides faster execution. You can learn more about the difference between symbolic and imperative programming from this [deep learning programming paradigm](/api/architecture/program_model) article.

Hybridization is a process that Apache MXNet uses to create a symbolic graph of a forward computation. This increases computation performance by optimizing the symbolic graph. Once the symbolic graph is created, Apache MXNet caches and reuses it for subsequent computations.
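
A rough sketch of that caching behaviour; the tiny network and shapes below are assumptions for illustration, not code from this tutorial.

```python
from mxnet import nd, gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(4), gluon.nn.Dense(1))
net.initialize()

net.hybridize()              # ask Gluon to build and cache a symbolic graph
out = net(nd.ones((2, 3)))   # the first call traces and caches the graph
out = net(nd.ones((2, 3)))   # later calls reuse the cached graph
print(out)
```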

@@ -143,7 +143,7 @@ class NormalizationHybridLayer(gluon.HybridBlock):
shape=scales.shape,
init=mx.init.Constant(scales.asnumpy()),
differentiable=False)

def hybrid_forward(self, F, x, weights, scales):
normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
@@ -175,14 +175,14 @@ def print_params(title, net):
"""
print(title)
hybridlayer_params = {k: v for k, v in net.collect_params().items() if 'normalizationhybridlayer' in k }

for key, value in hybridlayer_params.items():
print('{} = {}\n'.format(key, value.data()))

net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron

@@ -195,15 +195,15 @@ label = nd.random_uniform(low=-1, high=1, shape=(5, 1))

mse_loss = gluon.loss.L2Loss() # Mean squared error between output and label
trainer = gluon.Trainer(net.collect_params(), # Init trainer with Stochastic Gradient Descent (sgd) optimization method and parameters for it
'sgd',
'sgd',
{'learning_rate': 0.1, 'momentum': 0.9 })
with autograd.record(): # Autograd records computations done on NDArrays inside "with" block

with autograd.record(): # Autograd records computations done on NDArrays inside "with" block
output = net(input) # Run forward propagation
print_params("=========== Parameters after forward pass ===========\n", net)

print_params("=========== Parameters after forward pass ===========\n", net)
loss = mse_loss(output, label) # Calculate MSE

loss.backward() # Backward computes gradients and stores them as a separate array within each NDArray in .grad field
trainer.step(input.shape[0]) # Trainer updates parameters of every block, using the .grad field and the optimization method (sgd in this example)
# We provide batch size that is used as a divider in cost function formula
@@ -213,29 +213,29 @@ print_params("=========== Parameters after backward pass ===========\n", net)
```python
=========== Parameters after forward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
hybridsequential94_normalizationhybridlayer0_weights =
[[-0.3983642 -0.505708 -0.02425683 -0.3133553 -0.35161012]
[ 0.6467543 0.3918715 -0.6154656 -0.20702496 -0.4243446 ]
[ 0.6077331 0.03922009 0.13425875 0.5729856 -0.14446527]
[-0.3572498 0.18545026 -0.09098256 0.5106366 -0.35151464]
[-0.39846328 0.22245121 0.13075739 0.33387476 -0.10088372]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>

=========== Parameters after backward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
hybridsequential94_normalizationhybridlayer0_weights =
[[-0.29839832 -0.47213346 0.08348035 -0.2324698 -0.27368504]
[ 0.76268613 0.43080837 -0.49052125 -0.11322092 -0.3339738 ]
[ 0.48665082 -0.00144657 0.00376363 0.47501418 -0.23885089]
[-0.22626656 0.22944227 0.05018325 0.6166192 -0.24941102]
[-0.44946212 0.20532274 0.07579394 0.29261002 -0.14063817]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>
```