This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fixing broken links #16500

Merged
merged 11 commits on Oct 17, 2019
4 changes: 2 additions & 2 deletions docs/python_docs/python/tutorials/deploy/export/onnx.md
@@ -28,7 +28,7 @@ In this tutorial, we will learn how to use MXNet to ONNX exporter on pre-trained
## Prerequisites

To run the tutorial you will need to have installed the following python modules:

Suggested change
To run the tutorial you will need to have installed the following python modules:
To run the tutorial, install the following Python modules:

- [MXNet >= 1.3.0](http://mxnet.apache.org/install/index.html)
- [MXNet >= 1.3.0](/get_started)
- [onnx]( https://github.com/onnx/onnx#installation) v1.2.1 (follow the install guide)

*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.

Suggested change
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set, which comes with ONNX v1.2.1.

@@ -147,4 +147,4 @@ checker.check_graph(model_proto.graph)

If the converted protobuf format doesn't conform to the ONNX proto specification, the checker will throw errors, but in this case it passes successfully.

This method confirms that the exported model protobuf is valid. Now the model is ready to be imported into other frameworks for inference!
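
For reference, a minimal sketch of the export-and-check flow this file describes might look like the following; the file names and input shape are placeholders rather than values taken from the tutorial:

```python
import numpy as np
import onnx
from onnx import checker
import mxnet as mx

# Export a trained symbol/params pair to ONNX (paths and shape are illustrative)
onnx_file = mx.contrib.onnx.export_model('resnet-18-symbol.json',
                                         'resnet-18-0000.params',
                                         [(1, 3, 224, 224)], np.float32,
                                         'resnet18.onnx')

# Load the resulting protobuf and validate it against the ONNX spec
model_proto = onnx.load(onnx_file)
checker.check_graph(model_proto.graph)  # raises an exception if the graph is invalid
```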
82 changes: 5 additions & 77 deletions docs/python_docs/python/tutorials/deploy/run-on-aws/cloud.rst
@@ -26,80 +26,8 @@ learning models. Using AWS, we can rapidly fire up multiple machines
with multiple GPUs each at will and maintain the resources for precisely
the amount of time needed.

Set Up an AWS GPU Cluster from Scratch
--------------------------------------

In this document, we provide a step-by-step guide that will teach you
how to set up an AWS cluster with *MXNet*. We show how to:

- `Use Amazon S3 to host data`_
- `Set up an EC2 GPU instance with all dependencies installed`_
- `Build and run MXNet on a single computer`_
- `Set up an EC2 GPU cluster for distributed training`_

Use Amazon S3 to Host Data
~~~~~~~~~~~~~~~~~~~~~~~~~~

Amazon S3 provides distributed data storage which proves especially
convenient for hosting large datasets. To use S3, you need
`AWS credentials`_, including an ``ACCESS_KEY_ID`` and a
``SECRET_ACCESS_KEY``.

To use *MXNet* with S3, set the environment variables
``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` by adding the
following two lines in ``~/.bashrc`` (replacing the strings with the
correct ones):

.. code:: bash

    export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
    export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

There are several ways to upload data to S3. One simple way is to use
`s3cmd`_. For example:

.. code:: bash

    wget http://data.mxnet.io/mxnet/data/mnist.zip
    unzip mnist.zip && s3cmd put t*-ubyte s3://dmlc/mnist/
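
Once the credentials are set, an MXNet build with S3 support (``USE_S3=1``)
can read data directly from ``s3://`` paths; the bucket and file names below
are placeholders for illustration:

.. code:: python

    import mxnet as mx

    # Data iterators accept s3:// paths when MXNet is built with S3 support
    data_iter = mx.io.MNISTIter(image='s3://dmlc/mnist/train-images-idx3-ubyte',
                                label='s3://dmlc/mnist/train-labels-idx1-ubyte',
                                batch_size=100)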

Use Pre-installed EC2 GPU Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `Deep Learning AMI`_ is an Amazon Linux image supported and
maintained by Amazon Web Services for use on Amazon Elastic Compute
Cloud (Amazon EC2). It contains the `MXNet-v0.9.3 tag`_ and the
necessary components to get going with deep learning, including Nvidia
drivers, CUDA, cuDNN, Anaconda, Python2 and Python3. The AMI IDs are the
following:

- us-east-1: ami-e7c96af1
- us-west-2: ami-dfb13ebf
- eu-west-1: ami-6e5d6808

Now you can launch *MXNet* directly on an EC2 GPU instance. You can also
use a `Jupyter`_ notebook on the EC2 machine. Here is a
`good tutorial`_ on how to connect to a Jupyter notebook running on
an EC2 instance.

Set Up an EC2 GPU Instance from Scratch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*MXNet* requires the following libraries:

- C++ compiler with C++11 support, such as ``gcc >= 4.8``
- ``CUDA`` (``CUDNN`` is optional) for GPU linear algebra
- ``BLAS`` (cblas, open-blas, atlas, mkl, or others)

.. _Use Amazon S3 to host data: #use-amazon-s3-to-host-data
.. _Set up an EC2 GPU instance with all dependencies installed: #set-up-an-ec2-gpu-instance
.. _Build and run MXNet on a single computer: #build-and-run-mxnet-on-a-gpu-instance
.. _Set up an EC2 GPU cluster for distributed training: #set-up-an-ec2-gpu-cluster-for-distributed-training
.. _AWS credentials: http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html
.. _s3cmd: http://s3tools.org/s3cmd
.. _Deep Learning AMI: https://aws.amazon.com/marketplace/pp/B01M0AXXQB?qid=1475211685369&sr=0-1&ref_=srh_res_product_title
.. _MXNet-v0.9.3 tag: https://github.com/apache/incubator-mxnet
.. _Jupyter: http://jupyter.org

Here are some ways you can use MXNet on AWS:
1. Use [Amazon SageMaker](https://aws.amazon.com/sagemaker/developer-resources/)
1. Use the [AWS Deep Learning AMI with Conda](https://docs.aws.amazon.com/dlami/latest/devguide/overview-conda.html) (comes preinstalled!)
1. Use an [AWS Deep Learning Container](https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers.html)
1. Install MXNet on a [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)

Suggested change
1. Install MXNet on a [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)
1. Install MXNet on an [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)

@@ -42,7 +42,7 @@ The following tutorials will help you learn how to deploy MXNet on various AWS p

.. card::
:title: Training with Data from S3
:link: https://mxnet.apache.org/versions/master/faq/s3_integration.html
:link: /api/faq/s3_integration

How to train with data from Amazon S3 buckets.

@@ -77,7 +77,7 @@ from mxnet.gluon.data.vision import transforms
from mxnet.gluon.model_zoo.vision import resnet50_v2
```

Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](https://mxnet.apache.org/tutorials/gluon/learning_rate_schedules.html) to adjust learning rates during training.
Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](../packages/gluon/training/learning_rates/learning_rate_schedules.html) to adjust learning rates during training.
Here we set `epochs` to 1 for a quick demonstration; change it to 40 for actual training.
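
As a rough sketch of how such a scheduler can be attached to a Gluon trainer (the step size, factor, learning rate, and the stand-in network below are placeholders, not the tutorial's actual settings):

```python
import mxnet as mx
from mxnet import gluon

# Halve the learning rate every 1000 updates (illustrative values only)
schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)

net = gluon.nn.Dense(1)          # stand-in for the fine-tuned network
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01, 'lr_scheduler': schedule})
```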

```python
@@ -324,4 +324,4 @@ You can also find more ways to run inference and deploy your models here:
2. [Gluon book on fine-tuning](https://www.d2l.ai/chapter_computer-vision/fine-tuning.html)
3. [Gluon CV transfer learning tutorial](https://gluon-cv.mxnet.io/build/examples_classification/transfer_learning_minc.html)
4. [Gluon crash course](https://gluon-crash-course.mxnet.io/)
5. [Gluon CPP inference example](https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/)
@@ -31,7 +31,7 @@ Comparison Guides

.. card::
:title: Caffe to MXNet
:link: https://mxnet.apache.org/versions/master/faq/caffe.html
:link: /api/faq/caffe.html

How to convert Caffe models to MXNet and how to call Caffe operators from MXNet.

10 changes: 5 additions & 5 deletions docs/python_docs/python/tutorials/packages/autograd/index.md
@@ -29,15 +29,15 @@ Gradients are fundamental to the process of training neural networks, and tell u

Under the hood, neural networks are composed of operators (e.g. sums, products, convolutions, etc) some of which use parameters (e.g. the weights in convolution kernels) for their computation, and it's our job to find the optimal values for these parameters. Gradients lead us to the solution!

Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.L2Loss.html?highlight=l2#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.SoftmaxCrossEntropyLoss.html).
Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.SoftmaxCrossEntropyLoss).

Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](http://beta.mxnet.io/api/gluon-related/mxnet.optimizer.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](/api/python/docs/api/optimizer/index.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
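
Concretely, the record-differentiate-update loop described above can be sketched as follows; the toy network, data, and learning rate are placeholders, not part of the original tutorial:

```python
import mxnet as mx
from mxnet import nd, gluon, autograd

net = gluon.nn.Dense(1)                       # toy single-output network
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.L2Loss()

x = nd.random.uniform(shape=(4, 2))           # toy inputs and targets
y = nd.random.uniform(shape=(4, 1))

with autograd.record():                       # record the forward pass
    loss = loss_fn(net(x), y)
loss.backward()                               # compute parameter gradients
trainer.step(batch_size=4)                    # nudge parameters opposite to the gradient
```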

## How do we calculate gradients?

### Short Answer:

We differentiate. [MXNet Gluon](http://beta.mxnet.io/api/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.
We differentiate. [MXNet Gluon](/api/python/docs/tutorials/packages/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.

![forward-backward](/_static/autograd/autograd_forward_backward.png)

@@ -159,7 +159,7 @@ print('is_training:', is_training, output)

We called `dropout` while `autograd` was recording this time, so our network was in training mode and we see dropout applied to the input. Since the probability of dropout was 50%, the output is automatically scaled by 1/0.5=2 to preserve the average activation.

We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=dropout#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](/api/python/ndarray/ndarray.html#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
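
As a small illustration of that flag (a sketch, not code from the original tutorial), the operator drops inputs even though nothing is being recorded:

```python
import mxnet as mx
from mxnet import nd

data = nd.ones(shape=(3, 3))
# Outside autograd.record(), dropout is normally a pass-through,
# but mode='always' forces it to drop and rescale anyway.
out = nd.Dropout(data, p=0.5, mode='always')
print(out)   # roughly half the entries are zeroed, the rest scaled by 1/(1-p) = 2
```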

## Advanced: Skipping the calculation of parameter gradients

@@ -196,7 +196,7 @@ print(x.grad)

## Advanced: Using Python control flow

As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](https://mxnet.apache.org/versions/master/tutorials/control_flow/ControlFlowTutorial.html).
As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](/api/python/docs/tutorials/packages/autograd/index.html#Advanced:-Using-Python-control-flow).

We'll write a function as a toy example of a dynamic network. We'll add an `if` condition and a loop with a variable number of iterations, both of which will depend on the input data. Although these can now be used in static graphs (with conditional operators) it's still much more natural to use native control flow.
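
One possible toy function of this kind is sketched below (the function the tutorial itself builds in the hidden part of this diff may differ); both the loop length and the branch taken depend on the input values:

```python
import mxnet as mx
from mxnet import nd, autograd

def f(a):
    # The number of iterations and the branch taken depend on the data in `a`
    b = a * 2
    while nd.norm(b).asscalar() < 1000:
        b = b * 2
    if nd.sum(b).asscalar() >= 0:
        c = b
    else:
        c = 100 * b
    return c

a = nd.random.normal(shape=3)
a.attach_grad()
with autograd.record():
    c = f(a)
c.backward()
print(a.grad)    # f only scales its input, so the gradient equals c / a element-wise
```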

@@ -30,7 +30,7 @@ The only instance method needed to be implemented is [forward(self, x)](https://
In the example below, we define a new layer and implement the `forward()` method to normalize input data by fitting it into the range [0, 1].

```python
# Do some initial imports used throughout this tutorial
from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
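# (Illustrative sketch only: a Block of the kind described above, whose forward()
#  rescales its input to the [0, 1] range. The tutorial's own class, hidden in the
#  collapsed part of this diff, may differ in its details.)
class NormalizationLayer(gluon.Block):
    def __init__(self, **kwargs):
        super(NormalizationLayer, self).__init__(**kwargs)

    def forward(self, x):
        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))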
@@ -53,7 +53,7 @@ The rest of methods of the `Block` class are already implemented, and majority of

Looking into the implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that a block more often inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428) instead of directly inheriting from `Block`.

The reason is that `HybridBlock` allows you to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convenient to support both ways, because imperative programming eases debugging of the code and symbolic programming provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/versions/master/architecture/program_model.html).
The reason is that `HybridBlock` allows you to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convenient to support both ways, because imperative programming eases debugging of the code and symbolic programming provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from this [deep learning programming paradigm](/api/architecture/program_model) article.

Hybridization is the process Apache MXNet uses to create a symbolic graph of a forward computation. It improves computational performance by optimizing that symbolic graph. Once the symbolic graph is created, Apache MXNet caches and reuses it for subsequent computations.
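
As a sketch of the pattern (not the tutorial's `NormalizationHybridLayer`, which appears further down), a hybridizable layer implements `hybrid_forward` and receives the backend module `F`:

```python
import mxnet as mx
from mxnet import nd, gluon

class ScaleLayer(gluon.HybridBlock):
    """Multiplies its input by a fixed scale (illustrative example)."""
    def __init__(self, scale=2.0, **kwargs):
        super(ScaleLayer, self).__init__(**kwargs)
        self.scale = scale

    def hybrid_forward(self, F, x):
        # F is mxnet.nd in imperative mode and mxnet.sym once hybridized
        return x * self.scale

net = gluon.nn.HybridSequential()
net.add(ScaleLayer(3.0))
net.initialize()
net.hybridize()                      # build and cache the symbolic graph
print(net(nd.array([1.0, 2.0, 3.0])))
```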

@@ -143,7 +143,7 @@ class NormalizationHybridLayer(gluon.HybridBlock):
shape=scales.shape,
init=mx.init.Constant(scales.asnumpy()),
differentiable=False)

def hybrid_forward(self, F, x, weights, scales):
normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
@@ -175,14 +175,14 @@ def print_params(title, net):
"""
print(title)
hybridlayer_params = {k: v for k, v in net.collect_params().items() if 'normalizationhybridlayer' in k }

for key, value in hybridlayer_params.items():
print('{} = {}\n'.format(key, value.data()))

net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron

@@ -195,15 +195,15 @@ label = nd.random_uniform(low=-1, high=1, shape=(5, 1))

mse_loss = gluon.loss.L2Loss() # Mean squared error between output and label
trainer = gluon.Trainer(net.collect_params(), # Init trainer with Stochastic Gradient Descent (sgd) optimization method and parameters for it
'sgd',
{'learning_rate': 0.1, 'momentum': 0.9 })
with autograd.record(): # Autograd records computations done on NDArrays inside "with" block
output = net(input) # Run forward propagation
print_params("=========== Parameters after forward pass ===========\n", net)

print_params("=========== Parameters after forward pass ===========\n", net)
loss = mse_loss(output, label) # Calculate MSE

loss.backward() # Backward computes gradients and stores them as a separate array within each NDArray in .grad field
trainer.step(input.shape[0]) # Trainer updates parameters of every block, using the .grad field and the optimization method (sgd in this example)
# We provide batch size that is used as a divider in cost function formula
@@ -213,29 +213,29 @@ print_params("=========== Parameters after backward pass ===========\n", net)
```python
=========== Parameters after forward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
[[-0.3983642 -0.505708 -0.02425683 -0.3133553 -0.35161012]
[ 0.6467543 0.3918715 -0.6154656 -0.20702496 -0.4243446 ]
[ 0.6077331 0.03922009 0.13425875 0.5729856 -0.14446527]
[-0.3572498 0.18545026 -0.09098256 0.5106366 -0.35151464]
[-0.39846328 0.22245121 0.13075739 0.33387476 -0.10088372]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>

=========== Parameters after backward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
[[-0.29839832 -0.47213346 0.08348035 -0.2324698 -0.27368504]
[ 0.76268613 0.43080837 -0.49052125 -0.11322092 -0.3339738 ]
[ 0.48665082 -0.00144657 0.00376363 0.47501418 -0.23885089]
[-0.22626656 0.22944227 0.05018325 0.6166192 -0.24941102]
[-0.44946212 0.20532274 0.07579394 0.29261002 -0.14063817]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>
```