This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fixing broken links #16500

Merged
merged 11 commits on Oct 17, 2019
4 changes: 2 additions & 2 deletions docs/python_docs/python/tutorials/deploy/export/onnx.md
@@ -28,7 +28,7 @@ In this tutorial, we will learn how to use MXNet to ONNX exporter on pre-trained
## Prerequisites

To run the tutorial you will need to have installed the following python modules:

Suggested change
To run the tutorial you will need to have installed the following python modules:
To run the tutorial, install the following Python modules:

- [MXNet >= 1.3.0](http://mxnet.apache.org/install/index.html)
- [MXNet >= 1.3.0](/get_started)
- [onnx]( https://github.com/onnx/onnx#installation) v1.2.1 (follow the install guide)

*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.

Suggested change
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.
*Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set, which comes with ONNX v1.2.1.

@@ -147,4 +147,4 @@ checker.check_graph(model_proto.graph)

If the converted protobuf format doesn't conform to the ONNX proto specification, the checker will throw errors, but in this case it passes successfully.

This method confirms that the exported model protobuf is valid. Now the model is ready to be imported into other frameworks for inference!
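
For reference, a minimal sketch of the export-and-check flow this file describes might look like the following; the file names and input shape are placeholders rather than values taken from the tutorial:

```python
import numpy as np
import onnx
from onnx import checker
import mxnet as mx

# Export a trained symbol/params pair to ONNX (paths and shape are illustrative)
onnx_file = mx.contrib.onnx.export_model('resnet-18-symbol.json',
                                         'resnet-18-0000.params',
                                         [(1, 3, 224, 224)], np.float32,
                                         'resnet18.onnx')

# Load the resulting protobuf and validate it against the ONNX spec
model_proto = onnx.load(onnx_file)
checker.check_graph(model_proto.graph)  # raises an exception if the graph is invalid
```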
82 changes: 5 additions & 77 deletions docs/python_docs/python/tutorials/deploy/run-on-aws/cloud.rst
@@ -26,80 +26,8 @@ learning models. Using AWS, we can rapidly fire up multiple machines
with multiple GPUs each at will and maintain the resources for precisely
the amount of time needed.

Set Up an AWS GPU Cluster from Scratch
--------------------------------------

In this document, we provide a step-by-step guide that will teach you
how to set up an AWS cluster with *MXNet*. We show how to:

- `Use Amazon S3 to host data`_
- `Set up an EC2 GPU instance with all dependencies installed`_
- `Build and run MXNet on a single computer`_
- `Set up an EC2 GPU cluster for distributed training`_

Use Amazon S3 to Host Data
~~~~~~~~~~~~~~~~~~~~~~~~~~

Amazon S3 provides distributed data storage which proves especially
convenient for hosting large datasets. To use S3, you need
`AWS credentials`_, including an ``ACCESS_KEY_ID`` and a
``SECRET_ACCESS_KEY``.

To use *MXNet* with S3, set the environment variables
``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` by adding the
following two lines in ``~/.bashrc`` (replacing the strings with the
correct ones):

.. code:: bash

    export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
    export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

There are several ways to upload data to S3. One simple way is to use
`s3cmd`_. For example:

.. code:: bash

    wget http://data.mxnet.io/mxnet/data/mnist.zip
    unzip mnist.zip && s3cmd put t*-ubyte s3://dmlc/mnist/
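
Once the credentials are set, an MXNet build with S3 support (``USE_S3=1``)
can read data directly from ``s3://`` paths; the bucket and file names below
are placeholders for illustration:

.. code:: python

    import mxnet as mx

    # Data iterators accept s3:// paths when MXNet is built with S3 support
    data_iter = mx.io.MNISTIter(image='s3://dmlc/mnist/train-images-idx3-ubyte',
                                label='s3://dmlc/mnist/train-labels-idx1-ubyte',
                                batch_size=100)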

Use Pre-installed EC2 GPU Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `Deep Learning AMI`_ is an Amazon Linux image supported and
maintained by Amazon Web Services for use on Amazon Elastic Compute
Cloud (Amazon EC2). It contains the `MXNet-v0.9.3 tag`_ and the
necessary components to get going with deep learning, including Nvidia
drivers, CUDA, cuDNN, Anaconda, Python2 and Python3. The AMI IDs are the
following:

- us-east-1: ami-e7c96af1
- us-west-2: ami-dfb13ebf
- eu-west-1: ami-6e5d6808

Now you can launch *MXNet* directly on an EC2 GPU instance. You can also
use a `Jupyter`_ notebook on the EC2 machine. Here is a
`good tutorial`_ on how to connect to a Jupyter notebook running on
an EC2 instance.

Set Up an EC2 GPU Instance from Scratch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*MXNet* requires the following libraries:

- C++ compiler with C++11 support, such as ``gcc >= 4.8``
- ``CUDA`` (``CUDNN`` is optional) for GPU linear algebra
- ``BLAS`` (cblas, open-blas, atlas, mkl, or others)

.. _Use Amazon S3 to host data: #use-amazon-s3-to-host-data
.. _Set up an EC2 GPU instance with all dependencies installed: #set-up-an-ec2-gpu-instance
.. _Build and run MXNet on a single computer: #build-and-run-mxnet-on-a-gpu-instance
.. _Set up an EC2 GPU cluster for distributed training: #set-up-an-ec2-gpu-cluster-for-distributed-training
.. _AWS credentials: http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html
.. _s3cmd: http://s3tools.org/s3cmd
.. _Deep Learning AMI: https://aws.amazon.com/marketplace/pp/B01M0AXXQB?qid=1475211685369&sr=0-1&ref_=srh_res_product_title
.. _MXNet-v0.9.3 tag: https://github.com/apache/incubator-mxnet
.. _Jupyter: http://jupyter.org

Here are some ways you can use MXNet on AWS:
1. Use [Amazon SageMaker](https://aws.amazon.com/sagemaker/developer-resources/)
1. Use the [AWS Deep Learning AMI with Conda](https://docs.aws.amazon.com/dlami/latest/devguide/overview-conda.html) (comes preinstalled!)
1. Use an [AWS Deep Learning Container](https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers.html)
1. Install MXNet on a [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)

Suggested change
1. Install MXNet on a [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)
1. Install MXNet on an [AWS Deep Learning Base AMI](https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html)

@@ -42,7 +42,7 @@ The following tutorials will help you learn how to deploy MXNet on various AWS p

.. card::
:title: Training with Data from S3
:link: https://mxnet.apache.org/versions/master/faq/s3_integration.html
:link: /api/faq/s3_integration

How to train with data from Amazon S3 buckets.

@@ -77,7 +77,7 @@ from mxnet.gluon.data.vision import transforms
from mxnet.gluon.model_zoo.vision import resnet50_v2
```

Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](https://mxnet.apache.org/tutorials/gluon/learning_rate_schedules.html) to adjust learning rates during training.
Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](../packages/gluon/training/learning_rates/learning_rate_schedules.html) to adjust learning rates during training.
Here we set `epochs` to 1 for a quick demonstration; change it to 40 for actual training.
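
As a rough sketch of how such a scheduler can be attached to a Gluon trainer (the step size, factor, learning rate, and the stand-in network below are placeholders, not the tutorial's actual settings):

```python
import mxnet as mx
from mxnet import gluon

# Halve the learning rate every 1000 updates (illustrative values only)
schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)

net = gluon.nn.Dense(1)          # stand-in for the fine-tuned network
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01, 'lr_scheduler': schedule})
```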

```python
@@ -324,4 +324,4 @@ You can also find more ways to run inference and deploy your models here:
2. [Gluon book on fine-tuning](https://www.d2l.ai/chapter_computer-vision/fine-tuning.html)
3. [Gluon CV transfer learning tutorial](https://gluon-cv.mxnet.io/build/examples_classification/transfer_learning_minc.html)
4. [Gluon crash course](https://gluon-crash-course.mxnet.io/)
5. [Gluon CPP inference example](https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/)
@@ -31,7 +31,7 @@ Comparison Guides

.. card::
:title: Caffe to MXNet
:link: https://mxnet.apache.org/versions/master/faq/caffe.html
:link: /api/faq/caffe.html

How to convert Caffe models to MXNet and how to call Caffe operators from MXNet.

10 changes: 5 additions & 5 deletions docs/python_docs/python/tutorials/packages/autograd/index.md
@@ -29,15 +29,15 @@ Gradients are fundamental to the process of training neural networks, and tell u

Under the hood, neural networks are composed of operators (e.g. sums, products, convolutions, etc) some of which use parameters (e.g. the weights in convolution kernels) for their computation, and it's our job to find the optimal values for these parameters. Gradients lead us to the solution!

Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.L2Loss.html?highlight=l2#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.loss.SoftmaxCrossEntropyLoss.html).
Gradients tell us how much a given variable increases or decreases when we change a variable it depends on. What we're interested in is the effect of changing each parameter on the performance of the network. We usually define performance using a loss metric that we try to minimize, i.e. a metric that tells us how bad the predictions of a network are given ground truth. As an example, for regression we might try to minimize the [L2 loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.L2Loss) (also known as the Euclidean distance) between our predictions and true values, and for classification we minimize the [cross entropy loss](/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.SoftmaxCrossEntropyLoss).

Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](http://beta.mxnet.io/api/gluon-related/mxnet.optimizer.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
Assuming we've calculated the gradient of each parameter with respect to the loss (details in next section), we can then use an optimizer such as [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to shift the parameters slightly in the *opposite direction* of the gradient. See [Optimizers](/api/python/docs/api/optimizer/index.html) for more information on these methods. We repeat the process of calculating gradients and updating parameters over and over again, until the parameters of the network start to stabilize and converge to a good solution.
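
Concretely, the record-differentiate-update loop described above can be sketched as follows; the toy network, data, and learning rate are placeholders, not part of the original tutorial:

```python
import mxnet as mx
from mxnet import nd, gluon, autograd

net = gluon.nn.Dense(1)                       # toy single-output network
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.L2Loss()

x = nd.random.uniform(shape=(4, 2))           # toy inputs and targets
y = nd.random.uniform(shape=(4, 1))

with autograd.record():                       # record the forward pass
    loss = loss_fn(net(x), y)
loss.backward()                               # compute parameter gradients
trainer.step(batch_size=4)                    # nudge parameters opposite to the gradient
```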

## How do we calculate gradients?

### Short Answer:

We differentiate. [MXNet Gluon](http://beta.mxnet.io/api/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.
We differentiate. [MXNet Gluon](/api/python/docs/tutorials/packages/gluon/index.html) uses Reverse Mode Automatic Differentiation (`autograd`) to backpropagate gradients from the loss metric to the network parameters.

![forward-backward](/_static/autograd/autograd_forward_backward.png)

@@ -159,7 +159,7 @@ print('is_training:', is_training, output)

We called `dropout` while `autograd` was recording this time, so our network was in training mode and we see dropout applied to the input. Since the probability of dropout was 50%, the output is automatically scaled by 1/0.5=2 to preserve the average activation.

We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=dropout#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
We can force some operators to behave as they would during training, even in inference mode. One example is setting `mode='always'` on the [Dropout](/api/python/ndarray/ndarray.html#mxnet.ndarray.Dropout) operator, but this usage is uncommon.
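
As a small illustration of that flag (a sketch, not code from the original tutorial), the operator drops inputs even though nothing is being recorded:

```python
import mxnet as mx
from mxnet import nd

data = nd.ones(shape=(3, 3))
# Outside autograd.record(), dropout is normally a pass-through,
# but mode='always' forces it to drop and rescale anyway.
out = nd.Dropout(data, p=0.5, mode='always')
print(out)   # roughly half the entries are zeroed, the rest scaled by 1/(1-p) = 2
```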

## Advanced: Skipping the calculation of parameter gradients

@@ -196,7 +196,7 @@ print(x.grad)

## Advanced: Using Python control flow

As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](https://mxnet.apache.org/versions/master/tutorials/control_flow/ControlFlowTutorial.html).
As mentioned before, one of the main advantages of `autograd` is the ability to automatically calculate gradients of dynamic graphs (i.e. graphs where the operators could be different on every forward pass). One example of this would be applying a tree structured recurrent network to parse a sentence using its parse tree. And we can use Python control flow operators to create a dynamic flow that depends on the data, rather than using [MXNet's control flow operators](/api/python/docs/tutorials/packages/autograd/index.html#Advanced:-Using-Python-control-flow).

We'll write a function as a toy example of a dynamic network. We'll add an `if` condition and a loop with a variable number of iterations, both of which will depend on the input data. Although these can now be used in static graphs (with conditional operators) it's still much more natural to use native control flow.
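
One possible toy function of this kind is sketched below (the function the tutorial itself builds in the hidden part of this diff may differ); both the loop length and the branch taken depend on the input values:

```python
import mxnet as mx
from mxnet import nd, autograd

def f(a):
    # The number of iterations and the branch taken depend on the data in `a`
    b = a * 2
    while nd.norm(b).asscalar() < 1000:
        b = b * 2
    if nd.sum(b).asscalar() >= 0:
        c = b
    else:
        c = 100 * b
    return c

a = nd.random.normal(shape=3)
a.attach_grad()
with autograd.record():
    c = f(a)
c.backward()
print(a.grad)    # f only scales its input, so the gradient equals c / a element-wise
```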

@@ -30,7 +30,7 @@ The only instance method needed to be implemented is [forward(self, x)](https://
In the example below, we define a new layer and implement the `forward()` method to normalize input data by fitting it into the range [0, 1].

```python
# Do some initial imports used throughout this tutorial
from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
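# (Illustrative sketch only: a Block of the kind described above, whose forward()
#  rescales its input to the [0, 1] range. The tutorial's own class, hidden in the
#  collapsed part of this diff, may differ in its details.)
class NormalizationLayer(gluon.Block):
    def __init__(self, **kwargs):
        super(NormalizationLayer, self).__init__(**kwargs)

    def forward(self, x):
        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))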
@@ -53,7 +53,7 @@ The rest of methods of the `Block` class are already implemented, and majority of

Looking into the implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that a block more often inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428) instead of directly inheriting from `Block`.

The reason is that `HybridBlock` allows you to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convenient to support both ways, because imperative programming eases debugging of the code and symbolic programming provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/versions/master/architecture/program_model.html).
The reason is that `HybridBlock` allows you to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convenient to support both ways, because imperative programming eases debugging of the code and symbolic programming provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from this [deep learning programming paradigm](/api/architecture/program_model) article.

Hybridization is the process Apache MXNet uses to create a symbolic graph of a forward computation. It improves computational performance by optimizing that symbolic graph. Once the symbolic graph is created, Apache MXNet caches and reuses it for subsequent computations.
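
As a sketch of the pattern (not the tutorial's `NormalizationHybridLayer`, which appears further down), a hybridizable layer implements `hybrid_forward` and receives the backend module `F`:

```python
import mxnet as mx
from mxnet import nd, gluon

class ScaleLayer(gluon.HybridBlock):
    """Multiplies its input by a fixed scale (illustrative example)."""
    def __init__(self, scale=2.0, **kwargs):
        super(ScaleLayer, self).__init__(**kwargs)
        self.scale = scale

    def hybrid_forward(self, F, x):
        # F is mxnet.nd in imperative mode and mxnet.sym once hybridized
        return x * self.scale

net = gluon.nn.HybridSequential()
net.add(ScaleLayer(3.0))
net.initialize()
net.hybridize()                      # build and cache the symbolic graph
print(net(nd.array([1.0, 2.0, 3.0])))
```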

@@ -143,7 +143,7 @@ class NormalizationHybridLayer(gluon.HybridBlock):
shape=scales.shape,
init=mx.init.Constant(scales.asnumpy()),
differentiable=False)

def hybrid_forward(self, F, x, weights, scales):
normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
@@ -175,14 +175,14 @@ def print_params(title, net):
"""
print(title)
hybridlayer_params = {k: v for k, v in net.collect_params().items() if 'normalizationhybridlayer' in k }

for key, value in hybridlayer_params.items():
print('{} = {}\n'.format(key, value.data()))

net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron

@@ -195,15 +195,15 @@ label = nd.random_uniform(low=-1, high=1, shape=(5, 1))

mse_loss = gluon.loss.L2Loss() # Mean squared error between output and label
trainer = gluon.Trainer(net.collect_params(), # Init trainer with Stochastic Gradient Descent (sgd) optimization method and parameters for it
'sgd',
{'learning_rate': 0.1, 'momentum': 0.9 })
with autograd.record(): # Autograd records computations done on NDArrays inside "with" block
output = net(input) # Run forward propagation
print_params("=========== Parameters after forward pass ===========\n", net)

print_params("=========== Parameters after forward pass ===========\n", net)
loss = mse_loss(output, label) # Calculate MSE

loss.backward() # Backward computes gradients and stores them as a separate array within each NDArray in .grad field
trainer.step(input.shape[0]) # Trainer updates parameters of every block, using the .grad field and the optimization method (sgd in this example)
# We provide batch size that is used as a divider in cost function formula
@@ -213,29 +213,29 @@ print_params("=========== Parameters after backward pass ===========\n", net)
```python
=========== Parameters after forward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
[[-0.3983642 -0.505708 -0.02425683 -0.3133553 -0.35161012]
[ 0.6467543 0.3918715 -0.6154656 -0.20702496 -0.4243446 ]
[ 0.6077331 0.03922009 0.13425875 0.5729856 -0.14446527]
[-0.3572498 0.18545026 -0.09098256 0.5106366 -0.35151464]
[-0.39846328 0.22245121 0.13075739 0.33387476 -0.10088372]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>

=========== Parameters after backward pass ===========

hybridsequential94_normalizationhybridlayer0_weights =
[[-0.29839832 -0.47213346 0.08348035 -0.2324698 -0.27368504]
[ 0.76268613 0.43080837 -0.49052125 -0.11322092 -0.3339738 ]
[ 0.48665082 -0.00144657 0.00376363 0.47501418 -0.23885089]
[-0.22626656 0.22944227 0.05018325 0.6166192 -0.24941102]
[-0.44946212 0.20532274 0.07579394 0.29261002 -0.14063817]]
<NDArray 5x5 @cpu(0)>

hybridsequential94_normalizationhybridlayer0_scales =
[2.]
<NDArray 1 @cpu(0)>
```