From 97c41e491d59d6c65b400a7a671bd19be52f8aa5 Mon Sep 17 00:00:00 2001 From: "Delteil, Thomas" Date: Wed, 12 Sep 2018 18:14:17 -0700 Subject: [PATCH 01/11] Adding tutorial module to gluon --- docs/tutorials/index.md | 7 +- docs/tutorials/python/module_to_gluon.md | 295 +++++++++++++++++++++++ 2 files changed, 299 insertions(+), 3 deletions(-) create mode 100644 docs/tutorials/python/module_to_gluon.md diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index 8a6ac4081c04..93372be6bea8 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -67,7 +67,8 @@ Select API:  * [Learning Rate Finder](/tutorials/gluon/learning_rate_finder.html) * [Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules.html) * [Advanced Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules_advanced.html) - * [Profiling MXNet Models](/tutorials/python/profiler.html) + * [Profiling MXNet Models](/tutorials/python/profiler.html) + * [Module to Gluon API](/tutorials/python/module_to_gluon.html) (new!) * API Guides * Core APIs * NDArray @@ -81,7 +82,7 @@ Select API:  * Symbol * [Symbol API](/tutorials/basic/symbol.html) (Caution: written before Gluon existed) * KVStore - * [Key-Value Store API](/tutorials/python/kvstore.html) + * [Key-Value Store API](/tutorials/python/kvstore.html) * Gluon APIs * Blocks and Operators * [Blocks](/tutorials/gluon/gluon.html) ([Alternative](http://gluon.mxnet.io/chapter03_deep-neural-networks/plumbing.html) External link) @@ -89,6 +90,7 @@ Select API:  * [HybridBlocks](/tutorials/gluon/hybrid.html) ([Alternative](http://gluon.mxnet.io/chapter07_distributed-learning/hybridize.html) External link) * [Block Naming](/tutorials/gluon/naming.html) * [Custom Operators](/tutorials/gluon/customop.html) + * [Control Flow operators](/tutorials/control_flow/ControlFlowTutorial.html) (new!) * Autograd * [AutoGrad API](/tutorials/gluon/autograd.html) * [AutoGrad API with chain rule](http://gluon.mxnet.io/chapter01_crashcourse/autograd.html) External link @@ -117,7 +119,6 @@ Select API:  * [Fine-Tuning a pre-trained ImageNet model with a new dataset](/faq/finetune.html) * [Large-Scale Multi-Host Multi-GPU Image Classification](/tutorials/vision/large_scale_classification.html) * [Importing an ONNX model into MXNet](/tutorials/onnx/super_resolution.html) - * [Hybridize Gluon models with control flows](/tutorials/control_flow/ControlFlowTutorial.html) * API Guides * Core APIs * NDArray diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md new file mode 100644 index 000000000000..0ec10dcc118c --- /dev/null +++ b/docs/tutorials/python/module_to_gluon.md @@ -0,0 +1,295 @@ + +# Converting Module API code to the Gluon API + +Sometimes, you find yourself in the situation where the model you want to use has been written using the symbolic Module API rather than the imperative Gluon API. In this tutorial, we will give you a comprehensive guide you can use in order to convert a given model to use Gluon. + +The different element to take in consideration are: + +I) Data loading + +II) Model definition + +III) Loss + +IV) Training Loop + +V) Exporting Models + +In the following section we will look at 1:1 mapping between the Module and the Gluon ways. 
+ +## I - Data Loading + + +```python +import logging +logging.basicConfig(level=logging.INFO) + +import numpy as np +import mxnet as mx +from mxnet.gluon.data import ArrayDataset, DataLoader +from mxnet.gluon import nn +from mxnet import gluon + +batch_size = 5 +dataset_length = 200 +``` + +#### Module + +When using the Module API we use a [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter), in addition to the data itself, the [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter) contains information about the name of the input symbols. + +Let's create some random data, following the same format as grayscale 28x28 images. + + +```python +train_data = np.random.rand(dataset_length, 28,28).astype('float32') +train_label = np.random.randint(0, 10, (dataset_length,)).astype('float32') +``` + + +```python +data_iter = mx.io.NDArrayIter(data=train_data, label=train_label, batch_size=batch_size, shuffle=False, data_name='data', label_name='softmax_label') +for batch in data_iter: + print(batch.data[0].shape, batch.label[0]) + break; +``` + + (5, 28, 28) + [5. 0. 3. 4. 9.] + + + +#### Gluon + +With Gluon, the preferred method is to use a [`DataLoader`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataloader#mxnet.gluon.data.DataLoader) that make use of a [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) to prefetch asynchronously the data. + + +```python +dataset = ArrayDataset(train_data, train_label) +dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=0) +for data, label in dataloader: + print(data.shape, label) + break +``` + + (5, 28, 28) + [5. 0. 3. 4. 9.] + + + +#### Notable differences + +- Gluon keeps a strict separation between data holding, and data loading / fetching. The `Dataset` role is to hold onto some data, in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing workers. This flexible API allows to efficiently pre-fetch data and separate the concerns. +- In the module API, `DataIter`s are responsible for both holding the data and iterating through it. Some `DataIter` support multi-threading like the [`ImageRecordIter`](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter), while other don't like the `NDArrayIter`. + +You can checkout the [`Dataset` and `DataLoader` tutorial](https://mxnet.incubator.apache.org/tutorials/gluon/datasets.html). 
You can either rewrite your code in order to use one of the provided [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) class, like the [`ArrayDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=arraydataset#mxnet.gluon.data.ArrayDataset) or the [`ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset), or you can simply wrap your existing [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter) to have a similar usage pattern as a `DataLoader`: + + +```python +class DataIterLoader(): + def __init__(self, data_iter): + self.data_iter = data_iter + + def __iter__(self): + self.data_iter.reset() + return self + + def __next__(self): + batch = self.data_iter.__next__() + assert len(batch.data) == len(batch.label) == 1 + data = batch.data[0] + label = batch.label[0] + return data, label + + def next(self): + return self.__next__() # for Python 2 +``` + + +```python +data_iter = mx.io.NDArrayIter(data=train_data, label=train_label, batch_size=batch_size) +data_iter_loader = DataIterLoader(data_iter) +for data, label in data_iter_loader: + print(data.shape, label) + break +``` + + (5, 28, 28) + [5. 0. 3. 4. 9.] + + + +## II - Model definition + +Let's look at the model definition from the [MNIST Module Tutorial](https://mxnet.incubator.apache.org/tutorials/python/mnist.html): + + +```python +ctx = mx.gpu() +``` + +#### Module + + +```python +data = mx.sym.var('data') +data = mx.sym.flatten(data=data) +fc1 = mx.sym.FullyConnected(data=data, num_hidden=128) +act1 = mx.sym.Activation(data=fc1, act_type="relu") +fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64) +act2 = mx.sym.Activation(data=fc2, act_type="relu") +fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10) +mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax') + +# Bind model to Module +mlp_model = mx.mod.Module(symbol=mlp, context=ctx, data_names=['data'], label_names=['softmax_label']) +``` + +#### Gluon + +In Gluon, for a sequential model like that, you would create a `Sequential` block, in that case a `HybridSequential` block to allow for future hybridization since we are only using hybridizable blocks. Learn more [about hybridization](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html). + + +```python +net = nn.HybridSequential() +with net.name_scope(): + net.add( + nn.Flatten(), + nn.Dense(units=128, activation="relu"), + nn.Dense(units=64, activation="relu"), + nn.Dense(units=10) + ) +``` + +## III - Loss + +The loss, that you are trying to minimize using an optimization algorithm like SGD, is defined differently in the Module API and in Gluon. + +In the module API, the loss is part of the network. It has usually a forward result, that is the inference value, and a backward pass that is the gradient of the output with respect to that particular loss. + +For example the [sym.SoftmaxOutput](https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=softmaxout#mxnet.symbol.SoftmaxOutput) is a softmax output in the forward pass and the gradient with respect to the cross-entropy loss in the backward pass. + +In Gluon, it is a lot more transparent. 
Losses, like the [SoftmaxCrossEntropyLoss](https://mxnet.incubator.apache.org/api/python/gluon/loss.html?highlight=softmaxcross#mxnet.gluon.loss.SoftmaxCrossEntropyLoss), are only computing the actual value of the loss. You then call `.backward()` on the loss value to compute the gradient of the parameters with respect to that loss. At inference time, you simply call `.softmax()` on your output to get the output of your network normalized according to the softmax function. + +#### Module + + +```python +# Softmax with cross entropy loss, directly part of the network +mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax') +``` + +#### Gluon + + +```python +# We simply create a loss function we will use in our training loop +loss_fn = gluon.loss.SoftmaxCrossEntropyLoss() +``` + +## IV - Training Loop + +The Module API provides a [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) functions that takes care of fitting training data to your symbolic model. With Gluon, you execution flow controls the data flow, so you need to write your own loop. It might seems like it is more verbose, but you have a lot more control as to what is happening during the training. +With the [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function, you control the metric reporting, checkpointing, through a lot of different keyword arguments (check the [docs](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit)). That is where you define the optimizer for example. + +With Gluon, you do these operations directly in the training loop, and the optimizer is part of the [`Trainer`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=trainer#mxnet.gluon.Trainer) object that handles the weight updates of your parameters. 
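For instance, checkpointing is handled for you by `.fit()` through a callback keyword argument, while in Gluon it is simply another line you write in your own loop. Here is a minimal sketch of both sides (this sketch is not part of the original example, and the `'checkpoint-example'` prefix is just a placeholder name):

```python
# Module API: .fit() checkpoints through a callback run at the end of each epoch
checkpoint_callback = mx.callback.do_checkpoint('checkpoint-example')  # placeholder prefix
# mlp_model.fit(data_iter, num_epoch=5, epoch_end_callback=checkpoint_callback)

# Gluon: you decide where and when to save, inside your own training loop
# net.save_parameters('checkpoint-example-epoch0.params')
```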
+ +#### Module + + +```python +mlp_model.fit(data_iter, # train data + eval_data=data_iter, # validation data + optimizer='adam', # use SGD to train + force_init=True, + force_rebind=True, + optimizer_params={'learning_rate':0.1}, # use fixed learning rate + eval_metric='acc', # report accuracy during training + num_epoch=5) # train for at most 10 dataset passes +``` + +```INFO:root:Epoch[4] Train-accuracy=0.070000``` + +```INFO:root:Epoch[4] Time cost=0.038``` + +```INFO:root:Epoch[4] Validation-accuracy=0.125000``` + +#### Gluon + + +```python +# Initialize network and trainer +net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx) +trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1}) + +# Pick a metric +metric = mx.metric.Accuracy() + +for e in range(5): # start of epoch + + for data, label in dataloader: # start of mini-batch + data = data.as_in_context(ctx) + label = label.as_in_context(ctx) + + with mx.autograd.record(): + output = net(data) # forward pass + loss = loss_fn(output, label) # get loss + loss.backward() # compute gradients + + trainer.step(data.shape[0]) # update weights with SGD + metric.update(label, output) # update the metrics + # end of mini-batch + name, acc = metric.get() + print('training metrics at epoch %d: %s=%f'%(e, name, acc)) + metric.reset() + # end of epoch +``` + +```training metrics at epoch 3: accuracy=0.155000``` + +```training metrics at epoch 4: accuracy=0.145000``` + + +## V - Exporting model + +The ultimate purpose of training a model is to be able to export it and share it, whether it is for deployment or simply reproducibility purposes. + +With the Module API, you can save model using the [`.save_checkpoint()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=save_chec#mxnet.module.Module.save_checkpoint) and get a `-symbol.json` and a `.params` file that represent your network. + +With Gluon, network parameters are associated with a `Block`, but the execution flow is controlled in python through the code in `.forward()` function. Hence only [hybridized networks]() can be exported with a `-symbol.json` and `.params` file using [`.export()`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=export#mxnet.gluon.HybridBlock.export), non-hybridized models can only have their parameters exported using [`.save_parameters()`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=save_pa#mxnet.gluon.Block.save_parameters). Check this great tutorial to learn more: [Saving and Loading Gluon Models](https://mxnet.incubator.apache.org/tutorials/gluon/save_load_params.html). + +#### Module + + +```python +mlp_model.save_checkpoint('module-model', epoch=5) +# nodule-model-0005.params module-model-symbol.json +``` + +```INFO:root:Saved checkpoint to "module-model-0005.params"``` + +#### Gluon + + +```python +# save only the parameters +net.save_parameters('gluon-model.params') +# gluon-model.params +``` + + +```python +# save the parameters and the symbolic representation +net.hybridize() +net(mx.nd.ones((1,1,28,28), ctx)) + +net.export('gluon-model-hybrid', epoch=5) +# gluon-model-hybrid-symbol.json gluon-model-hybrid-0005.params +``` + +## Conclusion + +This tutorial lead you through the steps necessary to train a deep learning model and showed you the differene between the symbolic approach of the Module API and the imperative Gluon API. 
If you need help converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! + + + \ No newline at end of file From 9d3eb32311702b967f7f1810fb7d0fc9a9eb6564 Mon Sep 17 00:00:00 2001 From: "Delteil, Thomas" Date: Wed, 12 Sep 2018 18:20:27 -0700 Subject: [PATCH 02/11] update test --- tests/tutorials/test_tutorials.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tests/tutorials/test_tutorials.py b/tests/tutorials/test_tutorials.py index a2442a4f6a06..6c6b174e8de9 100644 --- a/tests/tutorials/test_tutorials.py +++ b/tests/tutorials/test_tutorials.py @@ -160,6 +160,9 @@ def test_python_data_augmentation_with_masks(): def test_python_kvstore(): assert _test_tutorial_nb('python/kvstore') +def test_module_to_gluon(): + assert _test_tutorial_nb('python/module_to_gluon') + def test_python_types_of_data_augmentation(): assert _test_tutorial_nb('python/types_of_data_augmentation') @@ -189,3 +192,4 @@ def test_vision_cnn_visualization(): def test_control_flow(): assert _test_tutorial_nb('control_flow/ControlFlowTutorial') + From 012f495d7107a7159cd60e32efdfdf76bf98e597 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Wed, 12 Sep 2018 18:29:49 -0700 Subject: [PATCH 03/11] update wording and typos --- docs/tutorials/python/module_to_gluon.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md index 0ec10dcc118c..085fb3279f78 100644 --- a/docs/tutorials/python/module_to_gluon.md +++ b/docs/tutorials/python/module_to_gluon.md @@ -1,9 +1,9 @@ # Converting Module API code to the Gluon API -Sometimes, you find yourself in the situation where the model you want to use has been written using the symbolic Module API rather than the imperative Gluon API. In this tutorial, we will give you a comprehensive guide you can use in order to convert a given model to use Gluon. +Sometimes, you find yourself in the situation where the model you want to use has been written using the symbolic Module API rather than the simpler, easier-to-debug, more flexible, imperative Gluon API. In this tutorial, we will give you a comprehensive guide you can use in order to see how you can transform your Module code, to work with the Gluon API. -The different element to take in consideration are: +The different steps to take into consideration are: I) Data loading @@ -15,7 +15,7 @@ IV) Training Loop V) Exporting Models -In the following section we will look at 1:1 mapping between the Module and the Gluon ways. +In the following section we will look at 1:1 mappings between the Module and the Gluon ways of training a neural networks. ## I - Data Loading @@ -130,6 +130,8 @@ ctx = mx.gpu() #### Module +For the Module API, you define the data flow by setting `data` keyword argument of one layer to the next. +You then bind the symbolic model to a specific compute context and specify the symbol names for the data and the label. ```python data = mx.sym.var('data') @@ -147,7 +149,7 @@ mlp_model = mx.mod.Module(symbol=mlp, context=ctx, data_names=['data'], label_na #### Gluon -In Gluon, for a sequential model like that, you would create a `Sequential` block, in that case a `HybridSequential` block to allow for future hybridization since we are only using hybridizable blocks. Learn more [about hybridization](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html). 
+In Gluon, for a sequential model like that, you would create a `Sequential` block, in that case a `HybridSequential` block to allow for future hybridization since we are only using hybridizable blocks. Learn more [about hybridization](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html). The flow of the data will be automatically set from one layer to the next, since they are held in a `Sequential` block. ```python @@ -163,9 +165,9 @@ with net.name_scope(): ## III - Loss -The loss, that you are trying to minimize using an optimization algorithm like SGD, is defined differently in the Module API and in Gluon. +The loss, that you are trying to minimize using an optimization algorithm like SGD, is defined differently in the Module API than in Gluon. -In the module API, the loss is part of the network. It has usually a forward result, that is the inference value, and a backward pass that is the gradient of the output with respect to that particular loss. +In the module API, the loss is part of the network. It has usually a forward pass result, that is the inference value, and a backward pass that is the gradient of the output with respect to that particular loss. For example the [sym.SoftmaxOutput](https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=softmaxout#mxnet.symbol.SoftmaxOutput) is a softmax output in the forward pass and the gradient with respect to the cross-entropy loss in the backward pass. @@ -289,7 +291,7 @@ net.export('gluon-model-hybrid', epoch=5) ## Conclusion -This tutorial lead you through the steps necessary to train a deep learning model and showed you the differene between the symbolic approach of the Module API and the imperative Gluon API. If you need help converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! +This tutorial lead you through the steps necessary to train a deep learning model and showed you the difference between the symbolic approach of the Module API and the imperative one of the Gluon API. If you need more help converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! 
- \ No newline at end of file + From 70e40e7cbf2ed9441c3c87f94c3c8a223d679e56 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Wed, 12 Sep 2018 18:30:12 -0700 Subject: [PATCH 04/11] Update index.md --- docs/tutorials/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index 93372be6bea8..dcf0d7080bb8 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -82,7 +82,7 @@ Select API:  * Symbol * [Symbol API](/tutorials/basic/symbol.html) (Caution: written before Gluon existed) * KVStore - * [Key-Value Store API](/tutorials/python/kvstore.html) + * [Key-Value Store API](/tutorials/python/kvstore.html) * Gluon APIs * Blocks and Operators * [Blocks](/tutorials/gluon/gluon.html) ([Alternative](http://gluon.mxnet.io/chapter03_deep-neural-networks/plumbing.html) External link) From 385819241fe48cda2f922a1f067597c49304dc41 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Thu, 13 Sep 2018 10:22:31 -0700 Subject: [PATCH 05/11] trigger build --- docs/tutorials/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index dcf0d7080bb8..08f11b663665 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -67,7 +67,7 @@ Select API:  * [Learning Rate Finder](/tutorials/gluon/learning_rate_finder.html) * [Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules.html) * [Advanced Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules_advanced.html) - * [Profiling MXNet Models](/tutorials/python/profiler.html) + * [Profiling MXNet Models](/tutorials/python/profiler.html) * [Module to Gluon API](/tutorials/python/module_to_gluon.html) (new!) * API Guides * Core APIs From c9d96094f2bb88c5dd06470b678523fc018a2cf7 Mon Sep 17 00:00:00 2001 From: "Delteil, Thomas" Date: Fri, 14 Sep 2018 15:06:23 -0700 Subject: [PATCH 06/11] update after review --- docs/tutorials/python/module_to_gluon.md | 111 ++++++++++++++++++----- 1 file changed, 86 insertions(+), 25 deletions(-) diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md index 085fb3279f78..af9dc1bfb355 100644 --- a/docs/tutorials/python/module_to_gluon.md +++ b/docs/tutorials/python/module_to_gluon.md @@ -15,12 +15,15 @@ IV) Training Loop V) Exporting Models +VI) Loading Models for Inference + In the following section we will look at 1:1 mappings between the Module and the Gluon ways of training a neural networks. ## I - Data Loading ```python +from collections import namedtuple import logging logging.basicConfig(level=logging.INFO) @@ -31,7 +34,7 @@ from mxnet.gluon import nn from mxnet import gluon batch_size = 5 -dataset_length = 200 +dataset_length = 50 ``` #### Module @@ -51,7 +54,7 @@ train_label = np.random.randint(0, 10, (dataset_length,)).astype('float32') data_iter = mx.io.NDArrayIter(data=train_data, label=train_label, batch_size=batch_size, shuffle=False, data_name='data', label_name='softmax_label') for batch in data_iter: print(batch.data[0].shape, batch.label[0]) - break; + break ``` (5, 28, 28) @@ -80,7 +83,7 @@ for data, label in dataloader: #### Notable differences - Gluon keeps a strict separation between data holding, and data loading / fetching. The `Dataset` role is to hold onto some data, in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing workers. 
This flexible API allows to efficiently pre-fetch data and separate the concerns. -- In the module API, `DataIter`s are responsible for both holding the data and iterating through it. Some `DataIter` support multi-threading like the [`ImageRecordIter`](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter), while other don't like the `NDArrayIter`. +- In the module API, `DataIter` are responsible for both holding the data and iterating through it. Some `DataIter` support multi-threading like the [`ImageRecordIter`](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter), while other don't like the [`NDArrayIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=ndarrayiter#mxnet.io.NDArrayIter). You can checkout the [`Dataset` and `DataLoader` tutorial](https://mxnet.incubator.apache.org/tutorials/gluon/datasets.html). You can either rewrite your code in order to use one of the provided [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) class, like the [`ArrayDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=arraydataset#mxnet.gluon.data.ArrayDataset) or the [`ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset), or you can simply wrap your existing [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter) to have a similar usage pattern as a `DataLoader`: @@ -125,7 +128,7 @@ Let's look at the model definition from the [MNIST Module Tutorial](https://mxne ```python -ctx = mx.gpu() +ctx = mx.cpu() ``` #### Module @@ -134,15 +137,18 @@ For the Module API, you define the data flow by setting `data` keyword argument You then bind the symbolic model to a specific compute context and specify the symbol names for the data and the label. 
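Binding and parameter initialization are normally triggered for you inside `.fit()`. As a minimal sketch (not part of the original tutorial), the explicit equivalent for the `mlp_model` built in the listing below would look roughly like this:

```python
# Explicitly bind the symbol to the input/label shapes on the chosen context,
# then initialize its parameters; .fit() usually performs both steps for you.
mlp_model.bind(data_shapes=[('data', (batch_size, 28, 28))],
               label_shapes=[('softmax_label', (batch_size,))],
               for_training=True)
mlp_model.init_params(initializer=mx.init.Xavier(magnitude=2.24))
```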
```python -data = mx.sym.var('data') -data = mx.sym.flatten(data=data) -fc1 = mx.sym.FullyConnected(data=data, num_hidden=128) -act1 = mx.sym.Activation(data=fc1, act_type="relu") -fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64) -act2 = mx.sym.Activation(data=fc2, act_type="relu") -fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10) -mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax') - +def get_module_network(): + data = mx.sym.var('data') + data = mx.sym.flatten(data=data) + fc1 = mx.sym.FullyConnected(data=data, num_hidden=128) + act1 = mx.sym.Activation(data=fc1, act_type="relu") + fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64) + act2 = mx.sym.Activation(data=fc2, act_type="relu") + fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10) + mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax') + return mlp + +mlp = get_module_network() # Bind model to Module mlp_model = mx.mod.Module(symbol=mlp, context=ctx, data_names=['data'], label_names=['softmax_label']) ``` @@ -153,14 +159,18 @@ In Gluon, for a sequential model like that, you would create a `Sequential` bloc ```python -net = nn.HybridSequential() -with net.name_scope(): - net.add( - nn.Flatten(), - nn.Dense(units=128, activation="relu"), - nn.Dense(units=64, activation="relu"), - nn.Dense(units=10) - ) +def get_gluon_network(): + net = nn.HybridSequential() + with net.name_scope(): + net.add( + nn.Flatten(), + nn.Dense(units=128, activation="relu"), + nn.Dense(units=64, activation="relu"), + nn.Dense(units=10) + ) + return net + +net = get_gluon_network() ``` ## III - Loss @@ -178,7 +188,7 @@ In Gluon, it is a lot more transparent. Losses, like the [SoftmaxCrossEntropyLos ```python # Softmax with cross entropy loss, directly part of the network -mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax') +out = mx.sym.SoftmaxOutput(data=mlp, name='softmax') ``` #### Gluon @@ -191,8 +201,8 @@ loss_fn = gluon.loss.SoftmaxCrossEntropyLoss() ## IV - Training Loop -The Module API provides a [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) functions that takes care of fitting training data to your symbolic model. With Gluon, you execution flow controls the data flow, so you need to write your own loop. It might seems like it is more verbose, but you have a lot more control as to what is happening during the training. -With the [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function, you control the metric reporting, checkpointing, through a lot of different keyword arguments (check the [docs](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit)). That is where you define the optimizer for example. +The Module API provides a [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) functions that takes care of fitting training data to your symbolic model. With Gluon, your execution flow controls the data flow, so you need to write your own loop. It might seems like it is more verbose, but you have a lot more control as to what is happening during the training. 
+With the [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function, you control the metric reporting, checkpointing or weights initialization through a lot of different keyword arguments (check the [docs](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit)). That is where you define the optimizer for example. With Gluon, you do these operations directly in the training loop, and the optimizer is part of the [`Trainer`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=trainer#mxnet.gluon.Trainer) object that handles the weight updates of your parameters. @@ -289,9 +299,60 @@ net.export('gluon-model-hybrid', epoch=5) # gluon-model-hybrid-symbol.json gluon-model-hybrid-0005.params ``` +## VI - Loading model for inference + +For inference, in the Module API, you need to first load the parameters and symbol, bind the symbol to a module and load the corresponding parameters. You can then pass a batch of data through that module and request the output of the network. +For the Gluon API, it is a lot simpler, you can just load a serialized model in a [`SymbolBlock`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=symbolblo#mxnet.gluon.SymbolBlock) and run inference directly. + +#### Module + +```python +# Load the symbol and parameters +sym, arg_params, aux_params = mx.model.load_checkpoint('module-model', 5) + +# Bind them in a module +mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None) +mod.bind(for_training=False, data_shapes=[('data', (1,1,28,28))], + label_shapes=mod._label_shapes) + +# Set the parameters +mod.set_params(arg_params, aux_params, allow_missing=True) + +# Run the inference +Batch = namedtuple('Batch', ['data']) +mod.forward(Batch([mx.nd.ones((1,28,28))])) +prob = mod.get_outputs()[0].asnumpy() +print("Output probabilities: {}".format(prob)) +``` + +`Output probabilities: [[0.05537598 0.03889056 0.06126577 0.08879893 0.12371024 0.05759033 0.1378248 0.26134694 0.07905186 0.09614458]]` + +#### Gluon (Symbolic Model) + +```python +net = gluon.SymbolBlock.imports('module-model-symbol.json', ['data', 'softmax_label'], 'module-model-0005.params') +prob = net(mx.nd.ones((1,1,28,28)), mx.nd.ones(1)) # note the second argument here to account for the softmax_label symbol +print("Output probabilities: {}".format(prob.asnumpy())) +``` + +`Output probabilities: [[0.05537598 0.03889056 0.06126577 0.08879893 0.12371024 0.05759033 0.1378248 0.26134694 0.07905186 0.09614458]]` + +#### Gluon (Imperative Model) + +```python +net = get_gluon_network() +net.load_parameters('gluon-model.params') +prob = net(mx.nd.ones((1,1,28,28))).softmax() +print("Output probabilities: {}".format(prob.asnumpy())) +``` + +`Output probabilities: [[0.01298077 0.00173413 0.01661885 0.3362421 0.00536332 0.02099853 0.01413316 0.5528366 0.0133819 0.02571066]]` + ## Conclusion -This tutorial lead you through the steps necessary to train a deep learning model and showed you the difference between the symbolic approach of the Module API and the imperative one of the Gluon API. If you need more help converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! +This tutorial lead you through the steps necessary to train a deep learning model and showed you the differences between the symbolic approach of the Module API and the imperative one of the Gluon API. 
If you need more elp converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! +You can also compare the scripts for training MNIST in [Gluon](https://mxnet.incubator.apache.org/tutorials/gluon/mnist.html) and [Module](https://mxnet.incubator.apache.org/tutorials/python/mnist.html). + From 64c7411b4a921878a6a33343085a5b3881a93a6c Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Fri, 15 Mar 2019 16:35:22 -0700 Subject: [PATCH 07/11] update after review --- docs/tutorials/python/module_to_gluon.md | 150 +++++++++++------------ 1 file changed, 69 insertions(+), 81 deletions(-) diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md index af9dc1bfb355..d6a5b7c88610 100644 --- a/docs/tutorials/python/module_to_gluon.md +++ b/docs/tutorials/python/module_to_gluon.md @@ -1,7 +1,7 @@ # Converting Module API code to the Gluon API -Sometimes, you find yourself in the situation where the model you want to use has been written using the symbolic Module API rather than the simpler, easier-to-debug, more flexible, imperative Gluon API. In this tutorial, we will give you a comprehensive guide you can use in order to see how you can transform your Module code, to work with the Gluon API. +Sometimes you find yourself in the situation where the model you want to use has been written using the symbolic Module API rather than the simpler, easier-to-debug, more flexible, imperative Gluon API. In this tutorial, we will give you a comprehensive guide for transforming Module code to Gluon code. The different steps to take into consideration are: @@ -21,11 +21,14 @@ In the following section we will look at 1:1 mappings between the Module and the ## I - Data Loading +In this section we will be looking at the difference in loading data between Module and Gluon. +Let's first import a few python modules. ```python from collections import namedtuple import logging logging.basicConfig(level=logging.INFO) +import random import numpy as np import mxnet as mx @@ -33,13 +36,22 @@ from mxnet.gluon.data import ArrayDataset, DataLoader from mxnet.gluon import nn from mxnet import gluon +# parameters batch_size = 5 dataset_length = 50 + +# random seeds +random.seed(1) +np.random.seed(1) +mx.random.seed(1) + ``` #### Module -When using the Module API we use a [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter), in addition to the data itself, the [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter) contains information about the name of the input symbols. +When using the Module API we use a [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter), in addition to the data itself, the [`DataIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=dataiter#mxnet.io.DataIter) contains information about the name of the input symbols. + +In the Module API, `DataIter`s are responsible for both holding the data and iterating through it. Some `DataIter`s support multi-threading like the [`ImageRecordIter`](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter), while other don't, such as the [`NDArrayIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=ndarrayiter#mxnet.io.NDArrayIter). Let's create some random data, following the same format as grayscale 28x28 images. 
@@ -49,6 +61,7 @@ train_data = np.random.rand(dataset_length, 28,28).astype('float32') train_label = np.random.randint(0, 10, (dataset_length,)).astype('float32') ``` +We can now wraps this data into an ArrayIterator that will create batches of data using the first dimension of the provided array as the batch dimension. ```python data_iter = mx.io.NDArrayIter(data=train_data, label=train_label, batch_size=batch_size, shuffle=False, data_name='data', label_name='softmax_label') @@ -64,8 +77,10 @@ for batch in data_iter: #### Gluon -With Gluon, the preferred method is to use a [`DataLoader`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataloader#mxnet.gluon.data.DataLoader) that make use of a [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) to prefetch asynchronously the data. +With Gluon, the preferred method is to use a [`DataLoader`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataloader#mxnet.gluon.data.DataLoader) that makes use of a [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) to asynchronously prefetch the data. +The Gluon API offers you the ability to efficiently fetch data and separate the concerns of loading versus holding data. The DataLoader role is to request certain indices of the dataset. The Dataset role is to hold onto data. +The `Dataset` data can be in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing workers and batch the data together. ```python dataset = ArrayDataset(train_data, train_label) @@ -79,64 +94,23 @@ for data, label in dataloader: [5. 0. 3. 4. 9.] - -#### Notable differences - -- Gluon keeps a strict separation between data holding, and data loading / fetching. The `Dataset` role is to hold onto some data, in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing workers. This flexible API allows to efficiently pre-fetch data and separate the concerns. -- In the module API, `DataIter` are responsible for both holding the data and iterating through it. Some `DataIter` support multi-threading like the [`ImageRecordIter`](https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter), while other don't like the [`NDArrayIter`](https://mxnet.incubator.apache.org/api/python/io/io.html?highlight=ndarrayiter#mxnet.io.NDArrayIter). - -You can checkout the [`Dataset` and `DataLoader` tutorial](https://mxnet.incubator.apache.org/tutorials/gluon/datasets.html). 
You can either rewrite your code in order to use one of the provided [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) classes, like the [`ArrayDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=arraydataset#mxnet.gluon.data.ArrayDataset) or the [`ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset).

## II - Model Definition

Let's look at the model definition from the [MNIST Module Tutorial](https://mxnet.incubator.apache.org/tutorials/python/mnist.html):

#### Module

For the Module API, you define the data flow by setting `data` keyword argument of one layer to the next.
You then bind the symbolic model to a specific compute context and specify the symbol names for the data and the label.

```python

# context
ctx = mx.cpu()

def get_module_network():
    data = mx.sym.var('data')
    data = mx.sym.flatten(data=data)
    fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
    act1 = mx.sym.Activation(data=fc1, act_type="relu")
    fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64)
    act2 = mx.sym.Activation(data=fc2, act_type="relu")
    fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
    mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
    return mlp

mlp = get_module_network()
# Bind model to Module
mlp_model = mx.mod.Module(symbol=mlp, context=ctx, data_names=['data'], label_names=['softmax_label'])
```

#### Gluon

In Gluon, for the equivalent model, you would create a `Sequential` block, in this case a `HybridSequential` block to allow for future hybridization, since we are only using [hybridizable blocks](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html). The flow of the data will be automatically set from one layer to the next, since they are held in a `Sequential` block.
Note that we don't need named symbols for the input, and we show how the loss is handled in Gluon in the next section.

```python
def get_gluon_network():
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(
            nn.Flatten(),
            nn.Dense(units=128, activation="relu"),
            nn.Dense(units=64, activation="relu"),
            nn.Dense(units=10)
        )
    return net

net = get_gluon_network()
```

## III - Loss

The loss that you are trying to minimize using an optimization algorithm like SGD is defined differently in the Module API than in Gluon.

#### Module

In the module API, the loss is part of the network. It usually has a forward pass result, which is the inference value, and a backward pass, which is the gradient of the output with respect to that particular loss.
For example, the [sym.SoftmaxOutput](https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=softmaxout#mxnet.symbol.SoftmaxOutput) is a softmax output in the forward pass and the gradient with respect to the cross-entropy loss in the backward pass.

```python
# Softmax with cross entropy loss, directly part of the network
out = mx.sym.SoftmaxOutput(data=mlp, name='softmax')
```

#### Gluon

In Gluon, it is a lot more transparent. Losses, like the [SoftmaxCrossEntropyLoss](https://mxnet.incubator.apache.org/api/python/gluon/loss.html?highlight=softmaxcross#mxnet.gluon.loss.SoftmaxCrossEntropyLoss), only compute the actual value of the loss. You then call `.backward()` on the loss value to compute the gradient of the parameters with respect to that loss. At inference time, you simply call `.softmax()` on your output to get the output of your network normalized according to the softmax function.

```python
# We simply create a loss function we will use in our training loop
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
```

In the next section we will show how you use this loss function in Gluon to generate the loss value in the main training loop.

## IV - Training Loop

#### Module

The Module API provides a [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function that takes care of fitting training data to your symbolic model. With Gluon, your execution flow controls the data flow, so you need to write your own loop. It might seem more verbose, but you have a lot more control over what is happening during the training.
-With the [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function, you control the metric reporting, checkpointing or weights initialization through a lot of different keyword arguments (check the [docs](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit)). That is where you define the optimizer for example. +## IV - Training Loop -With Gluon, you do these operations directly in the training loop, and the optimizer is part of the [`Trainer`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=trainer#mxnet.gluon.Trainer) object that handles the weight updates of your parameters. #### Module +The Module API provides a [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function that takes care of fitting training data to your symbolic model. With Gluon, your execution flow controls the data flow, so you need to write your own loop. It might seems like it is more verbose, but you have a lot more control as to what is happening during the training. +With the [`.fit()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit) function, you control the metric reporting, checkpointing or weights initialization through a lot of different keyword arguments (check the [docs](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=.fit#mxnet.module.BaseModule.fit)). That is where you define the optimizer for example. ```python mlp_model.fit(data_iter, # train data eval_data=data_iter, # validation data - optimizer='adam', # use SGD to train + optimizer='sgd', # use SGD to train force_init=True, force_rebind=True, optimizer_params={'learning_rate':0.1}, # use fixed learning rate eval_metric='acc', # report accuracy during training - num_epoch=5) # train for at most 10 dataset passes + num_epoch=5) # train for 5 full dataset passes ``` ```INFO:root:Epoch[4] Train-accuracy=0.070000``` @@ -229,13 +205,15 @@ mlp_model.fit(data_iter, # train data #### Gluon +With Gluon, you do these operations directly in the training loop, and the optimizer is part of the [`Trainer`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=trainer#mxnet.gluon.Trainer) object that handles the weight updates of your parameters. 
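As a concrete point of comparison, the settings that `.fit()` receives through its `optimizer` and `optimizer_params` keyword arguments go straight into the `Trainer` constructor. A small sketch, where the momentum and weight decay values are made-up placeholders rather than settings from this tutorial:

```python
# Rough Gluon equivalent of .fit(optimizer='sgd', optimizer_params={...});
# extra optimizer options are passed through the same dictionary.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'momentum': 0.9, 'wd': 1e-4})
# You can even adjust the learning rate on the fly between epochs:
# trainer.set_learning_rate(trainer.learning_rate * 0.9)
```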
Notice the `loss.backward()` call we make before updating the weights, as mentioned in the previous section.

```python
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx) # Initialize network and trainer
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

metric = mx.metric.Accuracy() # Pick a metric

for e in range(5): # start of epoch

    for data, label in dataloader: # start of mini-batch
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)

        with mx.autograd.record():
            output = net(data) # forward pass
            loss = loss_fn(output, label) # get loss

        loss.backward() # compute gradients
        trainer.step(data.shape[0]) # update weights with SGD
        metric.update(label, output) # update the metrics # end of mini-batch

    name, acc = metric.get()
    print('training metrics at epoch %d: %s=%f'%(e, name, acc))
    metric.reset() # end of epoch
```

```training metrics at epoch 3: accuracy=0.155000```

```training metrics at epoch 4: accuracy=0.145000```

The Gluon training code is more verbose than the simple `.fit` from Module. However, that is also its main advantage: there is no black magic going on here, and you have full control of your training loop. You can, for example, easily set breakpoints, modify the learning rate, or print data during the training flow. This flexibility also makes it easy to implement more complex use cases like gradient accumulation across batches.

## V - Exporting Model

The ultimate purpose of training a model is to be able to export it and share it, whether it is for deployment or simply for reproducibility purposes.

#### Module

With the Module API, you can save a model using the [`.save_checkpoint()`](https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=save_chec#mxnet.module.Module.save_checkpoint) and get a `-symbol.json` and a `.params` file that represent your network.
```python mlp_model.save_checkpoint('module-model', epoch=5) -# nodule-model-0005.params module-model-symbol.json +# module-model-0005.params module-model-symbol.json ``` ```INFO:root:Saved checkpoint to "module-model-0005.params"``` @@ -283,12 +260,19 @@ mlp_model.save_checkpoint('module-model', epoch=5) #### Gluon + +With Gluon, network parameters are associated with a `Block`, but the execution flow is controlled in python through the code in `.forward()` function. Hence only [hybridized networks]() can be exported with a `-symbol.json` and `.params` file using [`.export()`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=export#mxnet.gluon.HybridBlock.export), non-hybridized models can only have their parameters exported using [`.save_parameters()`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=save_pa#mxnet.gluon.Block.save_parameters). Check this great tutorial to learn more: [Saving and Loading Gluon Models](https://mxnet.incubator.apache.org/tutorials/gluon/save_load_params.html). + + +Any models: + ```python # save only the parameters net.save_parameters('gluon-model.params') # gluon-model.params ``` +Hybridized models: ```python # save the parameters and the symbolic representation @@ -299,13 +283,15 @@ net.export('gluon-model-hybrid', epoch=5) # gluon-model-hybrid-symbol.json gluon-model-hybrid-0005.params ``` -## VI - Loading model for inference +## VI - Loading Model for Inference -For inference, in the Module API, you need to first load the parameters and symbol, bind the symbol to a module and load the corresponding parameters. You can then pass a batch of data through that module and request the output of the network. -For the Gluon API, it is a lot simpler, you can just load a serialized model in a [`SymbolBlock`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=symbolblo#mxnet.gluon.SymbolBlock) and run inference directly. #### Module + +For inference, in the Module API, you need to first load the parameters and symbol, bind the symbol to a module and load the corresponding parameters. You can then pass a batch of data through that module and request the output of the network. + + ```python # Load the symbol and parameters sym, arg_params, aux_params = mx.model.load_checkpoint('module-model', 5) @@ -329,6 +315,8 @@ print("Output probabilities: {}".format(prob)) #### Gluon (Symbolic Model) +For the Gluon API, it is a lot simpler. You can just load a serialized model in a [`SymbolBlock`](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=symbolblo#mxnet.gluon.SymbolBlock) and run inference directly. + ```python net = gluon.SymbolBlock.imports('module-model-symbol.json', ['data', 'softmax_label'], 'module-model-0005.params') prob = net(mx.nd.ones((1,1,28,28)), mx.nd.ones(1)) # note the second argument here to account for the softmax_label symbol @@ -350,7 +338,7 @@ print("Output probabilities: {}".format(prob.asnumpy())) ## Conclusion -This tutorial lead you through the steps necessary to train a deep learning model and showed you the differences between the symbolic approach of the Module API and the imperative one of the Gluon API. If you need more elp converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! +This tutorial lead you through the steps necessary to train a deep learning model and showed you the differences between the symbolic approach of the Module API and the imperative one of the Gluon API. 
If you need more help converting your Module API code to the Gluon API, reach out to the community on the [discuss forum](https://discuss.mxnet.io)! You can also compare the scripts for training MNIST in [Gluon](https://mxnet.incubator.apache.org/tutorials/gluon/mnist.html) and [Module](https://mxnet.incubator.apache.org/tutorials/python/mnist.html). From 965e3b4eecabd8784939101c076bc485fa53a3c1 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Fri, 15 Mar 2019 16:35:32 -0700 Subject: [PATCH 08/11] update after review --- docs/tutorials/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index f8b892fac0ca..f49b9cd9f70b 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -89,7 +89,7 @@ Select API:  * [Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules.html) * [Advanced Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules_advanced.html) * [Profiling MXNet Models](/tutorials/python/profiler.html) - * [Module to Gluon API](/tutorials/python/module_to_gluon.html) (new!) + * [Module to Gluon API](/tutorials/python/module_to_gluon.html) (new!) * [Hybridize Gluon models with control flows](/tutorials/control_flow/ControlFlowTutorial.html) * [Gluon end to end from training to inference](/tutorials/gluon/gluon_from_experiment_to_deployment.html) @@ -115,7 +115,7 @@ Select API:  * [HybridBlocks](/tutorials/gluon/hybrid.html) ([Alternative](http://gluon.mxnet.io/chapter07_distributed-learning/hybridize.html) External link) * [Block Naming](/tutorials/gluon/naming.html) * [Custom Operators](/tutorials/gluon/customop.html) - * [Control Flow operators](/tutorials/control_flow/ControlFlowTutorial.html) (new!) + * [Control Flow operators](/tutorials/control_flow/ControlFlowTutorial.html) (new!) * Autograd * [AutoGrad API](/tutorials/gluon/autograd.html) * [AutoGrad API with chain rule](http://gluon.mxnet.io/chapter01_crashcourse/autograd.html) External link From 202f191613a9f2bda0b393c6b703a34379097d7b Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Sat, 16 Mar 2019 17:41:06 -0700 Subject: [PATCH 09/11] adding license --- docs/tutorials/python/module_to_gluon.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md index d6a5b7c88610..3fb2440542cb 100644 --- a/docs/tutorials/python/module_to_gluon.md +++ b/docs/tutorials/python/module_to_gluon.md @@ -1,3 +1,19 @@ + + + + + + + + + + + + + + + + # Converting Module API code to the Gluon API From d16cd1fe6b888e185efd31f188e022761f2fd022 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Wed, 20 Mar 2019 09:43:39 -0700 Subject: [PATCH 10/11] Update module_to_gluon.md --- docs/tutorials/python/module_to_gluon.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/tutorials/python/module_to_gluon.md b/docs/tutorials/python/module_to_gluon.md index 3fb2440542cb..5ab9d88cbd23 100644 --- a/docs/tutorials/python/module_to_gluon.md +++ b/docs/tutorials/python/module_to_gluon.md @@ -33,12 +33,12 @@ V) Exporting Models VI) Loading Models for Inference -In the following section we will look at 1:1 mappings between the Module and the Gluon ways of training a neural networks. +In the following section we will look at 1:1 mappings between the Module and the Gluon ways of training a neural network. ## I - Data Loading In this section we will be looking at the difference in loading data between Module and Gluon. 
-Let's first import a few python modules. +Let's first import a few Python modules. ```python from collections import namedtuple @@ -95,8 +95,8 @@ for batch in data_iter: With Gluon, the preferred method is to use a [`DataLoader`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataloader#mxnet.gluon.data.DataLoader) that makes use of a [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) to asynchronously prefetch the data. -The Gluon API offers you the ability to efficiently fetch data and separate the concerns of loading versus holding data. The DataLoader role is to request certain indices of the dataset. The Dataset role is to hold onto data. -The `Dataset` data can be in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing workers and batch the data together. +The Gluon API offers you the ability to efficiently fetch data and separate the concerns of loading versus holding data. The DataLoader role is to request certain indices of the dataset. The Dataset has ownership of the data. +The `Dataset` data can be in or out of memory, and the `DataLoader` role is to request certain indices of the dataset, in the main thread or through multi-processing (or multi-threaded) workers and batch the data together. ```python dataset = ArrayDataset(train_data, train_label) From 456266e000cc36a9ccca94b5fc01e003fdfd9a38 Mon Sep 17 00:00:00 2001 From: Thomas Delteil Date: Wed, 20 Mar 2019 16:02:01 -0700 Subject: [PATCH 11/11] trigger --- docs/tutorials/index.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index f49b9cd9f70b..ec3788c957f9 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -90,7 +90,6 @@ Select API:  * [Advanced Learning Rate Schedules](/tutorials/gluon/learning_rate_schedules_advanced.html) * [Profiling MXNet Models](/tutorials/python/profiler.html) * [Module to Gluon API](/tutorials/python/module_to_gluon.html) (new!) - * [Hybridize Gluon models with control flows](/tutorials/control_flow/ControlFlowTutorial.html) * [Gluon end to end from training to inference](/tutorials/gluon/gluon_from_experiment_to_deployment.html) * API Guides