@@ -240,7 +240,7 @@
To build the documents locally, we first need to install Docker.
Then use the following commands to clone the repository and
build the documents.
-
-git clone --recursive https://github.com/apache/incubator-mxnet.git --branch 0.11.0.rc1
+git clone --recursive https://github.com/apache/incubator-mxnet.git --branch 0.11.0.rc3
cd mxnet && make docs
diff --git a/versions/master/_sources/api/python/autograd.txt b/_sources/api/python/autograd.md.txt
similarity index 100%
rename from versions/master/_sources/api/python/autograd.txt
rename to _sources/api/python/autograd.md.txt
diff --git a/versions/master/_sources/api/python/gluon.txt b/_sources/api/python/gluon.md.txt
similarity index 100%
rename from versions/master/_sources/api/python/gluon.txt
rename to _sources/api/python/gluon.md.txt
diff --git a/versions/master/_sources/api/python/image.txt b/_sources/api/python/image.md.txt
similarity index 100%
rename from versions/master/_sources/api/python/image.txt
rename to _sources/api/python/image.md.txt
diff --git a/_sources/api/python/index.md.txt b/_sources/api/python/index.md.txt
index 6051c0e858c3..964ccde0145a 100644
--- a/_sources/api/python/index.md.txt
+++ b/_sources/api/python/index.md.txt
@@ -28,9 +28,12 @@ imported by running:
ndarray
symbol
module
+ autograd
+ gluon
rnn
kvstore
io
+ image
optimization
callback
metric
diff --git a/_sources/api/python/io.md.txt b/_sources/api/python/io.md.txt
index 9cbffc91aa63..15f8aa3ce354 100644
--- a/_sources/api/python/io.md.txt
+++ b/_sources/api/python/io.md.txt
@@ -62,6 +62,7 @@ A detailed tutorial is available at
recordio.MXRecordIO
recordio.MXIndexedRecordIO
image.ImageIter
+ image.ImageDetIter
```
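+
+`ImageDetIter` is the object-detection counterpart of `ImageIter`. As a minimal sketch (the record and index file paths here are hypothetical, assuming a detection RecordIO pair built with `tools/im2rec.py`):
+
+```python
+import mxnet as mx
+
+det_iter = mx.image.ImageDetIter(
+    batch_size=32,
+    data_shape=(3, 300, 300),
+    path_imgrec='data/train.rec',
+    path_imgidx='data/train.idx')
+for batch in det_iter:
+    # each DataBatch carries images and per-image object-detection labels
+    print(batch.data[0].shape, batch.label[0].shape)
+    break
+```
+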
## Helper classes and functions
@@ -81,33 +82,6 @@ Data structures and other iterators provided in the ``mxnet.io`` packages.
io.MXDataIter
```
-A list of image modification functions provided by ``mxnet.image``.
-
-```eval_rst
-.. autosummary::
- :nosignatures:
-
- image.imdecode
- image.scale_down
- image.resize_short
- image.fixed_crop
- image.random_crop
- image.center_crop
- image.color_normalize
- image.random_size_crop
- image.ResizeAug
- image.RandomCropAug
- image.RandomSizedCropAug
- image.CenterCropAug
- image.RandomOrderAug
- image.ColorJitterAug
- image.LightingAug
- image.ColorNormalizeAug
- image.HorizontalFlipAug
- image.CastAug
- image.CreateAugmenter
-```
-
Functions to read and write RecordIO files.
```eval_rst
@@ -179,8 +153,6 @@ The backend engine will recognize the index of `N` in the `layout` as the axis f
```eval_rst
.. automodule:: mxnet.io
:members:
-.. automodule:: mxnet.image
- :members:
.. automodule:: mxnet.recordio
:members:
```
diff --git a/_sources/api/python/ndarray.md.txt b/_sources/api/python/ndarray.md.txt
index a782b910e656..5e9f7e1a1184 100644
--- a/_sources/api/python/ndarray.md.txt
+++ b/_sources/api/python/ndarray.md.txt
@@ -463,6 +463,37 @@ In the rest of this document, we first overview the methods provided by the
Custom
```
+## Contrib
+
+```eval_rst
+.. warning:: This package contains experimental APIs and may change in the near future.
+```
+
+The `contrib.ndarray` module contains many useful experimental APIs for new features. This is a place for the community to try out the new features, so that feature contributors can receive feedback.
+
+```eval_rst
+.. currentmodule:: mxnet.contrib.ndarray
+
+.. autosummary::
+ :nosignatures:
+
+ CTCLoss
+ DeformableConvolution
+ DeformablePSROIPooling
+ MultiBoxDetection
+ MultiBoxPrior
+ MultiBoxTarget
+ MultiProposal
+ PSROIPooling
+ Proposal
+ count_sketch
+ ctc_loss
+ dequantize
+ fft
+ ifft
+ quantize
+```
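+
+For instance, a minimal sketch (these operators are experimental, as warned above, so signatures may change) running the contrib FFT on an NDArray:
+
+```python
+import mxnet as mx
+
+x = mx.nd.ones((4, 8))          # a batch of 4 signals of length 8
+y = mx.contrib.ndarray.fft(x)   # real and imaginary parts interleaved
+z = mx.contrib.ndarray.ifft(y)  # inverse transform back to length 8
+print(y.shape, z.shape)
+```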
+
## API Reference
@@ -474,6 +505,9 @@ In the rest of this document, we first overview the methods provided by the
.. automodule:: mxnet.random
:members:
+.. automodule:: mxnet.contrib.ndarray
+ :members:
+
```
diff --git a/_sources/api/python/symbol.md.txt b/_sources/api/python/symbol.md.txt
index f99bee2bd79b..dd455eee587a 100644
--- a/_sources/api/python/symbol.md.txt
+++ b/_sources/api/python/symbol.md.txt
@@ -253,6 +253,7 @@ Composite multiple symbols into a new one by an operator.
broadcast_div
broadcast_mod
negative
+ reciprocal
dot
batch_dot
add_n
@@ -479,6 +480,37 @@ Composite multiple symbols into a new one by an operator.
Custom
```
+## Contrib
+
+```eval_rst
+.. warning:: This package contains experimental APIs and may change in the near future.
+```
+
+The `contrib.symbol` module contains many useful experimental APIs for new features. This is a place for the community to try out the new features, so that feature contributors can receive feedback.
+
+```eval_rst
+.. currentmodule:: mxnet.contrib.symbol
+
+.. autosummary::
+ :nosignatures:
+
+ CTCLoss
+ DeformableConvolution
+ DeformablePSROIPooling
+ MultiBoxDetection
+ MultiBoxPrior
+ MultiBoxTarget
+ MultiProposal
+ PSROIPooling
+ Proposal
+ count_sketch
+ ctc_loss
+ dequantize
+ fft
+ ifft
+ quantize
+```
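+
+These operators compose like any other symbol. A hedged sketch (shapes chosen purely for illustration) binding the experimental `fft` operator:
+
+```python
+import mxnet as mx
+
+data = mx.sym.Variable('data')
+fft = mx.contrib.symbol.fft(data, name='fft')
+exe = fft.simple_bind(ctx=mx.cpu(), data=(4, 8))
+out = exe.forward(data=mx.nd.ones((4, 8)))
+print(out[0].shape)
+```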
+
## API Reference
@@ -487,6 +519,9 @@ Composite multiple symbols into a new one by an operator.
.. automodule:: mxnet.symbol
:members:
+.. automodule:: mxnet.contrib.symbol
+ :members:
+
```
diff --git a/_sources/architecture/overview.md.txt b/_sources/architecture/overview.md.txt
index 361e0c91de63..a7632d4a61e8 100644
--- a/_sources/architecture/overview.md.txt
+++ b/_sources/architecture/overview.md.txt
@@ -48,7 +48,7 @@ The following API is the core interface for the execution engine:
This API allows you to push a function (`exec_fun`),
along with its context information and dependencies, to the engine.
`exec_ctx` is the context information in which the `exec_fun` should be executed,
-`const_vars` denotes the variables that the function reads from,
+`const_vars` denotes the variables that the function reads from,
and `mutate_vars` are the variables to be modified.
The engine provides the following guarantee:
diff --git a/_sources/architecture/program_model.md.txt b/_sources/architecture/program_model.md.txt
index 380990e7019f..519a9a9024d8 100644
--- a/_sources/architecture/program_model.md.txt
+++ b/_sources/architecture/program_model.md.txt
@@ -92,7 +92,7 @@ are powerful DSLs that generate callable computation graphs for neural networks.
Intuitively, you might say that imperative programs
-are more *native* than symbolic programs.
+are more *native* than symbolic programs.
It's easier to use native language features.
For example, it's straightforward to print out the values
in the middle of computation or to use native control flow and loops
@@ -269,7 +269,7 @@ Recall the *be prepared to encounter all possible demands* requirement of impera
If you are creating an array library that supports automatic differentiation,
you have to keep the grad closure along with the computation.
This means that none of the history variables can be
-garbage-collected because they are referenced by variable `d` by way of function closure.
+garbage-collected because they are referenced by variable `d` by way of function closure.
What if you want to compute only the value of `d`,
and don't want the gradient value?
@@ -305,7 +305,6 @@ For example, one solution to the preceding
problem is to introduce a context variable.
You can introduce a no-gradient context variable
to turn gradient calculation off.
-
```python
with context.NoGradient():
@@ -315,6 +314,8 @@ to turn gradient calculation off.
d = c + 1
```
+
+
However, this example still must be prepared to encounter all possible demands,
which means that you can't perform the in-place calculation
to reuse memory in the forward pass (a trick commonly used to reduce GPU memory usage).
@@ -380,7 +381,7 @@ It's usually easier to write parameter updates in an imperative style,
especially when you need multiple updates that relate to each other.
For symbolic programs, the update statement is also executed as you call it.
So in that sense, most symbolic deep learning libraries
-fall back on the imperative approach to perform updates,
+fall back on the imperative approach to perform updates,
while using the symbolic approach to perform gradient calculation.
### There Is No Strict Boundary
@@ -388,7 +389,7 @@ while using the symbolic approach to perform gradient calculation.
In comparing the two programming styles,
some of our arguments might not be strictly true,
i.e., it's possible to make an imperative program
-more like a traditional symbolic program or vice versa.
+more like a traditional symbolic program or vice versa.
However, the two archetypes are useful abstractions,
especially for understanding the differences between deep learning libraries.
We might reasonably conclude that there is no clear boundary between programming styles.
@@ -400,7 +401,7 @@ information held in symbolic programs.
## Big vs. Small Operations
-When designing a deep learning library, another important programming model decision
+When designing a deep learning library, another important programming model decision
is precisely what operations to support.
In general, there are two families of operations supported by most deep learning libraries:
@@ -418,7 +419,7 @@ For example, the sigmoid unit can simply be composed of division, addition and a
sigmoid(x) = 1.0 / (1.0 + exp(-x))
```
Using smaller operations as building blocks, you can express nearly anything you want.
-If you're more familiar with CXXNet- or Caffe-style layers,
+If you're more familiar with CXXNet- or Caffe-style layers,
note that these operations don't differ from a layer, except that they are smaller.
```python
@@ -433,7 +434,7 @@ because you only need to compose the components.
Directly composing sigmoid layers requires three layers of operation, instead of one.
```python
- SigmoidLayer(x) = EWiseDivisionLayer(1.0, AddScalarLayer(ExpLayer(-x), 1.0))
+ SigmoidLayer(x) = EWiseDivisionLayer(1.0, AddScalarLayer(ExpLayer(-x), 1.0))
```
This code creates overhead for computation and memory (which could be optimized, with cost).
@@ -467,7 +468,7 @@ these optimizations are crucial to performance.
Because the operations are small,
there are many sub-graph patterns that can be matched.
Also, because the final, generated operations
-might not enumerable,
+might not be enumerable,
an explicit recompilation of the kernels is required,
as opposed to the fixed amount of precompiled kernels
in the big operation libraries.
@@ -476,7 +477,7 @@ that support small operations.
Requiring compilation optimization also creates engineering overhead
for the libraries that solely support smaller operations.
-As in the case of symbolic vs imperative,
+As in the case of symbolic vs. imperative,
the bigger operation libraries "cheat"
by asking you to provide restrictions (to the common layer),
so that you actually perform the sub-graph matching.
@@ -522,7 +523,7 @@ The more suitable programming style depends on the problem you are trying to sol
For example, imperative programs are better for parameter updates,
and symbolic programs for gradient calculation.
-We advocate *mixing* the approaches.
+We advocate *mixing* the approaches.
Sometimes the part that we want to be flexible
isn't crucial to performance.
In these cases, it's okay to leave some efficiency on the table
@@ -562,7 +563,7 @@ This is exactly like writing C++ programs and exposing them to Python, which we
Because parameter memory resides on the GPU,
you might not want to use NumPy as an imperative component.
Supporting a GPU-compatible imperative library
-that interacts with symbolic compiled functions
+that interacts with symbolic compiled functions
or provides a limited amount of updating syntax
in the update statement in symbolic program execution
might be a better choice.
diff --git a/_sources/get_started/install.md.txt b/_sources/get_started/install.md.txt
index 898aa0899a1d..0e88a0d2a2ee 100644
--- a/_sources/get_started/install.md.txt
+++ b/_sources/get_started/install.md.txt
@@ -235,10 +235,10 @@ $ make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas
**Build the MXNet Python binding**
-**Step 1** Install prerequisites - python setup tools and numpy.
+**Step 1** Install prerequisites - python, setuptools, numpy, and pip.
```bash
-$ sudo apt-get install -y python-dev python-setuptools python-numpy
+$ sudo apt-get install -y python-dev python-setuptools python-numpy python-pip
```
**Step 2** Install the MXNet Python binding.
@@ -458,10 +458,10 @@ $ make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/
**Install the MXNet Python binding**
-**Step 1** Install prerequisites - python setup tools and numpy.
+**Step 1** Install prerequisites - python, setuptools, numpy, and pip.
```bash
-$ sudo apt-get install -y python-dev python-setuptools python-numpy
+$ sudo apt-get install -y python-dev python-setuptools python-numpy python-pip
```
**Step 2** Install the MXNet Python binding.
@@ -1462,3 +1462,5 @@ Will be available soon.
+
+# Download Source Package
\ No newline at end of file
diff --git a/_sources/get_started/windows_setup.md.txt b/_sources/get_started/windows_setup.md.txt
index 86104c6be5f3..f9067732d11a 100644
--- a/_sources/get_started/windows_setup.md.txt
+++ b/_sources/get_started/windows_setup.md.txt
@@ -9,7 +9,6 @@ You can either use a prebuilt binary package or build from source to build the M
MXNet provides a prebuilt package for Windows. The prebuilt package includes the MXNet library, all of the dependent third-party libraries, a sample C++ solution for Visual Studio, and the Python installation script. To install the prebuilt package:
1. Download the latest prebuilt package from the [Releases](https://github.com/dmlc/mxnet/releases) tab of MXNet.
- There are two versions. One with GPU support (using CUDA and CUDNN v3), and one without GPU support. Choose the version that suits your hardware configuration. For more information on which version works on each hardware configuration, see [Requirements for GPU](http://mxnet.io/get_started/setup.html#requirements-for-using-gpus).
2. Unpack the package into a folder, with an appropriate name, such as ```D:\MXNet```.
3. Open the folder, and install the package by double-clicking ```setupenv.cmd```. This sets up all of the environment variables required by MXNet.
4. Test the installation by opening the provided sample C++ Visual Studio solution and building it.
@@ -23,7 +22,7 @@ This produces a library called ```libmxnet.dll```.
To build and install MXNet yourself, you need the following dependencies. Install the required dependencies:
1. If [Microsoft Visual Studio 2013](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition.
-2. Install [Visual C++ Compiler Nov 2013 CTP](https://www.microsoft.com/en-us/download/details.aspx?id=41151).
+2. Install [Visual C++ Compiler](http://landinghub.visualstudio.com/visual-cpp-build-tools).
3. Back up all of the files in the ```C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC``` folder to a different location.
4. Copy all of the files in the ```C:\Program Files (x86)\Microsoft Visual C++ Compiler Nov 2013 CTP``` folder (or the folder where you extracted the zip archive) to the ```C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC``` folder, and overwrite all existing files.
5. Download and install [OpenCV](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download).
diff --git a/_sources/how_to/cloud.md.txt b/_sources/how_to/cloud.md.txt
index 47ea40cf4595..67b28f8b4338 100644
--- a/_sources/how_to/cloud.md.txt
+++ b/_sources/how_to/cloud.md.txt
@@ -1,183 +1,183 @@
-# MXNet on the Cloud
-
-Deep learning can require extremely powerful hardware, often for unpredictable durations of time.
-Moreover, _MXNet_ can benefit from both multiple GPUs and multiple machines.
-Accordingly, cloud computing, as offered by AWS and others,
-is especially well suited to training deep learning models.
-Using AWS, we can rapidly fire up multiple machines with multiple GPUs each at will
-and maintain the resources for precisely the amount of time needed.
-
-## Set Up an AWS GPU Cluster from Scratch
-
-In this document, we provide a step-by-step guide that will teach you
-how to set up an AWS cluster with _MXNet_. We show how to:
-
-- [Use Amazon S3 to host data](#use-amazon-s3-to-host-data)
-- [Set up an EC2 GPU instance with all dependencies installed](#set-up-an-ec2-gpu-instance)
-- [Build and run MXNet on a single computer](#build-and-run-mxnet-on-a-gpu-instance)
-- [Set up an EC2 GPU cluster for distributed training](#set-up-an-ec2-gpu-cluster-for-distributed-training)
-
-### Use Amazon S3 to Host Data
-
-Amazon S3 provides distributed data storage which proves especially convenient for hosting large datasets.
-To use S3, you need [AWS credentials](http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html),
-including an `ACCESS_KEY_ID` and a `SECRET_ACCESS_KEY`.
-
-To use _MXNet_ with S3, set the environment variables `AWS_ACCESS_KEY_ID` and
-`AWS_SECRET_ACCESS_KEY` by adding the following two lines in
-`~/.bashrc` (replacing the strings with the correct ones):
-
-```bash
-export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
-export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
-```
-
-There are several ways to upload data to S3. One simple way is to use
-[s3cmd](http://s3tools.org/s3cmd). For example:
-
-```bash
-wget http://data.mxnet.io/mxnet/data/mnist.zip
-unzip mnist.zip && s3cmd put t*-ubyte s3://dmlc/mnist/
-```
-
-### Use Pre-installed EC2 GPU Instance
-The [Deep Learning AMI](https://aws.amazon.com/marketplace/pp/B01M0AXXQB?qid=1475211685369&sr=0-1&ref_=srh_res_product_title) is an Amazon Linux image
-supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2).
-It contains [MXNet-v0.9.3 tag](https://github.com/dmlc/mxnet) and the necessary components to get going with deep learning,
-including Nvidia drivers, CUDA, cuDNN, Anaconda, Python2 and Python3.
-The AMI IDs are the following:
-
-* us-east-1: ami-e7c96af1
-* us-west-2: ami-dfb13ebf
-* eu-west-1: ami-6e5d6808
-
-Now you can launch _MXNet_ directly on an EC2 GPU instance.
-You can also use [Jupyter](http://jupyter.org) notebook on EC2 machine.
-Here is a [good tutorial](https://github.com/dmlc/mxnet-notebooks)
-on how to connect to a Jupyter notebook running on an EC2 instance.
-
-### Set Up an EC2 GPU Instance from Scratch
-
-_MXNet_ requires the following libraries:
-
-- C++ compiler with C++11 support, such as `gcc >= 4.8`
-- `CUDA` (`CUDNN` in optional) for GPU linear algebra
-- `BLAS` (cblas, open-blas, atblas, mkl, or others) for CPU linear algebra
-- `opencv` for image augmentations
-- `curl` and `openssl` for the ability to read/write to Amazon S3
-
-Installing `CUDA` on EC2 instances requires some effort. Caffe has a good
-[tutorial](https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN-3))
-on how to install CUDA 7.0 on Ubuntu 14.04.
-
-***Note:*** We tried CUDA 7.5 on Nov 7, 2015, but found it problematic.
-
-You can install the rest using the package manager. For example, on Ubuntu:
-
-```
-sudo apt-get update
-sudo apt-get install -y build-essential git libcurl4-openssl-dev libatlas-base-dev libopencv-dev python-numpy
-```
-
-The Amazon Machine Image (AMI) [ami-12fd8178](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:ami=ami-12fd8178) has the packages listed above installed.
-
-
-### Build and Run MXNet on a GPU Instance
-
-The following commands build _MXNet_ with CUDA/CUDNN, Amazon S3, and distributed
-training.
-
-```bash
-git clone --recursive https://github.com/dmlc/mxnet
-cd mxnet; cp make/config.mk .
-echo "USE_CUDA=1" >>config.mk
-echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
-echo "USE_CUDNN=1" >>config.mk
-echo "USE_BLAS=atlas" >> config.mk
-echo "USE_DIST_KVSTORE = 1" >>config.mk
-echo "USE_S3=1" >>config.mk
-make -j$(nproc)
-```
-
-To test whether everything is installed properly, we can try training a convolutional neural network (CNN) on the MNIST dataset using a GPU:
-
-```bash
-python tests/python/gpu/test_conv.py
-```
-
-If you've placed the MNIST data on `s3://dmlc/mnist`, you can read the data stored on Amazon S3 directly with the following command:
-
-```bash
-sed -i.bak "s!data_dir = 'data'!data_dir = 's3://dmlc/mnist'!" tests/python/gpu/test_conv.py
-```
-
-***Note:*** You can use `sudo ln /dev/null /dev/raw1394` to fix the opencv error `libdc1394 error: Failed to initialize libdc1394`.
-
-### Set Up an EC2 GPU Cluster for Distributed Training
-
-A cluster consists of multiple computers.
-You can use one computer with _MXNet_ installed as the root computer for submitting jobs,and then launch several
-slave computers to run the jobs. For example, launch multiple instances using an
-AMI, e.g.,
-[ami-12fd8178](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:ami=ami-12fd8178),
-with dependencies installed. There are two options:
-
-- Make all slaves' ports accessible (same for the root) by setting type: All TCP,
- Source: Anywhere in Configure Security Group.
-
-- Use the same `pem` as the root computer to access all slave computers, and
- then copy the `pem` file into the root computer's `~/.ssh/id_rsa`. If you do this, all slave computers can be accessed with SSH from the root.
-
-Now, run the CNN on multiple computers. Assume that we are on a working
-directory of the root computer, such as `~/train`, and MXNet is built as `~/mxnet`.
-
-1. Pack the _MXNet_ Python library into this working directory for easy
- synchronization:
-
- ```bash
- cp -r ~/mxnet/python/mxnet .
- cp ~/mxnet/lib/libmxnet.so mxnet/
- ```
-
- And then copy the training program:
-
- ```bash
- cp ~/mxnet/example/image-classification/*.py .
- cp -r ~/mxnet/example/image-classification/common .
- ```
-
-2. Prepare a host file with all slaves private IPs. For example, `cat hosts`:
-
- ```bash
- 172.30.0.172
- 172.30.0.171
- ```
-
-3. Assuming that there are two computers, train the CNN using two workers:
-
- ```bash
- ../../tools/launch.py -n 2 -H hosts --sync-dir /tmp/mxnet python train_mnist.py --kv-store dist_sync
- ```
-
-***Note:*** Sometimes the jobs linger at the slave computers even though you've pressed `Ctrl-c`
-at the root node. To terminate them, use the following command:
-
-```bash
-cat hosts | xargs -I{} ssh -o StrictHostKeyChecking=no {} 'uname -a; pgrep python | xargs kill -9'
-```
-
-***Note:*** The preceding example is very simple to train and therefore isn't a good
-benchmark for distributed training. Consider using other [examples](https://github.com/dmlc/mxnet/tree/master/example/image-classification).
-
-### More Options
-#### Use Multiple Data Shards
-It is common to pack a dataset into multiple files, especially when working in a distributed environment.
-_MXNet_ supports direct loading from multiple data shards.
-Put all of the record files into a folder, and point the data path to the folder.
-
-#### Use YARN and SGE
-Although using SSH can be simple when you don't have a cluster scheduling framework,
-_MXNet_ is designed to be portable to various platforms.
-We provide scripts available in [tracker](https://github.com/dmlc/dmlc-core/tree/master/tracker)
-to allow running on other cluster frameworks, including Hadoop (YARN) and SGE.
-We welcome contributions from the community of examples of running _MXNet_ on your favorite distributed platform.
+# MXNet on the Cloud
+
+Deep learning can require extremely powerful hardware, often for unpredictable durations of time.
+Moreover, _MXNet_ can benefit from both multiple GPUs and multiple machines.
+Accordingly, cloud computing, as offered by AWS and others,
+is especially well suited to training deep learning models.
+Using AWS, we can rapidly fire up multiple machines with multiple GPUs each at will
+and maintain the resources for precisely the amount of time needed.
+
+## Set Up an AWS GPU Cluster from Scratch
+
+In this document, we provide a step-by-step guide that will teach you
+how to set up an AWS cluster with _MXNet_. We show how to:
+
+- [Use Amazon S3 to host data](#use-amazon-s3-to-host-data)
+- [Set up an EC2 GPU instance with all dependencies installed](#set-up-an-ec2-gpu-instance)
+- [Build and run MXNet on a single computer](#build-and-run-mxnet-on-a-gpu-instance)
+- [Set up an EC2 GPU cluster for distributed training](#set-up-an-ec2-gpu-cluster-for-distributed-training)
+
+### Use Amazon S3 to Host Data
+
+Amazon S3 provides distributed data storage which proves especially convenient for hosting large datasets.
+To use S3, you need [AWS credentials](http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html),
+including an `ACCESS_KEY_ID` and a `SECRET_ACCESS_KEY`.
+
+To use _MXNet_ with S3, set the environment variables `AWS_ACCESS_KEY_ID` and
+`AWS_SECRET_ACCESS_KEY` by adding the following two lines in
+`~/.bashrc` (replacing the strings with the correct ones):
+
+```bash
+export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
+```
+
+There are several ways to upload data to S3. One simple way is to use
+[s3cmd](http://s3tools.org/s3cmd). For example:
+
+```bash
+wget http://data.mxnet.io/mxnet/data/mnist.zip
+unzip mnist.zip && s3cmd put t*-ubyte s3://dmlc/mnist/
+```
+
+### Use Pre-installed EC2 GPU Instance
+The [Deep Learning AMI](https://aws.amazon.com/marketplace/pp/B01M0AXXQB?qid=1475211685369&sr=0-1&ref_=srh_res_product_title) is an Amazon Linux image
+supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2).
+It contains [MXNet-v0.9.3 tag](https://github.com/dmlc/mxnet) and the necessary components to get going with deep learning,
+including Nvidia drivers, CUDA, cuDNN, Anaconda, Python2 and Python3.
+The AMI IDs are the following:
+
+* us-east-1: ami-e7c96af1
+* us-west-2: ami-dfb13ebf
+* eu-west-1: ami-6e5d6808
+
+Now you can launch _MXNet_ directly on an EC2 GPU instance.
+You can also use a [Jupyter](http://jupyter.org) notebook on an EC2 machine.
+Here is a [good tutorial](https://github.com/dmlc/mxnet-notebooks)
+on how to connect to a Jupyter notebook running on an EC2 instance.
+
+### Set Up an EC2 GPU Instance from Scratch
+
+_MXNet_ requires the following libraries:
+
+- C++ compiler with C++11 support, such as `gcc >= 4.8`
+- `CUDA` (`CUDNN` is optional) for GPU linear algebra
+- `BLAS` (cblas, openblas, atlas, mkl, or others) for CPU linear algebra
+- `opencv` for image augmentations
+- `curl` and `openssl` for the ability to read/write to Amazon S3
+
+Installing `CUDA` on EC2 instances requires some effort. Caffe has a good
+[tutorial](https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN-3))
+on how to install CUDA 7.0 on Ubuntu 14.04.
+
+***Note:*** We tried CUDA 7.5 on Nov 7, 2015, but found it problematic.
+
+You can install the rest using the package manager. For example, on Ubuntu:
+
+```
+sudo apt-get update
+sudo apt-get install -y build-essential git libcurl4-openssl-dev libatlas-base-dev libopencv-dev python-numpy
+```
+
+The Amazon Machine Image (AMI) [ami-12fd8178](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:ami=ami-12fd8178) has the packages listed above installed.
+
+
+### Build and Run MXNet on a GPU Instance
+
+The following commands build _MXNet_ with CUDA/CUDNN, Amazon S3, and distributed
+training.
+
+```bash
+git clone --recursive https://github.com/dmlc/mxnet
+cd mxnet; cp make/config.mk .
+echo "USE_CUDA=1" >>config.mk
+echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
+echo "USE_CUDNN=1" >>config.mk
+echo "USE_BLAS=atlas" >> config.mk
+echo "USE_DIST_KVSTORE = 1" >>config.mk
+echo "USE_S3=1" >>config.mk
+make -j$(nproc)
+```
+
+To test whether everything is installed properly, we can try training a convolutional neural network (CNN) on the MNIST dataset using a GPU:
+
+```bash
+python example/image-classification/train_mnist.py
+```
+
+If you've placed the MNIST data on `s3://dmlc/mnist`, you can read the data stored on Amazon S3 directly with the following command:
+
+```bash
+sed -i.bak "s!data_dir = 'data'!data_dir = 's3://dmlc/mnist'!" example/image-classification/train_mnist.py
+```
+
+***Note:*** You can use `sudo ln /dev/null /dev/raw1394` to fix the opencv error `libdc1394 error: Failed to initialize libdc1394`.
+
+### Set Up an EC2 GPU Cluster for Distributed Training
+
+A cluster consists of multiple computers.
+You can use one computer with _MXNet_ installed as the root computer for submitting jobs, and then launch several
+slave computers to run the jobs. For example, launch multiple instances using an
+AMI, e.g.,
+[ami-12fd8178](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:ami=ami-12fd8178),
+with dependencies installed. There are two options:
+
+- Make all slaves' ports accessible (same for the root) by setting Type: All TCP and
+  Source: Anywhere in the Configure Security Group step.
+
+- Use the same `pem` as the root computer to access all slave computers, and
+ then copy the `pem` file into the root computer's `~/.ssh/id_rsa`. If you do this, all slave computers can be accessed with SSH from the root.
+
+Now, run the CNN on multiple computers. Assume that we are on a working
+directory of the root computer, such as `~/train`, and MXNet is built as `~/mxnet`.
+
+1. Pack the _MXNet_ Python library into this working directory for easy
+ synchronization:
+
+ ```bash
+ cp -r ~/mxnet/python/mxnet .
+ cp ~/mxnet/lib/libmxnet.so mxnet/
+ ```
+
+ And then copy the training program:
+
+ ```bash
+ cp ~/mxnet/example/image-classification/*.py .
+ cp -r ~/mxnet/example/image-classification/common .
+ ```
+
+2. Prepare a host file with all slaves' private IPs. For example, `cat hosts`:
+
+ ```bash
+ 172.30.0.172
+ 172.30.0.171
+ ```
+
+3. Assuming that there are two computers, train the CNN using two workers:
+
+ ```bash
+ ~/mxnet/tools/launch.py -n 2 -H hosts --sync-dir /tmp/mxnet python train_mnist.py --kv-store dist_sync
+ ```
+
+***Note:*** Sometimes the jobs linger at the slave computers even though you've pressed `Ctrl-c`
+at the root node. To terminate them, use the following command:
+
+```bash
+cat hosts | xargs -I{} ssh -o StrictHostKeyChecking=no {} 'uname -a; pgrep python | xargs kill -9'
+```
+
+***Note:*** The preceding example is very simple to train and therefore isn't a good
+benchmark for distributed training. Consider using other [examples](https://github.com/dmlc/mxnet/tree/master/example/image-classification).
+
+### More Options
+#### Use Multiple Data Shards
+It is common to pack a dataset into multiple files, especially when working in a distributed environment.
+_MXNet_ supports direct loading from multiple data shards.
+Put all of the record files into a folder, and point the data path to the folder.
+
+#### Use YARN and SGE
+Although using SSH can be simple when you don't have a cluster scheduling framework,
+_MXNet_ is designed to be portable to various platforms.
+We provide scripts available in [tracker](https://github.com/dmlc/dmlc-core/tree/master/tracker)
+to allow running on other cluster frameworks, including Hadoop (YARN) and SGE.
+We welcome community contributions of examples that run _MXNet_ on your favorite distributed platform.
diff --git a/_sources/how_to/finetune.md.txt b/_sources/how_to/finetune.md.txt
index 79d06cb5bb77..f6c164c28db9 100644
--- a/_sources/how_to/finetune.md.txt
+++ b/_sources/how_to/finetune.md.txt
@@ -45,6 +45,8 @@ training set, and the rest for the validation set. We resize images into 256x256
size and pack them into a rec file. The script to prepare the data is as
follows.
+> To run the following bash script on Windows, please use [Cygwin](https://cygwin.com/install.html).
+
```sh
wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar
tar -xf 256_ObjectCategories.tar
diff --git a/_sources/how_to/index.md.txt b/_sources/how_to/index.md.txt
index 8b29322d7578..4920e1cd3f78 100644
--- a/_sources/how_to/index.md.txt
+++ b/_sources/how_to/index.md.txt
@@ -38,7 +38,7 @@ and full working examples, visit the [tutorials section](../tutorials/index.md).
* [How do I run Keras 1.2.2 with mxnet backend?](https://github.com/dmlc/keras/wiki/Installation)
-* [How to convert MXNet models into Apple CoreML format?](https://github.com/apache/incubator-mxnet/tree/master/tools/coreml)
+* [How to convert MXNet models to Apple CoreML format?](https://github.com/apache/incubator-mxnet/tree/master/tools/coreml)
## Extend and Contribute to MXNet
diff --git a/_sources/model_zoo/index.md.txt b/_sources/model_zoo/index.md.txt
index a5a2b327937a..19811f22552d 100644
--- a/_sources/model_zoo/index.md.txt
+++ b/_sources/model_zoo/index.md.txt
@@ -32,7 +32,7 @@ Convolutional neural networks are the state-of-art architecture for many image a
* [Places2](http://places2.csail.mit.edu/download.html): There are 1.6 million train images from 365 scene categories in the Places365-Standard, which are used to train the Places365 CNNs. There are 50 images per category in the validation set and 900 images per category in the testing set. Compared to the train set of Places365-Standard, the train set of Places365-Challenge has 6.2 million extra images, for a total of 8 million train images for the Places365 challenge 2016. The validation set and testing set are the same as the Places365-Standard.
* [Multimedia Commons](https://aws.amazon.com/public-datasets/multimedia-commons/): YFCC100M (99.2 million images and 0.8 million videos from Flickr) and supplemental material (pre-extracted features, additional annotations).
-For instructions on using these models, see [the python tutorial on using pre-trained ImageNet models](http://mxnet.io/tutorials/python/predict_imagenet.html).
+For instructions on using these models, see [the python tutorial on using pre-trained ImageNet models](https://mxnet.incubator.apache.org/tutorials/python/predict_image.html).
| Model Definition | Dataset | Model Weights | Research Basis | Contributors |
| --- | --- | --- | --- | --- |
@@ -53,19 +53,19 @@ For instructions on using these models, see [the python tutorial on using pre-tr
## Recurrent Neural Networks (RNNs) including LSTMs
-MXNet supports many types of recurrent neural networks (RNNs), including Long Short-Term Memory ([LSTM](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf))
+MXNet supports many types of recurrent neural networks (RNNs), including Long Short-Term Memory ([LSTM](http://www.bioinf.jku.at/publications/older/2604.pdf))
and Gated Recurrent Units (GRU) networks. Some available datasets include:
-* [Penn Treebank (PTB)](https://www.cis.upenn.edu/~treebank/): Text corpus with ~1 million words. Vocabulary is limited to 10,000 words. The task is predicting downstream words/characters.
+* [Penn Treebank (PTB)](https://catalog.ldc.upenn.edu/LDC95T7): Text corpus with ~1 million words. Vocabulary is limited to 10,000 words. The task is predicting downstream words/characters.
* [Shakespeare](http://cs.stanford.edu/people/karpathy/char-rnn/): Complete text from Shakespeare's works.
-* [IMDB reviews](https://s3.amazonaws.com/text-datasets): 25,000 movie reviews, labeled as positive or negative
+* [IMDB reviews](https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3): 25,000 movie reviews, labeled as positive or negative
* [Facebook bAbI](https://research.facebook.com/researchers/1543934539189348): A set of 20 question & answer tasks, each with 1,000 training examples.
* [Flickr8k, COCO](http://mscoco.org/): Images with associated caption (sentences). Flickr8k consists of 8,092 images captioned by AmazonTurkers with ~40,000 captions. COCO has 328,000 images, each with 5 captions. The COCO images also come with labeled objects using segmentation algorithms.
| Model Definition | Dataset | Model Weights | Research Basis | Contributors |
| --- | --- | --- | --- | --- |
-| LSTM - Image Captioning | Flickr8k, MS COCO | | [Vinyals et al.., 2015](https://arxiv.org/pdf/ 1411.4555v2.pdf) | @... |
+| LSTM - Image Captioning | Flickr8k, MS COCO | | [Vinyals et al., 2015](https://arxiv.org/pdf/1411.4555.pdf) | @... |
| LSTM - Q&A System| bAbI | | [Weston et al., 2015](https://arxiv.org/pdf/1502.05698v10.pdf) | |
| LSTM - Sentiment Analysis| IMDB | | [Li et al., 2015](http://arxiv.org/pdf/1503.00185v5.pdf) | |
diff --git a/_sources/tutorials/basic/data.md.txt b/_sources/tutorials/basic/data.md.txt
index dba13918aa0e..d4db7d0de1b6 100644
--- a/_sources/tutorials/basic/data.md.txt
+++ b/_sources/tutorials/basic/data.md.txt
@@ -19,7 +19,7 @@ $ pip install opencv-python requests matplotlib jupyter
```
$ git clone https://github.com/dmlc/mxnet ~/mxnet
-$ MXNET_HOME = '~/mxnet'
+$ export MXNET_HOME=~/mxnet
```
## MXNet Data Iterator
@@ -30,7 +30,7 @@ Iterators provide an abstract interface for traversing various types of iterable
without needing to expose details about the underlying data source.
In MXNet, data iterators return a batch of data as `DataBatch` on each call to `next`.
-A `DataBatch` often contains *n* training examples and their corresponding labels. Here *n* is the `batch_size` of the iterator. At the end of the data stream when there is no more data to read, the iterator raises ``StopIteration`` exception like Python `iter`.
+A `DataBatch` often contains *n* training examples and their corresponding labels. Here *n* is the `batch_size` of the iterator. At the end of the data stream, when there is no more data to read, the iterator raises a ``StopIteration`` exception, like Python's `iter`.
The structure of `DataBatch` is defined [here](http://mxnet.io/api/python/io.html#mxnet.io.DataBatch).
Information such as name, shape, type and layout on each training example and their corresponding label can be provided as `DataDesc` data descriptor objects via the `provide_data` and `provide_label` properties in `DataBatch`.
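+
+A minimal sketch of this contract using `NDArrayIter` (the array shapes here are illustrative):
+
+```python
+import numpy as np
+import mxnet as mx
+
+data = np.random.rand(100, 3)
+label = np.random.randint(0, 10, (100,))
+data_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=30)
+
+for batch in data_iter:  # each batch is a DataBatch
+    print(batch.data[0].shape, batch.label[0].shape)
+# the loop ends when the iterator raises StopIteration
+```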
@@ -366,7 +366,7 @@ Now let's convert them into record io format using the `im2rec.py` utility scrip
First, we need to make a list that contains all the image files and their categories:
```python
-os.system('python %s/tools/im2rec.py --list=1 --recursive=1 --shuffle=1 --test-ratio=0.2 data/caltech data/101_ObjectCategories'%MXNET_HOME)
+os.system('python %s/tools/im2rec.py --list=1 --recursive=1 --shuffle=1 --test-ratio=0.2 data/caltech data/101_ObjectCategories'%os.environ['MXNET_HOME'])
```
The resulting list file (./data/caltech_train.lst) is in the format `index\t(one or more label)\tpath`. In this case, there is only one label for each image but you can modify the list to add in more for multi-label training.
@@ -375,7 +375,7 @@ Then we can use this list to create our record io file:
```python
-os.system("python %s/tools/im2rec.py --num-thread=4 --pass-through=1 data/caltech data/101_ObjectCategories"%MXNET_HOME)
+os.system("python %s/tools/im2rec.py --num-thread=4 --pass-through=1 data/caltech data/101_ObjectCategories"%os.environ['MXNET_HOME'])
```
The record io files are now saved here (./data).
diff --git a/_sources/tutorials/basic/module.md.txt b/_sources/tutorials/basic/module.md.txt
index 15fdaeef68c4..e0618ca65e4a 100644
--- a/_sources/tutorials/basic/module.md.txt
+++ b/_sources/tutorials/basic/module.md.txt
@@ -173,8 +173,8 @@ dataset and evaluates the performance according to the given input metric.
It can be used as follows:
```python
-score = mod.score(val_iter, ['mse', 'acc'])
-print("Accuracy score is %f" % (score))
+score = mod.score(val_iter, ['acc'])
+print("Accuracy score is %f" % (score[0][1]))
```
Some of the other metrics which can be used are `top_k_acc`(top-k-accuracy),
diff --git a/versions/master/_sources/tutorials/gluon/autograd.txt b/_sources/tutorials/gluon/autograd.md.txt
similarity index 100%
rename from versions/master/_sources/tutorials/gluon/autograd.txt
rename to _sources/tutorials/gluon/autograd.md.txt
diff --git a/versions/master/_sources/tutorials/gluon/customop.txt b/_sources/tutorials/gluon/customop.md.txt
similarity index 100%
rename from versions/master/_sources/tutorials/gluon/customop.txt
rename to _sources/tutorials/gluon/customop.md.txt
diff --git a/versions/master/_sources/tutorials/gluon/gluon.txt b/_sources/tutorials/gluon/gluon.md.txt
similarity index 96%
rename from versions/master/_sources/tutorials/gluon/gluon.txt
rename to _sources/tutorials/gluon/gluon.md.txt
index a1688ea121dd..ac1aa3f60f5e 100644
--- a/versions/master/_sources/tutorials/gluon/gluon.txt
+++ b/_sources/tutorials/gluon/gluon.md.txt
@@ -102,8 +102,7 @@ To compute loss and backprop for one iteration, we do:
label = mx.nd.arange(10) # dummy label
with autograd.record():
output = net(data)
- L = gluon.loss.SoftmaxCrossEntropyLoss()
- loss = L(output, label)
+ loss = gluon.loss.softmax_cross_entropy_loss(output, label)
loss.backward()
print('loss:', loss)
print('grad:', net.fc1.weight.grad())
@@ -128,10 +127,9 @@ this is a commonly used functionality, gluon provide a `Trainer` class for it:
```python
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
-with autograd.record():
+with autograd.record():
output = net(data)
- L = gluon.loss.SoftmaxCrossEntropyLoss()
- loss = L(output, label)
+ loss = gluon.loss.softmax_cross_entropy_loss(output, label)
loss.backward()
# do the update. Trainer needs to know the batch size of data to normalize
diff --git a/versions/master/_sources/tutorials/gluon/hybrid.txt b/_sources/tutorials/gluon/hybrid.md.txt
similarity index 100%
rename from versions/master/_sources/tutorials/gluon/hybrid.txt
rename to _sources/tutorials/gluon/hybrid.md.txt
diff --git a/versions/master/_sources/tutorials/gluon/mnist.txt b/_sources/tutorials/gluon/mnist.md.txt
similarity index 100%
rename from versions/master/_sources/tutorials/gluon/mnist.txt
rename to _sources/tutorials/gluon/mnist.md.txt
diff --git a/versions/master/_sources/tutorials/gluon/ndarray.txt b/_sources/tutorials/gluon/ndarray.md.txt
similarity index 100%
rename from versions/master/_sources/tutorials/gluon/ndarray.txt
rename to _sources/tutorials/gluon/ndarray.md.txt
diff --git a/_sources/tutorials/index.md.txt b/_sources/tutorials/index.md.txt
index aed11a4bebf1..32d8bd8ae9d1 100644
--- a/_sources/tutorials/index.md.txt
+++ b/_sources/tutorials/index.md.txt
@@ -2,9 +2,11 @@
These tutorials introduce a few fundamental concepts in deep learning and how to implement them in _MXNet_. The _Basics_ section contains tutorials on manipulating arrays, building networks, loading/preprocessing data, etc. The _Training and Inference_ section talks about implementing Linear Regression, training a Handwritten digit classifier using MLP and CNN, running inferences using a pre-trained model, and lastly, efficiently training a large scale image classifier.
+**Note:** We are working on a set of tutorials for the new imperative interface called Gluon. A preview version is hosted at [thestraightdope.mxnet.io](http://thestraightdope.mxnet.io).
+
## Python
-### Basics
+### Basic
```eval_rst
.. toctree::
diff --git a/_sources/tutorials/r/CustomLossFunction.md.txt b/_sources/tutorials/r/CustomLossFunction.md.txt
index a7104803cacb..afb99518894c 100644
--- a/_sources/tutorials/r/CustomLossFunction.md.txt
+++ b/_sources/tutorials/r/CustomLossFunction.md.txt
@@ -3,57 +3,201 @@ Customized loss function
This tutorial provides guidelines for using a customized loss function in network construction.
-
Model Training Example
-----------
+----------------------
Let's begin with a small regression example. We can build and train a regression model with the following code:
+``` r
+data(BostonHousing, package = "mlbench")
+BostonHousing[, sapply(BostonHousing, is.factor)] <-
+ as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
+BostonHousing <- data.frame(scale(BostonHousing))
+
+test.ind = seq(1, 506, 5) # 1 pt in 5 used for testing
+train.x = data.matrix(BostonHousing[-test.ind,-14])
+train.y = BostonHousing[-test.ind, 14]
+test.x = data.matrix(BostonHousing[test.ind,-14])
+test.y = BostonHousing[test.ind, 14]
+
+require(mxnet)
+```
+
+ ## Loading required package: mxnet
+
+``` r
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
+
+mx.set.seed(0)
+model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+``` r
+pred <- predict(model, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred[1,])^2) / length(test.y)
+```
- ```r
- library(mxnet)
- data(BostonHousing, package="mlbench")
- train.ind = seq(1, 506, 3)
- train.x = data.matrix(BostonHousing[train.ind, -14])
- train.y = BostonHousing[train.ind, 14]
- test.x = data.matrix(BostonHousing[-train.ind, -14])
- test.y = BostonHousing[-train.ind, 14]
- data <- mx.symbol.Variable("data")
- fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
- lro <- mx.symbol.LinearRegressionOutput(fc1)
- mx.set.seed(0)
- model <- mx.model.FeedForward.create(
- lro, X=train.x, y=train.y,
- eval.data=list(data=test.x, label=test.y),
- ctx=mx.cpu(), num.round=10, array.batch.size=20,
- learning.rate=2e-6, momentum=0.9, eval.metric=mx.metric.rmse)
- ```
-
-Besides the `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`.
-However, this might not be enough for real-world models. You can provide your own loss function
-by using `mx.symbol.MakeLoss` when constructing the network.
+ ## [1] 0.2485236
+Besides the `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`. However, this might not be enough for real-world models. You can provide your own loss function by using `mx.symbol.MakeLoss` when constructing the network.
How to Use Your Own Loss Function
----------
+---------------------------------
+
+We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred-label)^2`.
+
+``` r
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
+```
+
+Then we can train the network just as usual.
+
+``` r
+mx.set.seed(0)
+model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+We should get very similar results because we are actually minimizing the same loss function. However, the result below looks quite different.
+
+``` r
+pred2 <- predict(model2, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred2)^2) / length(test.y)
+```
+
+ ## [1] 1.234584
+
+This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data. We can get the real prediction as shown below.
+
+``` r
+internals = internals(model2$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model3 <- list(symbol = fc_symbol,
+ arg.params = model2$arg.params,
+ aux.params = model2$aux.params)
+
+class(model3) <- "MXFeedForwardModel"
+
+pred3 <- predict(model3, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred3[1,])^2) / length(test.y)
+```
+
+ ## [1] 0.248294
+
+We have provided many operations on the symbols. An example of `|pred-label|` can be found below.
+
+``` r
+lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
+mx.set.seed(0)
+model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+``` r
+internals = internals(model4$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model5 <- list(symbol = fc_symbol,
+ arg.params = model4$arg.params,
+ aux.params = model4$aux.params)
+
+class(model5) <- "MXFeedForwardModel"
+
+pred5 <- predict(model5, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum(abs(test.y - pred5[1,])) / length(test.y)
+```
+
+ ## [1] 0.7056902
+
+``` r
+lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
+mx.set.seed(0)
+model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
-We still use our previous example.
+``` r
+pred6 <- predict(model6, test.x)
+```
- ```r
- library(mxnet)
- data <- mx.symbol.Variable("data")
- fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
- lro <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc1, shape = 0) - label))
- ```
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
-In the last line of network definition, we do not use the predefined loss function. We define the loss
-by ourselves, which is `(pred-label)^2`.
+``` r
+sum(abs(test.y - pred6[1,])) / length(test.y)
+```
-We have provided many operations on the symbols, so you can also define `|pred-label|` using the line below.
+ ## [1] 0.7056902
- ```r
- lro <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc1, shape = 0) - label))
- ```
## Next Steps
* [Neural Networks with MXNet in Five Minutes](http://mxnet.io/tutorials/r/fiveMinutesNeuralNetwork.html)
diff --git a/_sources/tutorials/scala/mnist.md.txt b/_sources/tutorials/scala/mnist.md.txt
index e01ac49ed0c1..ad55ee4c0257 100644
--- a/_sources/tutorials/scala/mnist.md.txt
+++ b/_sources/tutorials/scala/mnist.md.txt
@@ -4,6 +4,12 @@ This Scala tutorial guides you through a classic computer vision application: id
Let's train a 3-layer network (i.e., a multilayer perceptron) on the MNIST dataset to classify handwritten digits.
+## Prerequisites
+To complete this tutorial, we need:
+
+- to compile the latest MXNet version. See the MXNet installation instructions for your operating system in [Setup and Installation](http://mxnet.io/get_started/install.html).
+- to compile the Scala API. See Scala API build instructions in [Build](https://github.com/dmlc/mxnet/tree/master/scala-package).
+
## Define the Network
First, define the neural network's architecture using the Symbol API:
@@ -87,7 +93,7 @@ while (valDataIter.hasNext) {
val y = NDArray.concatenate(labels)
// get predicted labels
-val predictedY = NDArray.argmaxChannel(prob)
+val predictedY = NDArray.argmax_channel(prob)
require(y.shape == predictedY.shape)
// calculate accuracy
diff --git a/_sources/tutorials/unsupervised_learning/gan.md.txt b/_sources/tutorials/unsupervised_learning/gan.md.txt
index 6491806c0acc..709e1323c6f6 100644
--- a/_sources/tutorials/unsupervised_learning/gan.md.txt
+++ b/_sources/tutorials/unsupervised_learning/gan.md.txt
@@ -1,5 +1,383 @@
-# Generative Adversarial Network
-Get the source code for an example of a generative adversarial network (GAN) running on MXNet on GitHub in the [gan](https://github.com/dmlc/mxnet/tree/master/example/gan) folder.
+# Generative Adversarial Networks
-## Next Steps
-* [MXNet tutorials index](http://mxnet.io/tutorials/index.html)
\ No newline at end of file
+GANs are an application of unsupervised learning - you don't need labels for your dataset in order to train a GAN.
+
+The GAN framework is composed of two neural networks: a generator network and a discriminator network.
+
+The generator's job is to take a set of random numbers and produce data (such as images or text).
+
+The discriminator then takes in that data, as well as samples of real data from a dataset, and tries to determine whether it is "fake" (created by the generator network) or "real" (from the original dataset).
+
+During training, the two networks play a game against each other. The generator tries to create realistic data, so that it can fool the discriminator into thinking that the data it generated is from the original dataset. At the same time, the discriminator tries not to be fooled - it learns to become better at determining if data is real or fake.
+
+Since the two networks are fighting in this game, they can be seen as adversaries, which is where the term "Generative Adversarial Network" comes from.
+
+## Deep Convolutional Generative Adversarial Networks
+
+This tutorial takes a look at Deep Convolutional Generative Adversarial Networks (DCGAN), which combines Convolutional Neural Networks (CNNs) and GANs.
+
+We will create a DCGAN that is able to create images of handwritten digits from random numbers. The tutorial uses the neural net architecture and guidelines outlined in [this paper](https://arxiv.org/abs/1511.06434), and the MNIST dataset.
+
+## How to Use This Tutorial
+You can use this tutorial by executing each snippet of Python code in order as it appears in the tutorial.
+
+The DCGAN consists of two networks:
+
+1. The first net is the "generator" and creates images of handwritten digits from random numbers.
+2. The second net is the "discriminator" and determines if the image created by the generator is real (a realistic looking image of handwritten digits) or fake (an image that doesn't look like it came from the original dataset).
+
+Apart from creating a DCGAN, you'll also learn:
+
+- How to manipulate and iterate through batches of images that you can feed into your neural network.
+
+- How to create a custom MXNet data iterator that generates random numbers from a normal distribution.
+
+- How to create a custom training process in MXNet, using lower-level functions from the MXNet Module API such as `.bind()`, `.forward()`, and `.backward()` (see the sketch after this list). The training process for a DCGAN is more complex than that of many other neural nets, so we need to use these functions instead of the higher-level `.fit()` function.
+
+- How to visualize images as they are going through the training process.
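+
+As a sketch of those lower-level calls (shapes and the network are illustrative; the real DCGAN loop alternates such steps between the generator and discriminator modules):
+
+```python
+import mxnet as mx
+
+sym = mx.sym.FullyConnected(mx.sym.Variable('data'), num_hidden=10)
+mod = mx.mod.Module(sym, data_names=('data',), label_names=None)
+mod.bind(data_shapes=[('data', (64, 128))])
+mod.init_params()
+mod.init_optimizer(optimizer='adam')
+
+batch = mx.io.DataBatch(data=[mx.random.normal(0, 1.0, shape=(64, 128))], label=[])
+mod.forward(batch, is_train=True)     # forward pass
+mod.backward([mx.nd.ones((64, 10))])  # backward pass with an explicit head gradient
+mod.update()                          # apply one optimizer step
+```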
+
+## Prerequisites
+
+This tutorial assumes you're familiar with the concept of CNNs and have implemented one in MXNet. You should also be familiar with the concept of logistic regression. Having a basic understanding of MXNet data iterators helps, since we'll create a custom data iterator to iterate through random numbers as inputs to our generator network.
+
+This example is designed to be trained on a single GPU. Training this network on CPU can be slow, so it's recommended that you use a GPU for training.
+
+To complete this tutorial, you need:
+
+- MXNet
+- Python 2.7, and the following libraries for Python:
+ - Numpy - for matrix math
+ - OpenCV - for image manipulation
+ - Scikit-learn - to easily get our dataset
+ - Matplotlib - to visualize our output
+
+## The Data
+We need two pieces of data to train our DCGAN:
+ 1. Images of handwritten digits from the MNIST dataset
+ 2. Random numbers from a normal distribution
+
+Our generator network will use the random numbers as the input to produce images of handwritten digits, and our discriminator network will use images of handwritten digits from the MNIST dataset to determine if images produced by our generator are realistic.
+
+We are going to use the Python library scikit-learn to get the MNIST dataset. Scikit-learn comes with a function that gets the dataset for us, which we will then manipulate to create our training and testing inputs.
+
+The MNIST dataset contains 70,000 images of handwritten digits. Each image is 28x28 pixels in size. To create random numbers, we're going to create a custom MXNet data iterator, which will return random numbers from a normal distribution as we need them.
+
+## Prepare the Data
+
+### 1. Preparing the MNIST dataset
+
+Let's start by preparing our handwritten digits from the MNIST dataset. We import the fetch_mldata function from scikit-learn, and use it to get the MNIST dataset. Notice that its shape is 70000x784. Each of the 70,000 rows holds one image, and the 784 columns of that row hold the image's pixels. Each image is 28x28 pixels, but has been flattened so that all 784 pixels are represented in a single row.
+```python
+from sklearn.datasets import fetch_mldata
+mnist = fetch_mldata('MNIST original')
+```
+
+Next, we'll randomize the handwritten digits by using numpy to create a random permutation of the rows (images) of our dataset. We'll then reshape our dataset from 70000x784 to 70000x28x28, so that every image in our dataset is arranged into a 28x28 grid, where each cell in the grid represents 1 pixel of the image.
+
+```python
+import numpy as np
+#Use a seed so that we get the same random permutation each time
+np.random.seed(1)
+p = np.random.permutation(mnist.data.shape[0])
+X = mnist.data[p]
+X = X.reshape((70000, 28, 28))
+```
+Since the DCGAN that we're creating takes a 64x64 image as its input, we'll use OpenCV to resize each 28x28 image to 64x64:
+```python
+import cv2
+X = np.asarray([cv2.resize(x, (64,64)) for x in X])
+```
+Each pixel in our 64x64 image is represented by a number between 0-255 that represents the intensity of the pixel. However, we want to input numbers between -1 and 1 into our DCGAN, as suggested by the research paper. To rescale our pixels to be in the range of -1 to 1, we'll divide each pixel by (255/2). This puts our images on a scale of 0 to 2. We can then subtract 1 to get them in the range of -1 to 1.
+```python
+X = X.astype(np.float32)/(255.0/2) - 1.0
+```
+Ultimately, images are fed into our neural net as a 70000x3x64x64 array, but they are currently in a 70000x64x64 array. We need to add 3 channels to our images. Typically, when we work with images, the 3 channels represent the red, green, and blue components of each image. Since the MNIST dataset is grayscale, we only need 1 channel's worth of information, so we replicate the single grayscale channel across all 3 channels:
+
+```python
+X = X.reshape((70000, 1, 64, 64))
+X = np.tile(X, (1, 3, 1, 1))
+```
+Finally, we'll put our images into MXNet's NDArrayIter, which will allow MXNet to easily iterate through our images during training. We'll also split the images into batches of 64. Every time we iterate, we'll get a 4 dimensional array with size (64, 3, 64, 64), representing a batch of 64 images.
+```python
+import mxnet as mx
+batch_size = 64
+image_iter = mx.io.NDArrayIter(X, batch_size=batch_size)
+```
+### 2. Preparing Random Numbers
+
+We need to input random numbers from a normal distribution to our generator network, so we'll create an MXNet DataIter that produces random numbers for each training batch. DataIter is the base class of MXNet's Data Loading API. Below, we create a class called RandIter, a subclass of DataIter, and use MXNet's built-in mx.random.normal function to return normally distributed random numbers every time we iterate.
+```python
+class RandIter(mx.io.DataIter):
+ def __init__(self, batch_size, ndim):
+ self.batch_size = batch_size
+ self.ndim = ndim
+ self.provide_data = [('rand', (batch_size, ndim, 1, 1))]
+ self.provide_label = []
+
+ def iter_next(self):
+ return True
+
+ def getdata(self):
+ #Returns random numbers from a gaussian (normal) distribution
+ #with mean=0 and standard deviation = 1
+ return [mx.random.normal(0, 1.0, shape=(self.batch_size, self.ndim, 1, 1))]
+```
+When we initialize our RandIter, we need to provide two numbers: the batch size and how many random numbers we want to produce a single image from. This number is referred to as Z, and we'll set it to 100, following the research paper. Every time we iterate and get a batch of random numbers, we will get a 4 dimensional array with shape (batch_size, Z, 1, 1), which in our example is (64, 100, 1, 1).
+```python
+Z = 100
+rand_iter = RandIter(batch_size, Z)
+```
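+
+As a quick sanity check (an optional step we've added here, not part of the original example flow), we can draw one batch from our new iterator and confirm it has the shape we expect:
+```python
+#Draw a single batch of random numbers and inspect its shape
+batch = rand_iter.next()
+print(batch.data[0].shape)  #(64, 100, 1, 1), i.e. (batch_size, Z, 1, 1)
+```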
+## Create the Model
+
+Our model has two networks that we will train together: the generator network and the discriminator network.
+
+### The Generator
+
+Let's start off by defining the generator network, which uses deconvolutional layers (also called fractionally strided layers) to generate an image from random numbers:
+```python
+no_bias = True
+fix_gamma = True
+epsilon = 1e-5 + 1e-12
+
+rand = mx.sym.Variable('rand')
+
+g1 = mx.sym.Deconvolution(rand, name='g1', kernel=(4,4), num_filter=1024, no_bias=no_bias)
+gbn1 = mx.sym.BatchNorm(g1, name='gbn1', fix_gamma=fix_gamma, eps=epsilon)
+gact1 = mx.sym.Activation(gbn1, name='gact1', act_type='relu')
+
+g2 = mx.sym.Deconvolution(gact1, name='g2', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=512, no_bias=no_bias)
+gbn2 = mx.sym.BatchNorm(g2, name='gbn2', fix_gamma=fix_gamma, eps=epsilon)
+gact2 = mx.sym.Activation(gbn2, name='gact2', act_type='relu')
+
+g3 = mx.sym.Deconvolution(gact2, name='g3', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=256, no_bias=no_bias)
+gbn3 = mx.sym.BatchNorm(g3, name='gbn3', fix_gamma=fix_gamma, eps=epsilon)
+gact3 = mx.sym.Activation(gbn3, name='gact3', act_type='relu')
+
+g4 = mx.sym.Deconvolution(gact3, name='g4', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=128, no_bias=no_bias)
+gbn4 = mx.sym.BatchNorm(g4, name='gbn4', fix_gamma=fix_gamma, eps=epsilon)
+gact4 = mx.sym.Activation(gbn4, name='gact4', act_type='relu')
+
+g5 = mx.sym.Deconvolution(gact4, name='g5', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=3, no_bias=no_bias)
+generatorSymbol = mx.sym.Activation(g5, name='gact5', act_type='tanh')
+```
+
+Our generator starts with the random numbers that will be obtained from the RandIter we created earlier, so we created the rand variable for this input.
+We then build the model, starting with a Deconvolution layer (sometimes called a fractionally strided layer). We apply batch normalization and ReLU activation after the Deconvolution layer.
+
+We repeat this process 4 times, applying a (2,2) stride and (1,1) pad at each of these Deconvolution layers, which doubles the spatial size of our image each time. By creating these layers, our generator network learns to upsample our input vector of random numbers, Z, until the network outputs a final image. We also halve the number of filters at each layer, reducing the dimensionality as we go. Ultimately, our output layer is a 64x64x3 layer, representing the size and channels of our image. We use tanh activation instead of relu on the last layer, as recommended by the research on DCGANs. The outputs of the neurons in this final layer represent the pixels of the generated image.
+
+Notice we used 3 parameters to help us create our model: no_bias, fix_gamma, and epsilon. Neurons in our network won't have a bias added to them; this seems to work better in practice for the DCGAN. In our batch norm layers, we set fix_gamma=True, which means gamma=1 for all of our batch norm layers. epsilon is a small number added inside the batch norm calculation so that we don't end up dividing by zero. By default, CuDNN requires that this number is greater than 1e-5, so we add a tiny value (1e-12) to 1e-5, keeping epsilon just above that threshold.
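+
+To see this doubling concretely, we can ask MXNet to infer the generator's output shape from its input shape (an optional check of our own; infer_shape is a standard Symbol method):
+```python
+#The first Deconvolution maps the 1x1 input to 4x4; each strided one then
+#doubles the spatial size: 4 -> 8 -> 16 -> 32 -> 64
+_, out_shapes, _ = generatorSymbol.infer_shape(rand=(batch_size, Z, 1, 1))
+print(out_shapes)  #[(64, 3, 64, 64)]
+```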
+
+### The Discriminator
+
+Let's now create our discriminator network, which will take in images of handwritten digits from the MNIST dataset and images created by the generator network:
+```python
+data = mx.sym.Variable('data')
+
+d1 = mx.sym.Convolution(data, name='d1', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=128, no_bias=no_bias)
+dact1 = mx.sym.LeakyReLU(d1, name='dact1', act_type='leaky', slope=0.2)
+
+d2 = mx.sym.Convolution(dact1, name='d2', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=256, no_bias=no_bias)
+dbn2 = mx.sym.BatchNorm(d2, name='dbn2', fix_gamma=fix_gamma, eps=epsilon)
+dact2 = mx.sym.LeakyReLU(dbn2, name='dact2', act_type='leaky', slope=0.2)
+
+d3 = mx.sym.Convolution(dact2, name='d3', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=512, no_bias=no_bias)
+dbn3 = mx.sym.BatchNorm(d3, name='dbn3', fix_gamma=fix_gamma, eps=epsilon)
+dact3 = mx.sym.LeakyReLU(dbn3, name='dact3', act_type='leaky', slope=0.2)
+
+d4 = mx.sym.Convolution(dact3, name='d4', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=1024, no_bias=no_bias)
+dbn4 = mx.sym.BatchNorm(d4, name='dbn4', fix_gamma=fix_gamma, eps=epsilon)
+dact4 = mx.sym.LeakyReLU(dbn4, name='dact4', act_type='leaky', slope=0.2)
+
+d5 = mx.sym.Convolution(dact4, name='d5', kernel=(4,4), num_filter=1, no_bias=no_bias)
+d5 = mx.sym.Flatten(d5)
+
+label = mx.sym.Variable('label')
+discriminatorSymbol = mx.sym.LogisticRegressionOutput(data=d5, label=label, name='dloss')
+```
+
+We start off by creating the data variable, which is used to hold our input images to the discriminator.
+
+The discriminator then goes through a series of 5 convolutional layers, each with a 4x4 kernel. The first 4 use a 2x2 stride and 1x1 pad, halving the size of the image (which starts at 64x64) at each of those layers. Our model also increases dimensionality at each layer by doubling the number of filters per convolutional layer, starting at 128 filters and ending at 1024 filters before we flatten the output.
+
+At the final convolution, we flatten the neural net to get one number as the final output of the discriminator network. This number is the probability that the image is real, as determined by our discriminator. We use logistic regression to determine this probability: when we pass in "real" images from the MNIST dataset, we label them as 1, and we label the "fake" images from the generator net as 0.
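+
+As with the generator, we can optionally confirm these shapes by inference (again, a check we've added, not part of the original example):
+```python
+#Each strided convolution halves the image: 64 -> 32 -> 16 -> 8 -> 4; the final
+#4x4 convolution plus Flatten leaves a single probability per image
+_, out_shapes, _ = discriminatorSymbol.infer_shape(data=(batch_size, 3, 64, 64), label=(batch_size,))
+print(out_shapes)  #[(64, 1)]
+```
+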
+### Prepare the Models Using the Module API
+
+So far we have defined an MXNet Symbol for both the generator and the discriminator network. Before we can train our model, we need to bind these symbols using the Module API, which creates the computation graph for our models. It also allows us to decide how we want to initialize our model and what type of optimizer we want to use. Let's set up a Module for each of our networks:
+```python
+#Hyperparameters
+sigma = 0.02
+lr = 0.0002
+beta1 = 0.5
+ctx = mx.gpu(0)
+
+#=============Generator Module=============
+generator = mx.mod.Module(symbol=generatorSymbol, data_names=('rand',), label_names=None, context=ctx)
+generator.bind(data_shapes=rand_iter.provide_data)
+generator.init_params(initializer=mx.init.Normal(sigma))
+generator.init_optimizer(
+ optimizer='adam',
+ optimizer_params={
+ 'learning_rate': lr,
+ 'beta1': beta1,
+ })
+mods = [generator]
+
+# =============Discriminator Module=============
+discriminator = mx.mod.Module(symbol=discriminatorSymbol, data_names=('data',), label_names=('label',), context=ctx)
+discriminator.bind(data_shapes=image_iter.provide_data,
+ label_shapes=[('label', (batch_size,))],
+ inputs_need_grad=True)
+discriminator.init_params(initializer=mx.init.Normal(sigma))
+discriminator.init_optimizer(
+ optimizer='adam',
+ optimizer_params={
+ 'learning_rate': lr,
+ 'beta1': beta1,
+ })
+mods.append(discriminator)
+```
+First, we create Modules for our networks and then bind the symbols that we've created in the previous steps to our modules.
+We use rand_iter.provide_data as the data_shapes to bind our generator network. This means that, as we iterate through batches of data on the generator Module, our RandIter will provide us with random numbers to feed our Module, using its provide_data function.
+
+Similarly, we bind the discriminator Module to image_iter.provide_data, which gives us images from MNIST via the NDArrayIter we set up earlier, called image_iter.
+
+Notice that we're using the Normal initialization with the hyperparameter sigma=0.02. This means our weight initializations for the neurons in our networks will be random numbers from a Gaussian (normal) distribution with a mean of 0 and a standard deviation of 0.02.
+
+We also use the Adam optimizer for gradient descent. We've set up two hyperparameters, lr and beta1, based on the values used in the DCGAN paper. We're using a single GPU, gpu(0), for training.
+
+### Visualizing Our Training
+Before we train the model, let's set up some helper functions that will help visualize what our generator is producing, compared to what the real image is:
+```python
+from matplotlib import pyplot as plt
+
+#Takes the images in our batch and arranges them in a grid so that they can be
+#plotted using matplotlib
+def fill_buf(buf, num_images, img, shape):
+    #Integer division keeps the indices integers (and works on both Python 2 and 3)
+    width = buf.shape[0]//shape[1]
+    height = buf.shape[1]//shape[0]
+    img_width = (num_images%width)*shape[0]
+    img_height = (num_images//height)*shape[1]
+    buf[img_height:img_height+shape[1], img_width:img_width+shape[0], :] = img
+
+#Plots two images side by side using matplotlib
+def visualize(fake, real):
+ #64x3x64x64 to 64x64x64x3
+ fake = fake.transpose((0, 2, 3, 1))
+ #Pixel values from 0-255
+ fake = np.clip((fake+1.0)*(255.0/2.0), 0, 255).astype(np.uint8)
+ #Repeat for real image
+ real = real.transpose((0, 2, 3, 1))
+ real = np.clip((real+1.0)*(255.0/2.0), 0, 255).astype(np.uint8)
+
+ #Create buffer array that will hold all the images in our batch
+    #Fill the buffer to arrange all the images in the batch onto it
+ n = np.ceil(np.sqrt(fake.shape[0]))
+ fbuff = np.zeros((int(n*fake.shape[1]), int(n*fake.shape[2]), int(fake.shape[3])), dtype=np.uint8)
+ for i, img in enumerate(fake):
+ fill_buf(fbuff, i, img, fake.shape[1:3])
+ rbuff = np.zeros((int(n*real.shape[1]), int(n*real.shape[2]), int(real.shape[3])), dtype=np.uint8)
+ for i, img in enumerate(real):
+ fill_buf(rbuff, i, img, real.shape[1:3])
+
+ #Create a matplotlib figure with two subplots: one for the real and the other for the fake
+ #fill each plot with our buffer array, which creates the image
+ fig = plt.figure()
+ ax1 = fig.add_subplot(2,2,1)
+ ax1.imshow(fbuff)
+ ax2 = fig.add_subplot(2,2,2)
+ ax2.imshow(rbuff)
+ plt.show()
+```
+
+## Fit the Model
+Training the DCGAN is a complex process that requires multiple steps.
+To fit the model, for every batch of data in our dataset:
+
+1. Use the Z vector, which contains our random numbers, to do a forward pass through our generator. This outputs the "fake" image, since it's created from our generator.
+
+2. Use the fake image as the input to do a forward and backwards pass through the discriminator network. We set our labels for our logistic regression to 0 to represent that this is a fake image. This trains the discriminator to learn what a fake image looks like. We save the gradient produced in backpropagation for the next step.
+
+3. Do a forwards and backwards pass through the discriminator using a real image from our dataset. Our label for logistic regression will now be 1 to represent real images, so our discriminator can learn to recognize a real image.
+
+4. Update the discriminator by adding the gradient generated during backpropagation on the fake image to the gradient from backpropagation on the real image.
+
+5. Now that the discriminator has been updated for this batch, we still need to update the generator. First, do a forward and backwards pass on the updated discriminator with the same batch of fake images, but with the labels set to 1 - this produces gradients that tell the generator how to make its images look more "real" to the discriminator. Then use the discriminator's input gradient to do a backwards pass through the generator, and update the generator's parameters.
+
+Here's the main training loop for our DCGAN:
+
+```python
+# =============train===============
+print('Training...')
+for epoch in range(1):
+ image_iter.reset()
+ for i, batch in enumerate(image_iter):
+ #Get a batch of random numbers to generate an image from the generator
+ rbatch = rand_iter.next()
+ #Forward pass on training batch
+ generator.forward(rbatch, is_train=True)
+        #The output is a batch of generated images, shape (64, 3, 64, 64)
+ outG = generator.get_outputs()
+
+ #Pass the generated (fake) image through the discriminator, and save the gradient
+ #Label (for logistic regression) is an array of 0's since this image is fake
+ label = mx.nd.zeros((batch_size,), ctx=ctx)
+ #Forward pass on the output of the discriminator network
+ discriminator.forward(mx.io.DataBatch(outG, [label]), is_train=True)
+ #Do the backwards pass and save the gradient
+ discriminator.backward()
+ gradD = [[grad.copyto(grad.context) for grad in grads] for grads in discriminator._exec_group.grad_arrays]
+
+ #Pass a batch of real images from MNIST through the discriminator
+ #Set the label to be an array of 1's because these are the real images
+ label[:] = 1
+ batch.label = [label]
+ #Forward pass on a batch of MNIST images
+ discriminator.forward(batch, is_train=True)
+ #Do the backwards pass and add the saved gradient from the fake images to the gradient
+ #generated by this backwards pass on the real images
+ discriminator.backward()
+ for gradsr, gradsf in zip(discriminator._exec_group.grad_arrays, gradD):
+ for gradr, gradf in zip(gradsr, gradsf):
+ gradr += gradf
+ #Update gradient on the discriminator
+ discriminator.update()
+
+ #Now that we've updated the discriminator, let's update the generator
+ #First do a forward pass and backwards pass on the newly updated discriminator
+ #With the current batch
+ discriminator.forward(mx.io.DataBatch(outG, [label]), is_train=True)
+ discriminator.backward()
+ #Get the input gradient from the backwards pass on the discriminator,
+ #and use it to do the backwards pass on the generator
+ diffD = discriminator.get_input_grads()
+ generator.backward(diffD)
+ #Update the gradients on the generator
+ generator.update()
+
+ #Increment to the next batch, printing every 50 batches
+ i += 1
+ if i % 50 == 0:
+ print('epoch:', epoch, 'iter:', i)
+ print
+ print(" From generator: From MNIST:")
+
+ visualize(outG[0].asnumpy(), batch.data[0].asnumpy())
+```
+
+This causes our GAN to train, and we can visualize the progress that we're making as our networks learn. Every 50 iterations, we call the visualize function that we created earlier, which plots the intermediate results during training.
+
+The plot on the left shows what our generator created (the fake image) in the most recent iteration. The plot on the right shows the original (real) image from the MNIST dataset that was input to the discriminator on the same iteration.
+
+As training goes on, the generator gets better at generating realistic images. You can see this happening because the images on the left become closer to the original dataset with each iteration.
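+
+Once training finishes, you can reuse the trained generator to sample brand new digits. Here's a minimal sketch of our own (it assumes the modules and helper functions defined above are still in scope):
+```python
+#Sample a fresh batch of digits from the trained generator and plot them
+#next to a batch of real MNIST images
+rbatch = rand_iter.next()
+generator.forward(rbatch, is_train=False)
+fake = generator.get_outputs()[0].asnumpy()
+image_iter.reset()
+real = image_iter.next().data[0].asnumpy()
+visualize(fake, real)
+```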
+
+## Summary
+
+We've now successfully used Apache MXNet to train a Deep Convolutional GAN on the MNIST dataset.
+
+As a result, we've created two neural nets: a generator, which is able to create images of handwritten digits from random numbers, and a discriminator, which is able to take an image and determine if it is an image of handwritten digits.
+
+Along the way, we've learned how to do the image manipulation and visualization associated with training deep neural nets. We've also learned how to use some of MXNet's advanced training functionality to fit our model.
+
+## Acknowledgements
+This tutorial is based on [MXNet DCGAN codebase](https://github.com/apache/incubator-mxnet/blob/master/example/gan/dcgan.py),
+[The original paper on GANs](https://arxiv.org/abs/1406.2661), as well as [this paper on deep convolutional GANs](https://arxiv.org/abs/1511.06434).
\ No newline at end of file
diff --git a/_static/mxnet-theme/footer.html b/_static/mxnet-theme/footer.html
index 45ba457a0722..fff183092b85 100644
--- a/_static/mxnet-theme/footer.html
+++ b/_static/mxnet-theme/footer.html
diff --git a/_static/mxnet-theme/index.html b/_static/mxnet-theme/index.html
index 7958eab0080a..2857934cb806 100644
--- a/_static/mxnet-theme/index.html
+++ b/_static/mxnet-theme/index.html