From c202b363b1d2bd7346779cf74b41473cfd2e0fd8 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Mon, 6 May 2019 14:04:20 +0800 Subject: [PATCH 01/24] improve mkldnn document --- docs/install/index.md | 7 +- docs/tutorials/mkldnn/MKLDNN_README.md | 627 ++++++++++++----------- docs/tutorials/mkldnn/operator_list.md | 73 +++ tests/tutorials/test_sanity_tutorials.py | 1 + 4 files changed, 394 insertions(+), 314 deletions(-) create mode 100644 docs/tutorials/mkldnn/operator_list.md diff --git a/docs/install/index.md b/docs/install/index.md index 10db8d95b44a..cf56472b8e25 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -187,7 +187,12 @@ $ pip install mxnet --pre
-MXNet offers MKL pip packages that will be much faster when running on Intel hardware.
+MXNet offers MKL pip packages that will be much faster when running on Intel hardware. Use the following command to install it; performance numbers and a tuning guide are available in [performance on Intel CPU](https://mxnet.incubator.apache.org/versions/master/faq/perf.html#intel-cpu).
+
+```
+$ pip install mxnet-mkl --pre
+```
+
 Check the chart below for other options, refer to PyPI for other MXNet pip packages, or validate your MXNet installation.
 pip packages
diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md
index c5779670cd87..821e8b17d42f 100644
--- a/docs/tutorials/mkldnn/MKLDNN_README.md
+++ b/docs/tutorials/mkldnn/MKLDNN_README.md
@@ -15,297 +15,299 @@
-# Build/Install MXNet with MKL-DNN
-
-A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS.
-In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows.
-
-The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu).
-
-

Contents

- -* [1. Linux](#1) -* [2. MacOS](#2) -* [3. Windows](#3) -* [4. Verify MXNet with python](#4) -* [5. Enable MKL BLAS](#5) -* [6. Enable graph optimization](#6) -* [7. Quantization](#7) -* [8. Support](#8) - -

Linux

- -### Prerequisites - -``` -sudo apt-get update -sudo apt-get install -y build-essential git -sudo apt-get install -y libopenblas-dev liblapack-dev -sudo apt-get install -y libopencv-dev -sudo apt-get install -y graphviz -``` - -### Clone MXNet sources - -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel -``` - -If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installation, you might use OpenBLAS as the blas library, by setting USE_BLAS=openblas. - -

MacOS

- -### Prerequisites - -Install the dependencies, required for MXNet, with the following commands: - -- [Homebrew](https://brew.sh/) -- llvm (clang in macOS does not support OpenMP) -- OpenCV (for computer vision operations) - -``` -# Paste this command in Mac terminal to install Homebrew -/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" - -# install dependency -brew update -brew install pkg-config -brew install graphviz -brew tap homebrew/core -brew install opencv -brew tap homebrew/versions -brew install llvm -``` - -### Clone MXNet sources - -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1 -``` - -

Windows

- -On Windows, you can use [Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN. -[Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended. - -**Visual Studio 2015** - -To build and install MXNet yourself, you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. Download and Install [CMake 3](https://cmake.org/) if it is not already installed. -3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable. -6. If you have Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to ```MKL``` directory that contains the ```include``` and ```lib```. If you want to use MKL blas, you should set ```-DUSE_BLAS=mkl``` when cmake. Typically, you can find the directory in -```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```. -7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip`` along with openBLAS and add them to PATH. -8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```. - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Download the MXNet source code from [GitHub](https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules: -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -``` - -2. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -3. Start a Visual Studio command prompt. - -4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the -[CMake 3](https://cmake.org/) command: -``` -mkdir build -cd build -cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -5. In Visual Studio, open the solution file,```.sln```, and compile it. -These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder. -Also ```libmkldnn.dll``` with be in the ```./build/3rdparty/mkldnn/src/Release/``` - -6. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. 
- -**Visual Studio 2017** - -To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed. -3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```). -6. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBlas](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download). -7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```). - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Start ```cmd``` in windows. - -2. Download the MXNet source code from GitHub by using following command: - -```r -cd C:\ -git clone --recursive https://github.com/apache/incubator-mxnet.git -``` - -3. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```, and check ```VC++ 2017 version 15.4 v14.11 toolset```, and click ```Modify```. - -5. Change the version of the Visual studio 2017 to v14.11 using the following command (by default the VS2017 is installed in the following path): - -```r -"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11 -``` - -6. Create a build dir using the following command and go to the directory, for example: - -```r -mkdir C:\build -cd C:\build -``` - -7. CMake the MXNet source code by using following command: - -```r -cmake -G "Visual Studio 15 2017 Win64" .. -T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -8. After the CMake successfully completed, compile the the MXNet source code by using following command: - -```r -msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount -``` - -9. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. - -

Verify MXNet with python

- -``` -cd python -sudo python setup.py install -python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());" - -Expected Output: - -[[ 2. 2. 2.] - [ 2. 2. 2.]] -``` - -### Verify whether MKL-DNN works - -After MXNet is installed, you can verify if MKL-DNN backend works well with a single Convolution layer. - -``` -import mxnet as mx -import numpy as np - -num_filter = 32 -kernel = (3, 3) -pad = (1, 1) -shape = (32, 32, 256, 256) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad) -exe = y.simple_bind(mx.cpu(), x=shape) - -exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape) -exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -More detailed debugging and profiling information can be logged by setting the environment variable 'MKLDNN_VERBOSE': -``` -export MKLDNN_VERBOSE=1 -``` -For example, by running above code snippet, the following debugging logs providing more insights on MKL-DNN primitives `convolution` and `reorder`. That includes: Memory layout, infer shape and the time cost of primitive execution. -``` -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688 -mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819 -``` - -

Enable MKL BLAS

- -With MKL BLAS, the performace is expected to furtherly improved with variable range depending on the computation load of the models. -You can redistribute not only dynamic libraries but also headers, examples and static libraries on accepting the license [Intel Simplified license](https://software.intel.com/en-us/license/intel-simplified-software-license). -Installing the full MKL installation enables MKL support for all operators under the linalg namespace. - - 1. Download and install the latest full MKL version following instructions on the [intel website.](https://software.intel.com/en-us/mkl) - - 2. Run `make -j ${nproc} USE_BLAS=mkl` - - 3. Navigate into the python directory - - 4. Run `sudo python setup.py install` - -### Verify whether MKL works - -After MXNet is installed, you can verify if MKL BLAS works well with a single dot layer. - -``` -import mxnet as mx -import numpy as np - -shape_x = (1, 10, 8) -shape_w = (1, 12, 8) - -x_npy = np.random.normal(0, 1, shape_x) -w_npy = np.random.normal(0, 1, shape_w) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.batch_dot(x, w, transpose_b=True) -exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -You can open the `MKL_VERBOSE` flag by setting environment variable: -``` -export MKL_VERBOSE=1 -``` -Then by running above code snippet, you probably will get the following output message which means `SGEMM` primitive from MKL are called. Layout information and primitive execution performance are also demonstrated in the log message. -``` -Numpy + Intel(R) MKL: THREADING LAYER: (null) -Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime -Numpy + Intel(R) MKL: preloading libiomp5.so runtime -MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0 -MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000 -``` - -

Enable graph optimization

-
-Graph optimization by subgraph feature are available in master branch. You can build from source and then use below command to enable this *experimental* feature for better performance:
-
-```
-export MXNET_SUBGRAPH_BACKEND=MKLDNN
-```
+# Build/Install MXNet with MKL-DNN
+
+Better training and inference performance can be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating systems, including Linux, Windows and MacOS.
+In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows.
+
+The MKL-DNN optimized operators and other supported features are listed in the [MKL-DNN operator list](http://mxnet.incubator.apache.org/tutorials/mkldnn/operator_list.html).
+
+The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu).
+
+

Contents

+ +* [1. Linux](#1) +* [2. MacOS](#2) +* [3. Windows](#3) +* [4. Verify MXNet with python](#4) +* [5. Enable MKL BLAS](#5) +* [6. Enable graph optimization](#6) +* [7. Quantization](#7) +* [8. Support](#8) + +

Linux

+
+### Prerequisites
+
+```
+sudo apt-get update
+sudo apt-get install -y build-essential git
+sudo apt-get install -y libopenblas-dev liblapack-dev
+sudo apt-get install -y libopencv-dev
+sudo apt-get install -y graphviz
+```
+
+### Clone MXNet sources
+
+```
+git clone --recursive https://github.com/apache/incubator-mxnet.git
+cd incubator-mxnet
+```
+
+### Build MXNet with MKL-DNN
+
+```
+make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel
+```
+
+If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installed, you can use OpenBLAS as the BLAS library by setting USE_BLAS=openblas.
+

MacOS

+
+### Prerequisites
+
+Install the dependencies required for MXNet with the following commands:
+
+- [Homebrew](https://brew.sh/)
+- llvm (clang in macOS does not support OpenMP)
+- OpenCV (for computer vision operations)
+
+```
+# Paste this command in Mac terminal to install Homebrew
+/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+
+# install dependencies
+brew update
+brew install pkg-config
+brew install graphviz
+brew tap homebrew/core
+brew install opencv
+brew tap homebrew/versions
+brew install llvm
+```
+
+### Clone MXNet sources
+
+```
+git clone --recursive https://github.com/apache/incubator-mxnet.git
+cd incubator-mxnet
+```
+
+### Build MXNet with MKL-DNN
+
+```
+LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1
+```
+

Windows

+
+On Windows, you can use [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) or [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN.
+[Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended.
+
+**Visual Studio 2015**
+
+To build and install MXNet yourself, you need the following dependencies. Install the required dependencies:
+
+1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition.
+2. Download and install [CMake 3](https://cmake.org/) if it is not already installed.
+3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download).
+4. Unzip the OpenCV package.
+5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable.
+6. If you have the Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to the ```MKL``` directory that contains the ```include``` and ```lib``` directories. If you want to use MKL BLAS, set ```-DUSE_BLAS=mkl``` when running cmake. Typically, you can find the directory in
+```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```.
+7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip``` along with OpenBLAS and add the DLLs to the PATH.
+8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```.
+
+After you have installed all of the required dependencies, build the MXNet source code:
+
+1. Download the MXNet source code from [GitHub](https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules:
+```
+git clone --recursive https://github.com/apache/incubator-mxnet.git
+```
+
+2. Copy the file `3rdparty/mkldnn/config_template.vcxproj` to the incubator-mxnet root.
+
+3. Start a Visual Studio command prompt.
+
+4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the
+[CMake 3](https://cmake.org/) command:
+```
+mkdir build
+cd build
+cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release
+```
+
+5. In Visual Studio, open the solution file, ```.sln```, and compile it.
+These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder.
+Also, ```libmkldnn.dll``` will be in the ```./build/3rdparty/mkldnn/src/Release/``` folder.
+
+6. Make sure that all the DLL files used above (such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc.) are added to the system PATH. For convenience, you can put all of them into ```\windows\system32```. Otherwise, you will come across `Not Found Dependencies` errors when loading MXNet.
+
+**Visual Studio 2017**
+
+To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), you need the following dependencies. Install the required dependencies:
+
+1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition.
+2. Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed.
+3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download).
+4. Unzip the OpenCV package.
+5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```).
+6. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download).
+7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```).
+
+After you have installed all of the required dependencies, build the MXNet source code:
+
+1. Start ```cmd``` in Windows.
+
+2. Download the MXNet source code from GitHub using the following command:
+
+```r
+cd C:\
+git clone --recursive https://github.com/apache/incubator-mxnet.git
+```
+
+3. Copy the file `3rdparty/mkldnn/config_template.vcxproj` to the incubator-mxnet root.
+
+4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```: check ```VC++ 2017 version 15.4 v14.11 toolset``` and click ```Modify```.
+
+5. Change the Visual Studio 2017 toolset version to v14.11 using the following command (by default, VS2017 is installed in the following path):
+
+```r
+"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11
+```
+
+6. Create a build directory using the following command and go to the directory, for example:
+
+```r
+mkdir C:\build
+cd C:\build
+```
+
+7. CMake the MXNet source code using the following command:
+
+```r
+cmake -G "Visual Studio 15 2017 Win64" .. -T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release
+```
+
+8. After CMake has completed successfully, compile the MXNet source code using the following command:
+
+```r
+msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount
+```
+
+9. Make sure that all the DLL files used above (such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc.) are added to the system PATH. For convenience, you can put all of them into ```\windows\system32```. Otherwise, you will come across `Not Found Dependencies` errors when loading MXNet.
+

Verify MXNet with python

+
+```
+cd python
+sudo python setup.py install
+python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());"
+
+Expected Output:
+
+[[ 2. 2. 2.]
+ [ 2. 2. 2.]]
+```
+
+### Verify whether MKL-DNN works
+
+After MXNet is installed, you can verify that the MKL-DNN backend works correctly with a single convolution layer.
+
+```
+import mxnet as mx
+import numpy as np
+
+num_filter = 32
+kernel = (3, 3)
+pad = (1, 1)
+shape = (32, 32, 256, 256)
+
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad)
+exe = y.simple_bind(mx.cpu(), x=shape)
+
+exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
+exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
+
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+More detailed debugging and profiling information can be logged by setting the environment variable `MKLDNN_VERBOSE`:
+```
+export MKLDNN_VERBOSE=1
+```
+For example, running the above code snippet produces the following debugging logs, which provide more insight into the MKL-DNN primitives `convolution` and `reorder`, including the memory layout, the inferred shape, and the execution time of each primitive.
+```
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688
+mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819
+```
+
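+If you are not sure whether the installed binary was actually built with MKL-DNN, newer MXNet versions expose the compile-time feature list at runtime. The following is a minimal sketch; it assumes your MXNet version already ships the `mxnet.runtime` module:
+
+```
+from mxnet.runtime import Features
+
+# Features() reports the capabilities this libmxnet binary was compiled with;
+# an MKL-DNN build should report MKLDNN as enabled.
+print(Features().is_enabled('MKLDNN'))
+```
+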

Enable MKL BLAS

+
+With MKL BLAS, performance is expected to improve further, with the gain varying depending on the computational load of the model.
+You can redistribute not only the dynamic libraries but also the headers, examples and static libraries after accepting the [Intel Simplified license](https://software.intel.com/en-us/license/intel-simplified-software-license).
+Installing the full MKL package enables MKL support for all operators under the linalg namespace.
+
+ 1. Download and install the latest full MKL version following the instructions on the [Intel website](https://software.intel.com/en-us/mkl).
+
+ 2. Run `make -j ${nproc} USE_BLAS=mkl`
+
+ 3. Navigate into the python directory
+
+ 4. Run `sudo python setup.py install`
+
+### Verify whether MKL works
+
+After MXNet is installed, you can verify that MKL BLAS works correctly with a single dot operation.
+
+```
+import mxnet as mx
+import numpy as np
+
+shape_x = (1, 10, 8)
+shape_w = (1, 12, 8)
+
+x_npy = np.random.normal(0, 1, shape_x)
+w_npy = np.random.normal(0, 1, shape_w)
+
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.batch_dot(x, w, transpose_b=True)
+exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape)
+
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+You can turn on the `MKL_VERBOSE` flag by setting the environment variable:
+```
+export MKL_VERBOSE=1
+```
+Then, running the above code snippet should print output like the following, which shows that the `SGEMM` primitive from MKL is called. Layout information and primitive execution performance are also shown in the log message.
+```
+Numpy + Intel(R) MKL: THREADING LAYER: (null)
+Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
+Numpy + Intel(R) MKL: preloading libiomp5.so runtime
+MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0
+MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000
+```
+
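+To get a rough sense of the BLAS speedup on your own machine, a simple timing sketch like the one below can be used to compare an MKL-enabled build against an OpenBLAS build. This is an illustration rather than a rigorous benchmark; the shapes and iteration count are arbitrary, and absolute numbers will vary with hardware and thread settings:
+
+```
+import time
+import mxnet as mx
+
+x = mx.nd.random.normal(shape=(32, 512, 512))
+w = mx.nd.random.normal(shape=(32, 512, 512))
+
+# Warm up once so lazy initialization does not skew the measurement.
+mx.nd.batch_dot(x, w, transpose_b=True).wait_to_read()
+
+tic = time.time()
+for _ in range(10):
+    y = mx.nd.batch_dot(x, w, transpose_b=True)
+mx.nd.waitall()  # NDArray operations are asynchronous; block before stopping the timer
+print('10 batch_dot calls took %.3f seconds' % (time.time() - tic))
+```
+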

Enable graph optimization

+
+Graph optimization with the subgraph feature is available in the master branch. You can build from source and then use the command below to enable this *experimental* feature for better performance:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
 
 When `MKLDNN` backend is enabled, advanced control options are avaliable:
 
 ```
@@ -314,25 +316,24 @@
 export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization
 export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass
 ```
 
-
-This limitations of this experimental feature are:
-
-- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.
-
-- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
-
-- [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).
-

Quantization and Inference with INT8

- -Benefiting from Intel MKL-DNN, MXNet built with Intel MKL-DNN brings outstanding performance improvement on quantization and inference with INT8 Intel CPU Platform on Intel Xeon Scalable Platform. - -- [CNN Quantization Examples](https://github.com/apache/incubator-mxnet/tree/master/example/quantization). - -

Next Steps and Support

-
-- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website.
-
-For questions or support specific to MKL, visit the [Intel MKLDNN](https://github.com/intel/mkl-dnn) website.
-
-If you find bugs, please open an issue on GitHub for [MXNet with MKL](https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](https://github.com/apache/incubator-mxnet/labels/MKLDNN).
+The limitations of this experimental feature are:
+
+- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.
+
+- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
+
+- [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).
+
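+As a quick illustration, the subgraph backend can also be selected from Python by setting the environment variable before the model is bound. This is a minimal sketch under the assumption that binding happens after the variable is set; the network itself is an arbitrary toy example:
+
+```
+import os
+# Select the MKLDNN subgraph backend before binding the model; setting it in
+# the shell with `export MXNET_SUBGRAPH_BACKEND=MKLDNN` works equally well.
+os.environ['MXNET_SUBGRAPH_BACKEND'] = 'MKLDNN'
+
+import mxnet as mx
+
+data = mx.sym.Variable('data')
+conv = mx.sym.Convolution(data=data, num_filter=16, kernel=(3, 3), no_bias=True)
+relu = mx.sym.Activation(data=conv, act_type='relu')
+exe = relu.simple_bind(mx.cpu(), data=(1, 3, 32, 32))
+exe.forward(is_train=False)
+```
+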

Quantization and Inference with INT8

+
+Benefiting from Intel MKL-DNN, MXNet built with Intel MKL-DNN brings outstanding performance improvements for quantization and INT8 inference on the Intel Xeon Scalable Platform.
+
+- [CNN Quantization Examples](https://github.com/apache/incubator-mxnet/tree/master/example/quantization).
+
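+For a rough idea of the API flow, the sketch below quantizes a symbolic model with the `mxnet.contrib.quantization` module. The checkpoint name `model` is a placeholder, and the contrib API (module path and argument names) may change between versions:
+
+```
+import mxnet as mx
+from mxnet.contrib.quantization import quantize_model
+
+# Load a trained FP32 model (hypothetical checkpoint files model-symbol.json
+# and model-0000.params in the working directory).
+sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
+
+# calib_mode='none' skips calibration; use 'naive' or 'entropy' together with
+# calib_data (a data iterator over representative inputs) for better accuracy.
+qsym, qarg_params, aux_params = quantize_model(
+    sym=sym, arg_params=arg_params, aux_params=aux_params,
+    ctx=mx.cpu(), calib_mode='none', quantized_dtype='int8')
+```
+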

Next Steps and Support

+
+- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website.
+
+- For questions or support specific to MKL-DNN, visit the [Intel MKLDNN](https://github.com/intel/mkl-dnn) website.
+
+- If you find bugs, please open an issue on GitHub for [MXNet with MKL](https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](https://github.com/apache/incubator-mxnet/labels/MKLDNN).
diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md
new file mode 100644
index 000000000000..20d972d5a062
--- /dev/null
+++ b/docs/tutorials/mkldnn/operator_list.md
@@ -0,0 +1,73 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# MKL-DNN Operator list
+
+The MXNet MKL-DNN backend provides optimized implementations for various operators, covering a broad range of applications including image classification, object detection, and natural language processing. We also provide lower-precision versions of some of these operators on CPU, leveraging Intel DL Boost technology. At the computation-graph level, a set of graph fusion and quantization passes is implemented based on the subgraph feature of MXNet. To help users better understand the MKL-DNN backend, the tables below summarize the supported operators, data types and functionalities. As the community keeps adding new features to the MKL-DNN backend, the tables will be updated continuously.
+
+
+| Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference |
+| :--: | :--: | :--: | :--: | :--: |
+| **Convolution** | 1D Convolution | Y | Y | N |
+| | 2D Convolution | Y | Y | Y |
+| | 3D Convolution | Y | Y | N |
+| **Deconvolution** | 2D Deconvolution | Y | Y | N |
+| | 3D Deconvolution | Y | Y | N |
+| **FullyConnected** | 1D-4D input, flatten=True | N | Y | Y |
+| | 1D-4D input, flatten=False | N | Y | Y |
+| **Pooling** | 2D max pooling | Y | Y | Y |
+| | 2D avg pooling | Y | Y | Y |
+| **BatchNorm** | 2D BatchNorm | Y | Y | N |
+| **LRN** | 2D LRN | Y | Y | N |
+| **Activation** | ReLU | Y | Y | Y |
+| | Tanh | Y | Y | N |
+| | SoftReLU | Y | Y | N |
+| | Sigmoid | Y | Y | N |
+| **softmax** | 1D-4D input | Y | Y | N |
+| **SoftmaxOutput** | 1D-4D input | N | Y | N |
+| **Transpose** | 1D-4D input | N | Y | N |
+| **elemwise_add** | 1D-4D input | Y | Y | Y |
+| **Concat** | 1D-4D input | Y | Y | Y |
+| **slice** | 1D-4D input | N | Y | N |
+| **Quantization** | 1D-4D input | N | N | Y |
+| **Dequantization** | 1D-4D input | N | N | Y |
+| **Requantization** | 1D-4D input | N | N | Y |
+
+
+Besides direct operator optimizations, we also provide the graph fusion passes listed in the table below. Users can choose to enable or disable these fusion patterns through environment variables.
+| Fusion pattern | Enable | Disable |
+| :--: | :--: | :--: |
+| Convolution + Activation(ReLU) | | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU |
+| Convolution + elemwise_add | | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM |
+| Convolution + BatchNorm | | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN |
+| Convolution + Activation(ReLU) + elemwise_add | | |
+| Convolution + BatchNorm + Activation(ReLU) + elemwise_add | | |
+| FullyConnected + Activation(ReLU) | | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU |
+| Convolution (INT8) + re-quantization | | |
+| FullyConnected (INT8) + re-quantization | | |
+
+
+To try these features out, you can install the MXNet MKL-DNN backend through pip:
+
+```
+pip install mxnet-mkl
+```
+
+To build the MXNet MKL-DNN backend from source code, please refer to the [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html).
+
+For performance numbers, please refer to [performance on Intel CPU](https://mxnet.incubator.apache.org/versions/master/faq/perf.html#intel-cpu).
diff --git a/tests/tutorials/test_sanity_tutorials.py b/tests/tutorials/test_sanity_tutorials.py
index 7865000c7608..f89c23484568 100644
--- a/tests/tutorials/test_sanity_tutorials.py
+++ b/tests/tutorials/test_sanity_tutorials.py
@@ -35,6 +35,7 @@
 'gluon/index.md',
 'mkldnn/index.md',
 'mkldnn/MKLDNN_README.md',
+ 'mkldnn/operator_list.md',
 'nlp/index.md',
 'onnx/index.md',
 'python/index.md',

Contents

- -* [1. Linux](#1) -* [2. MacOS](#2) -* [3. Windows](#3) -* [4. Verify MXNet with python](#4) -* [5. Enable MKL BLAS](#5) -* [6. Enable graph optimization](#6) -* [7. Quantization](#7) -* [8. Support](#8) - -

Linux

- -### Prerequisites - -``` -sudo apt-get update -sudo apt-get install -y build-essential git -sudo apt-get install -y libopenblas-dev liblapack-dev -sudo apt-get install -y libopencv-dev -sudo apt-get install -y graphviz -``` - -### Clone MXNet sources - -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel -``` - -If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installation, you might use OpenBLAS as the blas library, by setting USE_BLAS=openblas. - -

MacOS

- -### Prerequisites - -Install the dependencies, required for MXNet, with the following commands: - -- [Homebrew](https://brew.sh/) -- llvm (clang in macOS does not support OpenMP) -- OpenCV (for computer vision operations) - -``` -# Paste this command in Mac terminal to install Homebrew -/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" - -# install dependency -brew update -brew install pkg-config -brew install graphviz -brew tap homebrew/core -brew install opencv -brew tap homebrew/versions -brew install llvm -``` - -### Clone MXNet sources - -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1 -``` - -

Windows

- -On Windows, you can use [Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN. -[Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended. - -**Visual Studio 2015** - -To build and install MXNet yourself, you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. Download and Install [CMake 3](https://cmake.org/) if it is not already installed. -3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable. -6. If you have Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to ```MKL``` directory that contains the ```include``` and ```lib```. If you want to use MKL blas, you should set ```-DUSE_BLAS=mkl``` when cmake. Typically, you can find the directory in -```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```. -7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip`` along with openBLAS and add them to PATH. -8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```. - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Download the MXNet source code from [GitHub](https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules: -``` -git clone --recursive https://github.com/apache/incubator-mxnet.git -``` - -2. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -3. Start a Visual Studio command prompt. - -4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the -[CMake 3](https://cmake.org/) command: -``` -mkdir build -cd build -cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -5. In Visual Studio, open the solution file,```.sln```, and compile it. -These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder. -Also ```libmkldnn.dll``` with be in the ```./build/3rdparty/mkldnn/src/Release/``` - -6. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. 
- -**Visual Studio 2017** - -To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed. -3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```). -6. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBlas](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download). -7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```). - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Start ```cmd``` in windows. - -2. Download the MXNet source code from GitHub by using following command: - -```r -cd C:\ -git clone --recursive https://github.com/apache/incubator-mxnet.git -``` - -3. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```, and check ```VC++ 2017 version 15.4 v14.11 toolset```, and click ```Modify```. - -5. Change the version of the Visual studio 2017 to v14.11 using the following command (by default the VS2017 is installed in the following path): - -```r -"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11 -``` - -6. Create a build dir using the following command and go to the directory, for example: - -```r -mkdir C:\build -cd C:\build -``` - -7. CMake the MXNet source code by using following command: - -```r -cmake -G "Visual Studio 15 2017 Win64" .. -T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -8. After the CMake successfully completed, compile the the MXNet source code by using following command: - -```r -msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount -``` - -9. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. - -

Verify MXNet with python

- -``` -cd python -sudo python setup.py install -python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());" - -Expected Output: - -[[ 2. 2. 2.] - [ 2. 2. 2.]] -``` - -### Verify whether MKL-DNN works - -After MXNet is installed, you can verify if MKL-DNN backend works well with a single Convolution layer. - -``` -import mxnet as mx -import numpy as np - -num_filter = 32 -kernel = (3, 3) -pad = (1, 1) -shape = (32, 32, 256, 256) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad) -exe = y.simple_bind(mx.cpu(), x=shape) - -exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape) -exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -More detailed debugging and profiling information can be logged by setting the environment variable 'MKLDNN_VERBOSE': -``` -export MKLDNN_VERBOSE=1 -``` -For example, by running above code snippet, the following debugging logs providing more insights on MKL-DNN primitives `convolution` and `reorder`. That includes: Memory layout, infer shape and the time cost of primitive execution. -``` -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688 -mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819 -``` - -

Enable MKL BLAS

- -With MKL BLAS, the performace is expected to furtherly improved with variable range depending on the computation load of the models. -You can redistribute not only dynamic libraries but also headers, examples and static libraries on accepting the license [Intel Simplified license](https://software.intel.com/en-us/license/intel-simplified-software-license). -Installing the full MKL installation enables MKL support for all operators under the linalg namespace. - - 1. Download and install the latest full MKL version following instructions on the [intel website.](https://software.intel.com/en-us/mkl) - - 2. Run `make -j ${nproc} USE_BLAS=mkl` - - 3. Navigate into the python directory - - 4. Run `sudo python setup.py install` - -### Verify whether MKL works - -After MXNet is installed, you can verify if MKL BLAS works well with a single dot layer. - -``` -import mxnet as mx -import numpy as np - -shape_x = (1, 10, 8) -shape_w = (1, 12, 8) - -x_npy = np.random.normal(0, 1, shape_x) -w_npy = np.random.normal(0, 1, shape_w) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.batch_dot(x, w, transpose_b=True) -exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -You can open the `MKL_VERBOSE` flag by setting environment variable: -``` -export MKL_VERBOSE=1 -``` -Then by running above code snippet, you probably will get the following output message which means `SGEMM` primitive from MKL are called. Layout information and primitive execution performance are also demonstrated in the log message. -``` -Numpy + Intel(R) MKL: THREADING LAYER: (null) -Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime -Numpy + Intel(R) MKL: preloading libiomp5.so runtime -MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0 -MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000 -``` - -

Enable graph optimization

- -Graph optimization by subgraph feature are available in master branch. You can build from source and then use below command to enable this *experimental* feature for better performance: - -``` -export MXNET_SUBGRAPH_BACKEND=MKLDNN -``` +The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). + + +

Contents

+ +* [1. Linux](#1) +* [2. MacOS](#2) +* [3. Windows](#3) +* [4. Verify MXNet with python](#4) +* [5. Enable MKL BLAS](#5) +* [6. Enable graph optimization](#6) +* [7. Quantization](#7) +* [8. Support](#8) + +

Linux

+ +### Prerequisites + +``` +sudo apt-get update +sudo apt-get install -y build-essential git +sudo apt-get install -y libopenblas-dev liblapack-dev +sudo apt-get install -y libopencv-dev +sudo apt-get install -y graphviz +``` + +### Clone MXNet sources + +``` +git clone --recursive https://github.com/apache/incubator-mxnet.git +cd incubator-mxnet +``` + +### Build MXNet with MKL-DNN + +``` +make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel +``` + +If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installation, you might use OpenBLAS as the blas library, by setting USE_BLAS=openblas. + +

MacOS

+ +### Prerequisites + +Install the dependencies, required for MXNet, with the following commands: + +- [Homebrew](https://brew.sh/) +- llvm (clang in macOS does not support OpenMP) +- OpenCV (for computer vision operations) + +``` +# Paste this command in Mac terminal to install Homebrew +/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" + +# install dependency +brew update +brew install pkg-config +brew install graphviz +brew tap homebrew/core +brew install opencv +brew tap homebrew/versions +brew install llvm +``` + +### Clone MXNet sources + +``` +git clone --recursive https://github.com/apache/incubator-mxnet.git +cd incubator-mxnet +``` + +### Build MXNet with MKL-DNN + +``` +LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1 +``` + +

Windows

+ +On Windows, you can use [Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN. +[Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended. + +**Visual Studio 2015** + +To build and install MXNet yourself, you need the following dependencies. Install the required dependencies: + +1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition. +2. Download and Install [CMake 3](https://cmake.org/) if it is not already installed. +3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download). +4. Unzip the OpenCV package. +5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable. +6. If you have Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to ```MKL``` directory that contains the ```include``` and ```lib```. If you want to use MKL blas, you should set ```-DUSE_BLAS=mkl``` when cmake. Typically, you can find the directory in +```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```. +7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip`` along with openBLAS and add them to PATH. +8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```. + +After you have installed all of the required dependencies, build the MXNet source code: + +1. Download the MXNet source code from [GitHub](https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules: +``` +git clone --recursive https://github.com/apache/incubator-mxnet.git +``` + +2. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. + +3. Start a Visual Studio command prompt. + +4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the +[CMake 3](https://cmake.org/) command: +``` +mkdir build +cd build +cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release +``` + +5. In Visual Studio, open the solution file,```.sln```, and compile it. +These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder. +Also ```libmkldnn.dll``` with be in the ```./build/3rdparty/mkldnn/src/Release/``` + +6. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. 
+ +**Visual Studio 2017** + +To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), you need the following dependencies. Install the required dependencies: + +1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition. +2. Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed. +3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download). +4. Unzip the OpenCV package. +5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```). +6. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBlas](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download). +7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```). + +After you have installed all of the required dependencies, build the MXNet source code: + +1. Start ```cmd``` in windows. + +2. Download the MXNet source code from GitHub by using following command: + +```r +cd C:\ +git clone --recursive https://github.com/apache/incubator-mxnet.git +``` + +3. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. + +4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```, and check ```VC++ 2017 version 15.4 v14.11 toolset```, and click ```Modify```. + +5. Change the version of the Visual studio 2017 to v14.11 using the following command (by default the VS2017 is installed in the following path): + +```r +"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11 +``` + +6. Create a build dir using the following command and go to the directory, for example: + +```r +mkdir C:\build +cd C:\build +``` + +7. CMake the MXNet source code by using following command: + +```r +cmake -G "Visual Studio 15 2017 Win64" .. -T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release +``` + +8. After the CMake successfully completed, compile the the MXNet source code by using following command: + +```r +msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount +``` + +9. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. + +

Verify MXNet with python

+
+```
+cd python
+sudo python setup.py install
+python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());"
+
+Expected Output:
+
+[[ 2. 2. 2.]
+ [ 2. 2. 2.]]
+```
+
+### Verify whether MKL-DNN works
+
+After MXNet is installed, you can verify whether the MKL-DNN backend works well with a single Convolution layer.
+
+```
+import mxnet as mx
+import numpy as np
+
+num_filter = 32
+kernel = (3, 3)
+pad = (1, 1)
+shape = (32, 32, 256, 256)
+
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad)
+exe = y.simple_bind(mx.cpu(), x=shape)
+
+exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
+exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
+
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+More detailed debugging and profiling information can be logged by setting the environment variable `MKLDNN_VERBOSE`:
+```
+export MKLDNN_VERBOSE=1
+```
+For example, by running the above code snippet, the following debugging logs are printed, providing more insight into the MKL-DNN primitives `convolution` and `reorder`, including the memory layout, the inferred shapes, and the execution time of each primitive.
+```
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688
+mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819
+```
+
+
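+The same check can also be scripted end to end. Below is a minimal sketch in Python; the only assumption beyond the example above is that setting `MKLDNN_VERBOSE` through `os.environ` before the first operator executes behaves the same as exporting it in the shell:
+
+```
+import os
+os.environ['MKLDNN_VERBOSE'] = '1'  # must be set before the first operator runs
+
+import mxnet as mx
+
+# The same convolution as above, but in imperative mode.
+x = mx.nd.random.normal(shape=(32, 32, 256, 256))
+w = mx.nd.random.normal(shape=(32, 32, 3, 3))
+y = mx.nd.Convolution(data=x, weight=w, num_filter=32, kernel=(3, 3),
+                      pad=(1, 1), no_bias=True)
+y.wait_to_read()  # mkldnn_verbose lines are printed while this executes
+```
+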

Enable MKL BLAS

+
+With MKL BLAS, performance is expected to improve further, by an amount that varies with the computational load of the model.
+Upon accepting the [Intel Simplified Software License](https://software.intel.com/en-us/license/intel-simplified-software-license), you can redistribute not only the dynamic libraries but also the headers, examples and static libraries.
+Installing the full MKL package enables MKL support for all operators under the linalg namespace.
+
+ 1. Download and install the latest full MKL version following the instructions on the [Intel website](https://software.intel.com/en-us/mkl).
+
+ 2. Run `make -j $(nproc) USE_BLAS=mkl`
+
+ 3. Navigate into the python directory
+
+ 4. Run `sudo python setup.py install`
+
+### Verify whether MKL works
+
+After MXNet is installed, you can verify whether MKL BLAS works well with a single `batch_dot` layer.
+
+```
+import mxnet as mx
+import numpy as np
+
+shape_x = (1, 10, 8)
+shape_w = (1, 12, 8)
+
+x_npy = np.random.normal(0, 1, shape_x)
+w_npy = np.random.normal(0, 1, shape_w)
+
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.batch_dot(x, w, transpose_b=True)
+exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape)
+
+# copy the input data into the bound arrays before running the executor
+exe.arg_dict['x'][:] = x_npy
+exe.arg_dict['w'][:] = w_npy
+
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+You can enable the `MKL_VERBOSE` flag by setting the environment variable:
+```
+export MKL_VERBOSE=1
+```
+Then, by running the above code snippet, you should get an output message similar to the following, which shows that the `SGEMM` primitive from MKL is called. Layout information and primitive execution time are also shown in the log message.
+```
+Numpy + Intel(R) MKL: THREADING LAYER: (null)
+Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
+Numpy + Intel(R) MKL: preloading libiomp5.so runtime
+MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0
+MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000
+```
+
+
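+Because the full MKL installation also covers the operators under the linalg namespace, a similar check can be done with a linear-algebra operator. This is a minimal sketch (the operator choice is only an example); with `MKL_VERBOSE=1` set as above, you should see additional `GEMM` messages in the log:
+
+```
+import mxnet as mx
+
+# linalg.gemm2 is a plain matrix multiplication that dispatches to MKL BLAS.
+a = mx.nd.random.normal(shape=(64, 64))
+b = mx.nd.random.normal(shape=(64, 64))
+c = mx.nd.linalg.gemm2(a, b)
+c.wait_to_read()
+```
+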

Enable graph optimization

+
+Graph optimization with the subgraph feature is available in the master branch. You can build from source and then use the command below to enable this *experimental* feature for better performance:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
 
 When `MKLDNN` backend is enabled, advanced control options are avaliable:
 
@@ -316,24 +316,25 @@ export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization
 export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass
 ```
-This limitations of this experimental feature are:
-
-- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.
-
-- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
-
-- [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).
-
-

Quantization and Inference with INT8

- -Benefiting from Intel MKL-DNN, MXNet built with Intel MKL-DNN brings outstanding performance improvement on quantization and inference with INT8 Intel CPU Platform on Intel Xeon Scalable Platform. - -- [CNN Quantization Examples](https://github.com/apache/incubator-mxnet/tree/master/example/quantization). - -

Next Steps and Support

-
-- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website.
-
-- For questions or support specific to MKL, visit the [Intel MKLDNN](https://github.com/intel/mkl-dnn) website.
-
-- If you find bugs, please open an issue on GitHub for [MXNet with MKL](https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](https://github.com/apache/incubator-mxnet/labels/MKLDNN).
+
+The limitations of this experimental feature are:
+
+- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.
+
+- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
+
+- For more details, see [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).
+
+
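+If you prefer not to rely on the environment variable, the same fusion passes can be applied to a symbol explicitly. This is a minimal sketch, assuming a checkpoint saved as `model-symbol.json` and `model-0000.params`, and that the backend name matches the `MXNET_SUBGRAPH_BACKEND` value shown above:
+
+```
+import mxnet as mx
+
+sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
+# Returns a new symbol in which eligible operators are replaced
+# by fused MKL-DNN subgraph operators.
+fused_sym = sym.get_backend_symbol('MKLDNN')
+```
+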

Quantization and Inference with INT8

+
+MXNet built with Intel MKL-DNN brings outstanding performance improvements for quantization and INT8 inference on the Intel Xeon Scalable platform; a rough sketch of the quantization API follows the examples link below.
+
+- [CNN Quantization Examples](https://github.com/apache/incubator-mxnet/tree/master/example/quantization).
+
+
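+The quantization flow used by those examples is driven by the `mxnet.contrib.quantization` module. The snippet below is only a rough sketch with most arguments omitted; `calib_mode='none'` skips calibration (simpler, but less accurate), and the checkpoint name `model` is a placeholder. Please follow the linked examples for a complete script:
+
+```
+import mxnet as mx
+from mxnet.contrib.quantization import quantize_model
+
+sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
+# Rewrites eligible FP32 operators into their INT8 counterparts.
+qsym, qarg_params, aux_params = quantize_model(
+    sym=sym, arg_params=arg_params, aux_params=aux_params,
+    ctx=mx.cpu(), calib_mode='none', quantized_dtype='int8')
+```
+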

Next Steps and Support

+
+- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website.
+
+- For questions or support specific to MKL-DNN, visit the [Intel MKLDNN](https://github.com/intel/mkl-dnn) website.
+
+- If you find bugs, please open an issue on GitHub for [MXNet with MKL](https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](https://github.com/apache/incubator-mxnet/labels/MKLDNN).

From 15446a8deff4c0afdb3ac544f4d9ccd9247f314d Mon Sep 17 00:00:00 2001
From: Tao Lv 
Date: Mon, 6 May 2019 14:50:22 +0800
Subject: [PATCH 03/24] enable fusion

---
 docs/tutorials/mkldnn/operator_list.md | 28 ++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md
index 20d972d5a062..ef00337dcc91 100644
--- a/docs/tutorials/mkldnn/operator_list.md
+++ b/docs/tutorials/mkldnn/operator_list.md
@@ -50,16 +50,28 @@ MXNet MKL-DNN backend provides optimized implementations for various opertors co
 Besides direct operator optimizations, we also provide graph fusion passes listed in the table below. Users can choose to enable or disable these fusion patterns through environmental variables.
 
+For example, you can enable all fusion passes by:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
+
+And disable `Convolution + Activation(ReLU)` fusion by:
+
+```
+export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1
+```
+
 | Fusion pattern | Enable | Disable |
 | :--: | :--: | :--: |
-| Convolution + Activation(ReLU) | | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU |
-| Convolution + elemwise_add | | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM |
-| Convolution + BatchNorm | | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN |
-| Convolution + Activation(ReLu) + elemwise_add | | |
-| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | | |
-| FullyConnected + Activation(ReLU) | | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU |
-| Convolution (INT8) + re-quantization | | |
-| FullyConnected (INT8) + re-quantization | | |
+| Convolution + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU |
+| Convolution + elemwise_add | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM |
+| Convolution + BatchNorm | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN |
+| Convolution + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | |
+| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | |
+| FullyConnected + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU |
+| Convolution (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | |
+| FullyConnected (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | |
 
 To try these features out, you can install MXNet MKL-DNN backend through pip:
 

From 70f372399246b967bef00467357634057b461020 Mon Sep 17 00:00:00 2001
From: Tao Lv 
Date: Mon, 6 May 2019 15:42:13 +0800
Subject: [PATCH 04/24] adjust table

---
 docs/tutorials/mkldnn/operator_list.md | 76 +++++++++++++------------
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md
index ef00337dcc91..00a9638ee4f5 100644
--- a/docs/tutorials/mkldnn/operator_list.md
+++ b/docs/tutorials/mkldnn/operator_list.md
@@ -19,34 +19,32 @@ MXNet MKL-DNN backend provides optimized implementations for various opertors covering a broad range of applications including image classification, object detection, natural language processing.
We also provide the lower precision version for part of these operators on CPU leveraging the DL Boost technology from Intel. On computation graph level, a set of graph fusion pass and quantization pass is implemneted based on the sugraph feature of MXNet. To help users understanding MKL-DNN backend better, the tables below summarize the list of supported operators, data types and functionalities. As the community keeps working on more new features for MKL-DNN backend, the tables will be updated continuously. - -| Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference | -| :--: | :--: | :--: | :--: | :--: | -| **Convolution** | 1D Convolution | Y | Y | N | -| | 2D Convolution | Y | Y | Y | -| | 3D Convolution | Y | Y | N | -| **Deconvolution** | 2D Deconvolution | Y | Y | N | -| | 3D Deconvolution | Y | Y | N | -| **FullyConnected** | 1D-4D input, flatten=Ture | N | Y | Y | -| | 1D-4D input, flatten=False | N | Y | Y | -| **Pooling** | 2D max Pooling | Y | Y | Y | -| | 2D avg pooling | Y | Y | Y | -| **BatchNorm** | 2D BatchNorm | Y | Y | N | -| **LRN** | 2D LRN | Y | Y | N | -| **Activation** | ReLU | Y | Y | Y | -| | Tanh | Y | Y | N | -| | SoftReLU | Y | Y | N | -| | Sigmoid | Y | Y | N | -| **softmax** | 1D-4D input | Y | Y | N | -| **Softmax_output** | 1D-4D input | N | Y | N | -| **Transpose** | 1D-4D input | N | Y | N | -| **elemwise_add** | 1D-4D input | Y | Y | Y | -| **Concat** | 1D-4D input | Y | Y | Y | -| **slice** | 1D-4D input | N | Y | N | -| **Quantization** | 1D-4D input | N | N | Y | -| **Dequantization** | 1D-4D input | N | N | Y | -| **Requantization** | 1D-4D input | N | N | Y | - +| Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference | +| :--: | :--: | :--: | :--: | :--: | +| **Convolution** | 1D Convolution | Y | Y | N | +| | 2D Convolution | Y | Y | Y | +| | 3D Convolution | Y | Y | N | +| **Deconvolution** | 2D Deconvolution | Y | Y | N | +| | 3D Deconvolution | Y | Y | N | +| **FullyConnected** | 1D-4D input, flatten=True | N | Y | Y | +| | 1D-4D input, flatten=False | N | Y | Y | +| **Pooling** | 2D max Pooling | Y | Y | Y | +| | 2D avg pooling | Y | Y | Y | +| **BatchNorm** | 2D BatchNorm | Y | Y | N | +| **LRN** | 2D LRN | Y | Y | N | +| **Activation** | ReLU | Y | Y | Y | +| | Tanh | Y | Y | N | +| | SoftReLU | Y | Y | N | +| | Sigmoid | Y | Y | N | +| **softmax** | 1D-4D input | Y | Y | N | +| **Softmax_output** | 1D-4D input | N | Y | N | +| **Transpose** | 1D-4D input | N | Y | N | +| **elemwise_add** | 1D-4D input | Y | Y | Y | +| **Concat** | 1D-4D input | Y | Y | Y | +| **slice** | 1D-4D input | N | Y | N | +| **Quantization** | 1D-4D input | N | N | Y | +| **Dequantization** | 1D-4D input | N | N | Y | +| **Requantization** | 1D-4D input | N | N | Y | Besides direct operator optimizations, we also provide graph fusion passes listed in the table below. Users can choose to enable or disable these fusion patterns through environmental variables. 
@@ -62,22 +60,22 @@ And disable `Convolution + Activation(ReLU)` fusion by: export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 ``` -| Fusion pattern | Enable | Disable | -| :--: | :--: | :--: | -| Convolution + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | -| Convolution + elemwise_add | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | -| Convolution + BatchNorm | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | -| Convolution + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | -| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | -| FullyConnected + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | -| Convolution (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | -| FullyConnected (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | +| Fusion pattern | Enable | Disable | +| :--: | :--: | :--: | +| Convolution + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | +| Convolution + elemwise_add | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | +| Convolution + BatchNorm | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | +| Convolution + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | +| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | +| FullyConnected + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | +| Convolution (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | +| FullyConnected (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | To try these features out, you can install MXNet MKL-DNN backend through pip: ``` -pip install mxnet-mkl +pip install mxnet-mkl [--pre] ``` To build MXNet MKL-DNN backend from source code, please refer to [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html) From 5b3f6dbec5370c38f9b8a42c59176dc1917044f8 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Mon, 6 May 2019 23:55:47 +0800 Subject: [PATCH 05/24] fix comments --- docs/install/index.md | 2 +- docs/tutorials/mkldnn/MKLDNN_README.md | 2 +- docs/tutorials/mkldnn/operator_list.md | 32 +++++++++++--------------- 3 files changed, 15 insertions(+), 21 deletions(-) diff --git a/docs/install/index.md b/docs/install/index.md index cf56472b8e25..456587b58d15 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -187,7 +187,7 @@ $ pip install mxnet --pre
-MXNet offers MKL pip packages that will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in [performance on Intel CPU](https://mxnet.incubator.apache.org/versions/master/faq/perf.html#intel-cpu). +MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. ``` $ pip install mxnet-mkl --pre diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md index 18d26ab4e358..23617d1d627a 100644 --- a/docs/tutorials/mkldnn/MKLDNN_README.md +++ b/docs/tutorials/mkldnn/MKLDNN_README.md @@ -20,7 +20,7 @@ A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. -Please find MKL-DNN optimized operators and other features in [MKL-DNN operator list](http://mxnet.incubator.apache.org/tutorials/mkldnn/operator_list.html) +Please find MKL-DNN optimized operators and other features in [MKL-DNN operator list](http://mxnet.incubator.apache.org/tutorials/mkldnn/operator_list.html). The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 00a9638ee4f5..8fcc72859276 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -60,24 +60,18 @@ And disable `Convolution + Activation(ReLU)` fusion by: export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 ``` -| Fusion pattern | Enable | Disable | -| :--: | :--: | :--: | -| Convolution + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | -| Convolution + elemwise_add | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | -| Convolution + BatchNorm | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | -| Convolution + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | -| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | MXNET_SUBGRAPH_BACKEND | | -| FullyConnected + Activation(ReLU) | MXNET_SUBGRAPH_BACKEND | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | -| Convolution (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | -| FullyConnected (INT8) + re-quantization | MXNET_SUBGRAPH_BACKEND | | - - -To try these features out, you can install MXNet MKL-DNN backend through pip: - -``` -pip install mxnet-mkl [--pre] -``` - -To build MXNet MKL-DNN backend from source code, please refer to [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html) +| Fusion pattern | Disable | +| :--: | :--: | +| Convolution + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | +| Convolution + elemwise_add | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | +| Convolution + BatchNorm | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | +| Convolution + Activation(ReLu) + elemwise_add | | +| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | | +| FullyConnected + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | +| Convolution (INT8) + re-quantization | | +| FullyConnected (INT8) + 
re-quantization | | + + +To install MXNet MKL-DNN backend, please refer to [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html) For performance numbers, please refer to [performance on Intel CPU](https://mxnet.incubator.apache.org/versions/master/faq/perf.html#intel-cpu) From 58e8ac1c9e1e8b9de525595c19c1dca55a809da4 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Tue, 7 May 2019 15:41:56 +0800 Subject: [PATCH 06/24] promote mxnet-mkl package --- docs/faq/perf.md | 9 +++++++-- docs/install/index.md | 23 +++++++++++++++++++++-- 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/docs/faq/perf.md b/docs/faq/perf.md index e1318b843a03..57472630e7e7 100644 --- a/docs/faq/perf.md +++ b/docs/faq/perf.md @@ -34,8 +34,13 @@ Performance is mainly affected by the following 4 factors: ## Intel CPU -For using Intel Xeon CPUs for training and inference, we suggest enabling -`USE_MKLDNN = 1` in `config.mk`. +For using Intel Xeon CPUs for training and inference, we suggest to install mxnet-mkl package by: + +``` +$ pip install mxnet-mkl [--pre] +``` + +Or build MXNet from source code with `USE_MKLDNN = 1`. For Linux users, `USE_MKLDNN = 1` will be turned on by default. We also find that setting the following environment variables can help: diff --git a/docs/install/index.md b/docs/install/index.md index 456587b58d15..ffc2436bc88c 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -124,6 +124,12 @@ Indicate your preferred configuration. Then, follow the customized commands to i $ pip install mxnet ``` +MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. + +``` +$ pip install mxnet-mkl==1.4.0 +``` +
@@ -131,6 +137,12 @@ $ pip install mxnet $ pip install mxnet==1.3.1 ``` +MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. + +``` +$ pip install mxnet-mkl==1.3.1 +``` +
@@ -138,6 +150,12 @@ $ pip install mxnet==1.3.1 $ pip install mxnet==1.2.1 ``` +MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. + +``` +$ pip install mxnet-mkl==1.2.1 +``` +
@@ -185,14 +203,15 @@ $ pip install mxnet==0.11.0 $ pip install mxnet --pre ``` -
-
MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. ``` $ pip install mxnet-mkl --pre ``` + +
+ Check the chart below for other options, refer to PyPI for other MXNet pip packages, or validate your MXNet installation. pip packages From ce82caae2ad8dc11b7d696e4c810082b04708a61 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:47:09 +0800 Subject: [PATCH 07/24] Update docs/tutorials/mkldnn/MKLDNN_README.md Co-Authored-By: TaoLv --- docs/tutorials/mkldnn/MKLDNN_README.md | 56 +++++++++++++------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md index 23617d1d627a..411f1e308eaa 100644 --- a/docs/tutorials/mkldnn/MKLDNN_README.md +++ b/docs/tutorials/mkldnn/MKLDNN_README.md @@ -1,27 +1,27 @@ - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + # Build/Install MXNet with MKL-DNN A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. - -Please find MKL-DNN optimized operators and other features in [MKL-DNN operator list](http://mxnet.incubator.apache.org/tutorials/mkldnn/operator_list.html). - + +Please find MKL-DNN optimized operators and other features in [MKL-DNN operator list](../mkldnn/operator_list.md). + The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). @@ -308,14 +308,14 @@ Graph optimization by subgraph feature are available in master branch. You can b ``` export MXNET_SUBGRAPH_BACKEND=MKLDNN ``` - -When `MKLDNN` backend is enabled, advanced control options are avaliable: - -``` -export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass -export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass -``` - + +When `MKLDNN` backend is enabled, advanced control options are avaliable: + +``` +export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass +export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass +``` + This limitations of this experimental feature are: From dab9ddfb7cc6bed9a95a81743114197ad50cad81 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:47:43 +0800 Subject: [PATCH 08/24] Update docs/install/index.md Co-Authored-By: TaoLv --- docs/install/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/index.md b/docs/install/index.md index ffc2436bc88c..91e1bbb97ad6 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -203,7 +203,7 @@ $ pip install mxnet==0.11.0 $ pip install mxnet --pre ``` -MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. 
``` $ pip install mxnet-mkl --pre From 2a5e2cd3689e0789efd861777c1cd16ed4946c0d Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:48:02 +0800 Subject: [PATCH 09/24] Update docs/install/index.md Co-Authored-By: TaoLv --- docs/install/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/index.md b/docs/install/index.md index 91e1bbb97ad6..2edb97d77fd5 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -150,7 +150,7 @@ $ pip install mxnet-mkl==1.3.1 $ pip install mxnet==1.2.1 ``` -MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. ``` $ pip install mxnet-mkl==1.2.1 From 708edee51a8d7a93a39af3746f748f16a2fce984 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:48:12 +0800 Subject: [PATCH 10/24] Update docs/install/index.md Co-Authored-By: TaoLv --- docs/install/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/index.md b/docs/install/index.md index 2edb97d77fd5..4b56f3567d32 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -137,7 +137,7 @@ $ pip install mxnet-mkl==1.4.0 $ pip install mxnet==1.3.1 ``` -MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. ``` $ pip install mxnet-mkl==1.3.1 From fb5fcc32849728a0867a20300e10204afbd43456 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:48:23 +0800 Subject: [PATCH 11/24] Update docs/install/index.md Co-Authored-By: TaoLv --- docs/install/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/index.md b/docs/install/index.md index 4b56f3567d32..ca22038e8562 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -124,7 +124,7 @@ Indicate your preferred configuration. Then, follow the customized commands to i $ pip install mxnet ``` -MXNet offers pip packages with MKL-DNN enabled which will be much faster when running on Intel hardware. Try the following command line to install it and find performance numbers and tuning guide in performance on Intel CPU. +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. 
``` $ pip install mxnet-mkl==1.4.0 From 4a61c8ef5cd231b91bf947e162c058b574071d3c Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:49:09 +0800 Subject: [PATCH 12/24] Update docs/tutorials/mkldnn/operator_list.md Co-Authored-By: TaoLv --- docs/tutorials/mkldnn/operator_list.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 8fcc72859276..6536678e5a24 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -17,7 +17,9 @@ # MKL-DNN Operator list -MXNet MKL-DNN backend provides optimized implementations for various opertors covering a broad range of applications including image classification, object detection, natural language processing. We also provide the lower precision version for part of these operators on CPU leveraging the DL Boost technology from Intel. On computation graph level, a set of graph fusion pass and quantization pass is implemneted based on the sugraph feature of MXNet. To help users understanding MKL-DNN backend better, the tables below summarize the list of supported operators, data types and functionalities. As the community keeps working on more new features for MKL-DNN backend, the tables will be updated continuously. +MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications including image classification, object detection, natural language processing. + +To help users understanding MKL-DNN backend better, the following table summarizes the list of supported operators, data types and functionalities. A subset of operators support faster training and inference by using a lower precision version. Refer to the following table's `INT8 Inference` column to see which operators are supported. | Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference | | :--: | :--: | :--: | :--: | :--: | From 4c897e836de06477fc5ca586186cdb0379c353d9 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:49:26 +0800 Subject: [PATCH 13/24] Update docs/faq/perf.md Co-Authored-By: TaoLv --- docs/faq/perf.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faq/perf.md b/docs/faq/perf.md index 57472630e7e7..60ea83e72004 100644 --- a/docs/faq/perf.md +++ b/docs/faq/perf.md @@ -34,7 +34,7 @@ Performance is mainly affected by the following 4 factors: ## Intel CPU -For using Intel Xeon CPUs for training and inference, we suggest to install mxnet-mkl package by: +When using Intel Xeon CPUs for training and inference, the `mxnet-mkl` package is recommended. Adding `--pre` installs a nightly build from master. Without it you will install the latest patched release of MXNet: ``` $ pip install mxnet-mkl [--pre] From eb364141fd41bd62154519f8417c7a8135cf344e Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:49:44 +0800 Subject: [PATCH 14/24] Update docs/faq/perf.md Co-Authored-By: TaoLv --- docs/faq/perf.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faq/perf.md b/docs/faq/perf.md index 60ea83e72004..62b40247081c 100644 --- a/docs/faq/perf.md +++ b/docs/faq/perf.md @@ -40,7 +40,7 @@ When using Intel Xeon CPUs for training and inference, the `mxnet-mkl` package i $ pip install mxnet-mkl [--pre] ``` -Or build MXNet from source code with `USE_MKLDNN = 1`. For Linux users, `USE_MKLDNN = 1` will be turned on by default. +Or build MXNet from source code with `USE_MKLDNN=1`. 
For Linux users, `USE_MKLDNN=1` will be turned on by default. We also find that setting the following environment variables can help: From bfc5ac042fe8d3af6bbd306341010e97f07042ff Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:50:05 +0800 Subject: [PATCH 15/24] Update docs/tutorials/mkldnn/operator_list.md Co-Authored-By: TaoLv --- docs/tutorials/mkldnn/operator_list.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 6536678e5a24..56153a9fb68c 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -76,4 +76,4 @@ export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 To install MXNet MKL-DNN backend, please refer to [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html) -For performance numbers, please refer to [performance on Intel CPU](https://mxnet.incubator.apache.org/versions/master/faq/perf.html#intel-cpu) +For performance numbers, please refer to [performance on Intel CPU](../../faq/perf.md#intel-cpu) From 7e53e8de0a35255180fd6b4b26dabcdb2c39f882 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Wed, 8 May 2019 09:50:29 +0800 Subject: [PATCH 16/24] Update docs/tutorials/mkldnn/operator_list.md Co-Authored-By: TaoLv --- docs/tutorials/mkldnn/operator_list.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 56153a9fb68c..8a0ee2bd0214 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -74,6 +74,6 @@ export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 | FullyConnected (INT8) + re-quantization | | -To install MXNet MKL-DNN backend, please refer to [MKL-DNN backend readme](http://mxnet.incubator.apache.org/tutorials/mkldnn/MKLDNN_README.html) +To install MXNet MKL-DNN backend, please refer to [MKL-DNN backend readme](MKLDNN_README.md) For performance numbers, please refer to [performance on Intel CPU](../../faq/perf.md#intel-cpu) From b74f3f7e90f6f599550b7bb9e1ae1b7bece56047 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Wed, 8 May 2019 09:59:28 +0800 Subject: [PATCH 17/24] fix markdown table --- docs/tutorials/mkldnn/operator_list.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 8a0ee2bd0214..66b6668382c8 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -22,7 +22,7 @@ MXNet MKL-DNN backend provides optimized implementations for various operators c To help users understanding MKL-DNN backend better, the following table summarizes the list of supported operators, data types and functionalities. A subset of operators support faster training and inference by using a lower precision version. Refer to the following table's `INT8 Inference` column to see which operators are supported. 
| Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference | -| :--: | :--: | :--: | :--: | :--: | +| -- | -- | -- | -- | -- | | **Convolution** | 1D Convolution | Y | Y | N | | | 2D Convolution | Y | Y | Y | | | 3D Convolution | Y | Y | N | @@ -63,7 +63,7 @@ export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 ``` | Fusion pattern | Disable | -| :--: | :--: | +| -- | -- | | Convolution + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | | Convolution + elemwise_add | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | | Convolution + BatchNorm | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | From d1cf7439cb6dc63cd0ec61ebb883a231af32ac45 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Mon, 13 May 2019 15:51:56 +0800 Subject: [PATCH 18/24] fix comments --- docs/faq/env_var.md | 5 +++++ docs/tutorials/mkldnn/operator_list.md | 15 ++++++++++++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/docs/faq/env_var.md b/docs/faq/env_var.md index c5ebd54c55a1..ae95abdee8c6 100644 --- a/docs/faq/env_var.md +++ b/docs/faq/env_var.md @@ -280,6 +280,11 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca - Values: Int ```(default=4)``` - This variable controls how many CuDNN dropout state resources to create for each GPU context for use in operator. +* MXNET_SUBGRAPH_BACKEND + - Values: String ```(default="")``` + - This variable controls the subgraph partitioning in MXNet. + - This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and the list of fusion pass. + Settings for Minimum Memory Usage --------------------------------- - Make sure ```min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) = 1``` diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 66b6668382c8..512b59ecb5ee 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -22,7 +22,7 @@ MXNet MKL-DNN backend provides optimized implementations for various operators c To help users understanding MKL-DNN backend better, the following table summarizes the list of supported operators, data types and functionalities. A subset of operators support faster training and inference by using a lower precision version. Refer to the following table's `INT8 Inference` column to see which operators are supported. | Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference | -| -- | -- | -- | -- | -- | +| --- | --- | --- | --- | --- | | **Convolution** | 1D Convolution | Y | Y | N | | | 2D Convolution | Y | Y | Y | | | 3D Convolution | Y | Y | N | @@ -50,7 +50,7 @@ To help users understanding MKL-DNN backend better, the following table summariz Besides direct operator optimizations, we also provide graph fusion passes listed in the table below. Users can choose to enable or disable these fusion patterns through environmental variables. 
-For example, you can enable all fusion passes by: +For example, you can enable all FP32 fusion passes in the following table by: ``` export MXNET_SUBGRAPH_BACKEND=MKLDNN @@ -62,8 +62,16 @@ And disable `Convolution + Activation(ReLU)` fusion by: export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 ``` +When generating the corresponding INT8 symbol, users can enable INT8 operator fusion passes as following: + +``` +# get qsym after model quantization +qsym = qsym.get_backend_symbol('MKLDNN_POST_QUANTIZE') +qsym.save(symbol_name) # fused INT8 operators will be save into the symbol JSON file +``` + | Fusion pattern | Disable | -| -- | -- | +| --- | --- | | Convolution + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | | Convolution + elemwise_add | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | | Convolution + BatchNorm | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | @@ -72,6 +80,7 @@ export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 | FullyConnected + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | | Convolution (INT8) + re-quantization | | | FullyConnected (INT8) + re-quantization | | +| FullyConnected (INT8) + re-quantization + de-quantization | | To install MXNet MKL-DNN backend, please refer to [MKL-DNN backend readme](MKLDNN_README.md) From 5c3d067c62a76e2a3a4d71cf9f705a919647a22f Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Wed, 15 May 2019 14:04:06 +0800 Subject: [PATCH 19/24] Update docs/faq/env_var.md Co-Authored-By: Aaron Markham --- docs/faq/env_var.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faq/env_var.md b/docs/faq/env_var.md index ae95abdee8c6..0d9241333947 100644 --- a/docs/faq/env_var.md +++ b/docs/faq/env_var.md @@ -283,7 +283,7 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca * MXNET_SUBGRAPH_BACKEND - Values: String ```(default="")``` - This variable controls the subgraph partitioning in MXNet. - - This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and the list of fusion pass. + - This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to the [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and the list of fusion passes. Settings for Minimum Memory Usage --------------------------------- From d9fcea4e4ba38449aee309d32c508b001d666064 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Wed, 15 May 2019 14:04:28 +0800 Subject: [PATCH 20/24] Update docs/install/index.md Co-Authored-By: Aaron Markham --- docs/install/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/index.md b/docs/install/index.md index ca22038e8562..ea93d40e0f8c 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -124,7 +124,7 @@ Indicate your preferred configuration. Then, follow the customized commands to i $ pip install mxnet ``` -MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. 
``` $ pip install mxnet-mkl==1.4.0 From b6de387ba221581ac7664a572770c6544995ea3c Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Wed, 15 May 2019 14:04:48 +0800 Subject: [PATCH 21/24] Update docs/tutorials/mkldnn/MKLDNN_README.md Co-Authored-By: Aaron Markham --- docs/tutorials/mkldnn/MKLDNN_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md index 411f1e308eaa..2a7cd40ac291 100644 --- a/docs/tutorials/mkldnn/MKLDNN_README.md +++ b/docs/tutorials/mkldnn/MKLDNN_README.md @@ -20,7 +20,7 @@ A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. -Please find MKL-DNN optimized operators and other features in [MKL-DNN operator list](../mkldnn/operator_list.md). +Please find MKL-DNN optimized operators and other features in the [MKL-DNN operator list](../mkldnn/operator_list.md). The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). From b783f58856ebb1684c8223945587fa23647bdbe8 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Fri, 17 May 2019 17:40:21 +0800 Subject: [PATCH 22/24] change name of env variable --- docs/tutorials/mkldnn/operator_list.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md index 512b59ecb5ee..4958f8d9b602 100644 --- a/docs/tutorials/mkldnn/operator_list.md +++ b/docs/tutorials/mkldnn/operator_list.md @@ -56,7 +56,7 @@ For example, you can enable all FP32 fusion passes in the following table by: export MXNET_SUBGRAPH_BACKEND=MKLDNN ``` -And disable `Convolution + Activation(ReLU)` fusion by: +And disable `Convolution + Activation` fusion by: ``` export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1 @@ -66,17 +66,17 @@ When generating the corresponding INT8 symbol, users can enable INT8 operator fu ``` # get qsym after model quantization -qsym = qsym.get_backend_symbol('MKLDNN_POST_QUANTIZE') +qsym = qsym.get_backend_symbol('MKLDNN_QUANTIZE') qsym.save(symbol_name) # fused INT8 operators will be save into the symbol JSON file ``` | Fusion pattern | Disable | | --- | --- | -| Convolution + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | +| Convolution + Activation | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | | Convolution + elemwise_add | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | | Convolution + BatchNorm | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | -| Convolution + Activation(ReLu) + elemwise_add | | -| Convolution + BatchNorm + Activation(ReLu) + elemwise_add | | +| Convolution + Activation + elemwise_add | | +| Convolution + BatchNorm + Activation + elemwise_add | | | FullyConnected + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU | | Convolution (INT8) + re-quantization | | | FullyConnected (INT8) + re-quantization | | From 0f82b3b110168cfd475d648eedb8127eacfb5450 Mon Sep 17 00:00:00 2001 From: Tao Lv Date: Fri, 17 May 2019 20:52:36 +0800 Subject: [PATCH 23/24] retrigger ci From 4d36bbff60656416a50290d40ce43371822d67b2 Mon Sep 17 00:00:00 2001 From: Sheng Zha Date: Fri, 17 May 2019 18:57:41 -0700 Subject: [PATCH 24/24] Update env_var.md --- docs/faq/env_var.md | 4 +--- 1 
file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/faq/env_var.md b/docs/faq/env_var.md index 07bd8123c0e8..ffde628d83a3 100644 --- a/docs/faq/env_var.md +++ b/docs/faq/env_var.md @@ -16,9 +16,7 @@ Environment Variables - - -================ +===================== MXNet has several settings that you can change with environment variables. Typically, you wouldn't need to change these settings, but they are listed here for reference.