apache · pengzhao-intel · May 18, 2019 · May 6, 2019 · May 6, 2019 · May 6, 2019
@@ -280,6 +280,11 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca
   - Values: Int ```(default=4)```
   - This variable controls how many CuDNN dropout state resources to create for each GPU context for use in operator.
 
+* MXNET_SUBGRAPH_BACKEND
+  - Values: String ```(default="")```
+  - This variable controls the subgraph partitioning in MXNet.
+  - This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to the [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and for the list of fusion passes.
+
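As a rough illustration of the `MXNET_SUBGRAPH_BACKEND` entry above, the sketch below launches a child process with the variable set in its environment. The helper name `run_with_subgraph_backend` and the probe command are hypothetical. The probe only echoes the variable and stands in for a real MXNet script, which is not imported here.

```python
import os
import subprocess
import sys


def run_with_subgraph_backend(args, backend="MKLDNN"):
    """Run a command with MXNET_SUBGRAPH_BACKEND set in the child's environment.

    Hypothetical helper: an empty `backend` removes the variable, which
    corresponds to the documented default of "".
    """
    env = dict(os.environ)
    if backend:
        env["MXNET_SUBGRAPH_BACKEND"] = backend
    else:
        env.pop("MXNET_SUBGRAPH_BACKEND", None)
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


# Stand-in for an MXNet script: it just reports what the child process sees.
probe = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('MXNET_SUBGRAPH_BACKEND', ''))",
]
print(run_with_subgraph_backend(probe).stdout.strip())  # prints MKLDNN
```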
 Settings for Minimum Memory Usage
 ---------------------------------
 - Make sure ```min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) = 1```

@@ -34,8 +34,13 @@ Performance is mainly affected by the following 4 factors:
 
 ## Intel CPU
 
-For using Intel Xeon CPUs for training and inference, we suggest enabling
-`USE_MKLDNN = 1` in `config.mk`. 
+When using Intel Xeon CPUs for training and inference, the `mxnet-mkl` package is recommended. Adding `--pre` installs a nightly build from master. Without it you will install the latest patched release of MXNet:
+
+```
+$ pip install mxnet-mkl [--pre]
+```
+
+Alternatively, build MXNet from source with `USE_MKLDNN=1`. On Linux, `USE_MKLDNN=1` is enabled by default.
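
After installing, it can help to confirm which MXNet distribution is actually present. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+) and does not import MXNet. The function name `installed_mxnet_packages` is hypothetical, and the sketch assumes the distribution names match the pip package names `mxnet` and `mxnet-mkl` used above.

```python
from importlib import metadata


def installed_mxnet_packages(names=("mxnet", "mxnet-mkl")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_mxnet_packages().items():
        print(f"{name}: {version or 'not installed'}")
```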
 
 We also find that setting the following environment variables can help:
 

@@ -124,20 +124,38 @@ Indicate your preferred configuration. Then, follow the customized commands to i
 $ pip install mxnet
 ```
 
+MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the <a href="../../faq/perf.md#intel-cpu">MXNet tuning guide</a>.
+
+```
+$ pip install mxnet-mkl==1.4.0
+```
+
 </div> <!-- End of v1-4-0 -->
 <div class="v1-3-1">
 
 ```
 $ pip install mxnet==1.3.1
 ```
 
+MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the <a href="../../faq/perf.md#intel-cpu">MXNet tuning guide</a>.
+
+```
+$ pip install mxnet-mkl==1.3.1
+```
+
 </div> <!-- End of v1-3-1 -->
 <div class="v1-2-1">
 
 ```
 $ pip install mxnet==1.2.1
 ```
 
+MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the <a href="../../faq/perf.md#intel-cpu">MXNet tuning guide</a>.
+
+```
+$ pip install mxnet-mkl==1.2.1
+```
+
 </div> <!-- End of v1-2-1 -->
 
 <div class="v1-1-0">
@@ -185,9 +203,15 @@ $ pip install mxnet==0.11.0
 $ pip install mxnet --pre
 ```
 
+MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the <a href="../../faq/perf.md#intel-cpu">MXNet tuning guide</a>.
+
+```
+$ pip install mxnet-mkl --pre
+```
+
 </div> <!-- End of master-->
 <hr> <!-- pip footer -->
-MXNet offers MKL pip packages that will be much faster when running on Intel hardware.
+
 Check the chart below for other options, refer to <a href="https://pypi.org/project/mxnet/">PyPI for other MXNet pip packages</a>, or <a href="validate_mxnet.html">validate your MXNet installation</a>.
 
 <img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/install/pip-packages-1.4.0.png" alt="pip packages"/>

@@ -1,25 +1,27 @@
-<!--- Licensed to the Apache Software Foundation (ASF) under one -->
-<!--- or more contributor license agreements.  See the NOTICE file -->
-<!--- distributed with this work for additional information -->
-<!--- regarding copyright ownership.  The ASF licenses this file -->
-<!--- to you under the Apache License, Version 2.0 (the -->
-<!--- "License"); you may not use this file except in compliance -->
-<!--- with the License.  You may obtain a copy of the License at -->
-
-<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
-
-<!--- Unless required by applicable law or agreed to in writing, -->
-<!--- software distributed under the License is distributed on an -->
-<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
-<!--- KIND, either express or implied.  See the License for the -->
-<!--- specific language governing permissions and limitations -->
-<!--- under the License. -->
-
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
 # Build/Install MXNet with MKL-DNN
 
 A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS.
 In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows.
 
+See the [MKL-DNN operator list](../mkldnn/operator_list.md) for MKL-DNN optimized operators and other features.
+
 The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu).
 
 
@@ -306,14 +308,14 @@ Graph optimization by subgraph feature are available in master branch. You can b
 ```
 export MXNET_SUBGRAPH_BACKEND=MKLDNN
 ```
-
-When `MKLDNN` backend is enabled, advanced control options are avaliable:
-
-```
-export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass
-export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass
-```
-
+
+When the `MKLDNN` backend is enabled, advanced control options are available:
+
+```
+export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass
+export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass
+```
+
 
 This limitations of this experimental feature are:
 

@@ -0,0 +1,88 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# MKL-DNN Operator list
+
+The MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications, including image classification, object detection, and natural language processing.
+
+To help users understand the MKL-DNN backend better, the following table summarizes the supported operators, data types, and functionalities. A subset of operators supports faster training and inference by using a lower-precision version. Refer to the table's `INT8 Inference` column to see which operators are supported.
+
+| Operator           | Function                   | FP32 Training (backward) | FP32 Inference | INT8 Inference |
+| ---                | ---                        | ---                      | ---            | ---            |
+| **Convolution**    | 1D Convolution             | Y                        | Y              | N              |
+|                    | 2D Convolution             | Y                        | Y              | Y              |
+|                    | 3D Convolution             | Y                        | Y              | N              |
+| **Deconvolution**  | 2D Deconvolution           | Y                        | Y              | N              |
+|                    | 3D Deconvolution           | Y                        | Y              | N              |
+| **FullyConnected** | 1D-4D input, flatten=True  | N                        | Y              | Y              |
+|                    | 1D-4D input, flatten=False | N                        | Y              | Y              |
+| **Pooling**        | 2D max pooling             | Y                        | Y              | Y              |
+|                    | 2D avg pooling             | Y                        | Y              | Y              |
+| **BatchNorm**      | 2D BatchNorm               | Y                        | Y              | N              |
+| **LRN**            | 2D LRN                     | Y                        | Y              | N              |
+| **Activation**     | ReLU                       | Y                        | Y              | Y              |
+|                    | Tanh                       | Y                        | Y              | N              |
+|                    | SoftReLU                   | Y                        | Y              | N              |
+|                    | Sigmoid                    | Y                        | Y              | N              |
+| **softmax**        | 1D-4D input                | Y                        | Y              | N              |
+| **Softmax_output** | 1D-4D input                | N                        | Y              | N              |
+| **Transpose**      | 1D-4D input                | N                        | Y              | N              |
+| **elemwise_add**   | 1D-4D input                | Y                        | Y              | Y              |
+| **Concat**         | 1D-4D input                | Y                        | Y              | Y              |
+| **slice**          | 1D-4D input                | N                        | Y              | N              |
+| **Quantization**   | 1D-4D input                | N                        | N              | Y              |
+| **Dequantization** | 1D-4D input                | N                        | N              | Y              |
+| **Requantization** | 1D-4D input                | N                        | N              | Y              |
+
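To show how the support table above might be queried programmatically, here is a small sketch that transcribes an excerpt of its rows into a dictionary. The names `Support`, `SUPPORT`, and `functions_with` are hypothetical and not part of MXNet. Only a few rows are copied, so treat the data as illustrative rather than complete.

```python
from typing import Dict, List, NamedTuple, Tuple


class Support(NamedTuple):
    fp32_training: bool
    fp32_inference: bool
    int8_inference: bool


# Excerpt of the table: (operator, function) -> support flags (Y = True, N = False).
SUPPORT: Dict[Tuple[str, str], Support] = {
    ("Convolution", "1D Convolution"): Support(True, True, False),
    ("Convolution", "2D Convolution"): Support(True, True, True),
    ("Convolution", "3D Convolution"): Support(True, True, False),
    ("FullyConnected", "1D-4D input, flatten=True"): Support(False, True, True),
    ("Pooling", "2D avg pooling"): Support(True, True, True),
    ("Activation", "ReLU"): Support(True, True, True),
    ("Activation", "Tanh"): Support(True, True, False),
}


def functions_with(operator: str, capability: str) -> List[str]:
    """Return the functions of `operator` whose `capability` column is Y."""
    return sorted(
        function
        for (op, function), support in SUPPORT.items()
        if op == operator and getattr(support, capability)
    )


print(functions_with("Convolution", "int8_inference"))
```

With this excerpt, only `2D Convolution` is reported as INT8-capable for `Convolution`, matching the corresponding rows of the table.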
+Besides direct operator optimizations, we also provide the graph fusion passes listed in the table below. Users can enable or disable these fusion patterns through environment variables.
+
+For example, you can enable all FP32 fusion passes in the following table by:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
+
+And disable `Convolution + Activation(ReLU)` fusion by:
+
+```
+export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1
+```
+
+When generating the corresponding INT8 symbol, users can enable INT8 operator fusion passes as follows:
+
+```
+# get qsym after model quantization
+qsym = qsym.get_backend_symbol('MKLDNN_POST_QUANTIZE')
+qsym.save(symbol_name) # fused INT8 operators will be saved into the symbol JSON file
+```
+
+| Fusion pattern                                            | Disable                             |
+| ---                                                       | ---                                 |
+| Convolution + Activation(ReLU)                            | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU |
+| Convolution + elemwise_add                                | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM  |
+| Convolution + BatchNorm                                   | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN   |
+| Convolution + Activation(ReLU) + elemwise_add             |                                     |
+| Convolution + BatchNorm + Activation(ReLU) + elemwise_add |                                     |
+| FullyConnected + Activation(ReLU)                         | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU   |
+| Convolution (INT8) + re-quantization                      |                                     |
+| FullyConnected (INT8) + re-quantization                   |                                     |
+| FullyConnected (INT8) + re-quantization + de-quantization |                                     |
+
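The fusion table above can be turned into a small lookup that produces the environment settings for a given choice of disabled patterns. This is a sketch only: `DISABLE_VARS`, `fusion_env`, and `export_lines` are hypothetical names, and the mapping covers just the patterns for which the table lists a disable variable.

```python
from typing import Dict, Iterable, List

# Fusion pattern -> disable variable, copied from the rows that list one.
DISABLE_VARS: Dict[str, str] = {
    "Convolution + Activation(ReLU)": "MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU",
    "Convolution + elemwise_add": "MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM",
    "Convolution + BatchNorm": "MXNET_DISABLE_MKLDNN_FUSE_CONV_BN",
    "FullyConnected + Activation(ReLU)": "MXNET_DISABLE_MKLDNN_FUSE_FC_RELU",
}


def fusion_env(disabled_patterns: Iterable[str] = ()) -> Dict[str, str]:
    """Enable the MKLDNN subgraph backend and switch off the given patterns.

    Raises KeyError for a pattern that has no disable variable in the table.
    """
    env = {"MXNET_SUBGRAPH_BACKEND": "MKLDNN"}
    for pattern in disabled_patterns:
        env[DISABLE_VARS[pattern]] = "1"
    return env


def export_lines(env: Dict[str, str]) -> List[str]:
    """Render the settings as shell `export` lines."""
    return [f"export {name}={value}" for name, value in env.items()]


for line in export_lines(fusion_env(["Convolution + BatchNorm"])):
    print(line)
```

Keeping the mapping in one place makes it harder to misspell a variable name when toggling several fusion passes at once.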
+
+To install the MXNet MKL-DNN backend, please refer to the [MKL-DNN backend readme](MKLDNN_README.md).
+
+For performance numbers, please refer to [performance on Intel CPU](../../faq/perf.md#intel-cpu).
diff --git a/tests/tutorials/test_sanity_tutorials.py b/tests/tutorials/test_sanity_tutorials.py
@@ -35,6 +35,7 @@
              'gluon/index.md',
              'mkldnn/index.md',
              'mkldnn/MKLDNN_README.md',
+             'mkldnn/operator_list.md',
              'nlp/index.md',
              'onnx/index.md',
              'python/index.md',