This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Commit

[MKLDNN] Enable subgraph backend mkldnn by default. (#15518)
* Enable subgraph backend mkldnn by default

* Fix lint

* fix ut

* fix scala test

* Fix UT

* Fix lint

* Run CI

* Run CI

* Support MXNET_MKLDNN_ENABLED

* Fix merge

* Run CI
ZhennanQin authored and pengzhao-intel committed Jul 26, 2019
1 parent b00bb81 commit e98fea3
Showing 19 changed files with 504 additions and 291 deletions.
3 changes: 0 additions & 3 deletions cpp-package/example/inference/README.md
@@ -41,7 +41,6 @@ The following performance numbers are collected via using C++ inference API on A
```
export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
export OMP_NUM_THREADS=$(vCPUs/2)
export MXNET_SUBGRAPH_BACKEND=MKLDNN
export MXNET_ENGINE_TYPE=NaiveEngine
```
Also users are recommended to use ```numactl``` or ```taskset``` to bind a running process to the specified cores.
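
For example, a hypothetical `numactl` invocation that pins the benchmark to one socket (the core and memory-node IDs below are illustrative and machine-specific, not part of this diff):

```
# Pin the inference binary to cores 0-27 and their local memory node (example IDs; adjust to your machine).
numactl --physcpubind=0-27 --membind=0 ./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --batch_size 64
```
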
@@ -87,8 +86,6 @@ Follow the below steps to do inference with more models.

The command lines below show how to run inference with the FP32/INT8 resnet50_v1 model. The C++ inference script provides almost the same command line as this [Python script](https://github.com/apache/incubator-mxnet/blob/master/example/quantization/imagenet_inference.py), so users can easily go from Python to C++.
```
# set MKLDNN as subgraph backend
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# FP32 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500
3 changes: 2 additions & 1 deletion docs/faq/env_var.md
@@ -307,9 +307,10 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
- This variable controls how many CuDNN dropout state resources to create for each GPU context for use in operator.

* MXNET_SUBGRAPH_BACKEND
- Values: String ```(default="")```
- Values: String ```(default="MKLDNN")``` if MKLDNN is avaliable, otherwise ```(default="")```
- This variable controls the subgraph partitioning in MXNet.
- This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to the [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and the list of fusion passes.
- Set ```MXNET_SUBGRAPH_BACKEND=NONE``` to disable subgraph backend.

* MXNET_SAFE_ACCUMULATION
  - Values: 0(false) or 1(true) ```(default=0)```
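
As an illustration of the new behaviour (not part of the diff itself; the values are the ones documented above), the variable can now be left unset, forced on, or forced off:

```
# With an MKL-DNN build, subgraph fusion is on by default, so no export is needed.
export MXNET_SUBGRAPH_BACKEND=MKLDNN   # force-enable, e.g. on builds where it is still opt-in
export MXNET_SUBGRAPH_BACKEND=NONE     # disable the subgraph backend entirely
```
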
11 changes: 10 additions & 1 deletion docs/tutorials/c++/subgraphAPI.md
@@ -111,7 +111,15 @@ There are 2 built-in attributes that are used by the MXNet executor.

`inference_only` : bool, apply this property only for inference. Property will be skipped when need_grad=True. Default `false` if this attribute isn't defined.

After defining the subgraph property, we need to register it in .cc file.
After defining the subgraph property, we need to register it under a backend in a .cc file.

First, we need to register the backend:

```C++
MXNET_REGISTER_SUBGRAPH_BACKEND(SgTest);
```
Then register the property under it.
```C++
MXNET_REGISTER_SUBGRAPH_PROPERTY(SgTest, SgProperty);
```

@@ -124,6 +132,7 @@ It's possible to register multiple properties for the same backend. In practice, we
#include "SgProperty2.h" // Define SgProperty2 class
#include "SgProperty3.h" // Define SgProperty3 class

MXNET_REGISTER_SUBGRAPH_BACKEND(SgTest);
MXNET_REGISTER_SUBGRAPH_PROPERTY(SgTest, SgProperty); // Execution order 1.
MXNET_REGISTER_SUBGRAPH_PROPERTY(SgTest, SgProperty2); // Execution order 2.
MXNET_REGISTER_SUBGRAPH_PROPERTY(SgTest, SgProperty3); // Execution order 3.
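
As a hedged usage sketch (not part of this diff), once a backend such as the hypothetical `SgTest` above is registered, it can be selected at runtime through the `MXNET_SUBGRAPH_BACKEND` environment variable that this commit now defaults to `MKLDNN`:

```
# Ask MXNet to partition graphs with the custom SgTest backend instead of the default.
export MXNET_SUBGRAPH_BACKEND=SgTest
```
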
18 changes: 9 additions & 9 deletions docs/tutorials/mkldnn/MKLDNN_README.md
@@ -103,7 +103,7 @@ LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew -
<h2 id="3">Windows</h2>

On Windows, you can use [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN.
[Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended.

**Visual Studio 2015**

@@ -113,8 +113,8 @@ To build and install MXNet yourself, you need the following dependencies. Instal
2. Download and Install [CMake 3](https://cmake.org/files/v3.14/cmake-3.14.0-win64-x64.msi) if it is not already installed.
3. Download [OpenCV 3](https://sourceforge.net/projects/opencvlibrary/files/3.4.5/opencv-3.4.5-vc14_vc15.exe/download), and unzip the OpenCV package, set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g.,```OpenCV_DIR = C:\opencv\build ```). Also, add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable.
4. If you have Intel Math Kernel Library (Intel MKL) installed, set ```MKL_ROOT``` to point to ```MKL``` directory that contains the ```include``` and ```lib```. If you want to use MKL blas, you should set ```-DUSE_BLAS=mkl``` when cmake. Typically, you can find the directory in ```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl```.
5. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/), or build the latest version of OpenBLAS from source. Note that you should also download ```mingw64.dll.zip``` along with openBLAS and add them to PATH.
6. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find the directory in ```C:\Downloads\OpenBLAS\```.

After you have installed all of the required dependencies, build the MXNet source code:

@@ -123,17 +123,17 @@ After you have installed all of the required dependencies, build the MXNet sourc
git clone --recursive https://github.com/apache/incubator-mxnet.git
cd C:\incubator-mxnet
```
2. Enable Intel MKL-DNN by -DUSE_MKLDNN=1. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build```. Make sure to specify the architecture in the
command:
```
>mkdir build
>cd build
>cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release
```
3. Enable Intel MKL-DNN and Intel MKL as BLAS library by the command:
```
>"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\bin\mklvars.bat" intel64
>cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=mkl -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -DMKL_ROOT="C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl"
>cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=mkl -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -DMKL_ROOT="C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl"
```
4. After the CMake successfully completed, in Visual Studio, open the solution file ```.sln``` and compile it, or compile the MXNet source code by using following command:
```r
@@ -154,7 +154,7 @@ User can follow the same steps of Visual Studio 2015 to build MXNET with MKL-DNN

<h2 id="4">Verify MXNet with python</h2>

Preinstall python and some dependent modules:
```
pip install numpy graphviz
set PYTHONPATH=[workdir]\incubator-mxnet\python
@@ -261,7 +261,7 @@ MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1

<h2 id="6">Enable graph optimization</h2>

Graph optimization by subgraph feature are available in master branch. You can build from source and then use below command to enable this *experimental* feature for better performance:
Graph optimization with the subgraph feature is available and enabled by default on the master branch. For MXNet release v1.5, you can manually enable it by:

```
export MXNET_SUBGRAPH_BACKEND=MKLDNN
@@ -271,7 +271,7 @@ The limitations of this experimental feature are:

- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.

- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
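
A minimal sketch of the inference/training switch described above, assuming the updated defaults from this commit (MKL-DNN fusion on by default on master, `NONE` to turn it off, as documented in the updated env_var.md):

```
# Inference on MXNet v1.5: enable the MKL-DNN subgraph backend explicitly.
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Training: with the new default, disable the pass explicitly rather than only unsetting the variable.
export MXNET_SUBGRAPH_BACKEND=NONE
```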


<h2 id="7">Quantization and Inference with INT8</h2>
29 changes: 6 additions & 23 deletions example/quantization/README.md
@@ -50,10 +50,7 @@ python imagenet_gen_qsym_mkldnn.py --model=resnet50_v1 --num-calib-batches=5 --c
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --param-file=./model/resnet50_v1-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
# Launch INT8 Inference
@@ -74,8 +71,6 @@ python imagenet_gen_qsym_mkldnn.py --model=squeezenet1.0 --num-calib-batches=5 -
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-symbol.json --param-file=./model/squeezenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
@@ -98,8 +93,6 @@ python imagenet_gen_qsym_mkldnn.py --model=mobilenet1.0 --num-calib-batches=5 --
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-symbol.json --param-file=./model/mobilenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
@@ -122,8 +115,6 @@ python imagenet_gen_qsym_mkldnn.py --model=mobilenetv2_1.0 --num-calib-batches=5
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/mobilenetv2_1.0-symbol.json --param-file=./model/mobilenetv2_1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
@@ -146,8 +137,6 @@ python imagenet_gen_qsym_mkldnn.py --model=inceptionv3 --image-shape=3,299,299 -
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/inceptionv3-symbol.json --param-file=./model/inceptionv3-0000.params --image-shape=3,299,299 --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
@@ -171,10 +160,8 @@ python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-resnet-152 --num-calib-bat
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-symbol.json --param-file=./model/imagenet1k-resnet-152-0000.params --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
# Launch INT8 Inference
@@ -196,10 +183,8 @@ python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-inception-bn --num-calib-b
The model would be automatically replaced in fusion and quantization format. It is then saved as the quantized symbol and parameter files in the `./model` directory. The following command is to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-symbol.json --param-file=./model/imagenet1k-inception-bn-0000.params --rgb-mean=123.68,116.779,103.939 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu
# Launch INT8 Inference
@@ -240,10 +225,8 @@ Some tips on quantization configs:
2. Then, you should run the following command and verify that your fp32 symbolic model runs inference as expected.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/custom-symbol.json --param-file=./model/custom-0000.params --rgb-mean=* --rgb-std=* --num-skipped-batches=* --batch-size=* --num-inference-batches=* --dataset=./data/* --ctx=cpu
```

@@ -260,7 +243,7 @@ python imagenet_gen_qsym_mkldnn.py --model=custom --num-calib-batches=5 --calib-
6. Finally, you can run INT8 inference:

```
# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/*.json --param-file=./model/*.params --rgb-mean=* --rgb-std=* --num-skipped-batches=* --batch-size=* --num-inference-batches=* --dataset=./data/* --ctx=cpu
# Launch dummy data Inference
@@ -289,6 +272,6 @@ the console to run model quantization for a specific configuration.
- `launch_inference.sh` This is a shell script that calculates the accuracies of all the quantized models generated
by invoking `launch_quantize.sh`.
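
For orientation, a possible end-to-end run of these helper scripts (assuming they are executed from the `example/quantization` directory with the prerequisites above in place; this is a sketch, not part of the commit):

```
# Quantize all configured models, then measure the accuracy of each generated model.
bash ./launch_quantize.sh
bash ./launch_inference.sh
```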

**NOTE**:
- This example has only been tested on Linux systems.
- Performance is expected to decrease with GPU, however the memory footprint of a quantized model is smaller. The purpose of the quantization implementation is to minimize accuracy loss when converting FP32 models to INT8. MXNet community is working on improving the performance.
2 changes: 0 additions & 2 deletions example/ssd/README.md
@@ -234,8 +234,6 @@ python quantization.py
After quantization, INT8 models will be saved in the `model/` directory. Use the following command to launch inference.

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN
# Launch FP32 Inference on VOC dataset
python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/ssd_
@@ -694,8 +694,8 @@ class OperatorSuite extends FunSuite with BeforeAndAfterAll
}

test("maximum") {
val data1 = Symbol.Variable("data")
val data2 = Symbol.Variable("data")
val data1 = Symbol.Variable("data1")
val data2 = Symbol.Variable("data2")
val shape = Shape(3, 4)
val dataTmp1 = Random.uniform(0, 100, shape)
val dataTmp2 = Random.uniform(0, 100, shape)
@@ -712,8 +712,8 @@ }
}

test("minimum") {
val data1 = Symbol.Variable("data")
val data2 = Symbol.Variable("data")
val data1 = Symbol.Variable("data1")
val data2 = Symbol.Variable("data2")
val shape = Shape(3, 4)
val dataTmp1 = Random.uniform(0, 100, shape)
val dataTmp2 = Random.uniform(0, 100, shape)
8 changes: 4 additions & 4 deletions src/c_api/c_api_symbolic.cc
@@ -1035,15 +1035,15 @@ int MXSetCalibTableToQuantizedSymbol(SymbolHandle qsym_handle,
API_END_HANDLE_ERROR(delete s);
}

int MXGenBackendSubgraph(SymbolHandle sym_handle, const char *backend,
int MXGenBackendSubgraph(SymbolHandle sym_handle, const char *backend_name,
SymbolHandle *ret_sym_handle) {
nnvm::Symbol *s = new nnvm::Symbol();
API_BEGIN();
nnvm::Symbol *sym = static_cast<nnvm::Symbol *>(sym_handle);
*s = sym->Copy();
std::vector<mxnet::op::SubgraphPropertyPtr> properties =
mxnet::op::SubgraphPropertyRegistry::Get()->CreateSubgraphProperty(backend);
for (auto property : properties) {
auto backend = mxnet::op::SubgraphBackendRegistry::Get()->GetSubgraphBackend(backend_name);
const auto& subgraph_prop_list = backend->GetSubgraphProperties();
for (auto property : subgraph_prop_list) {
nnvm::Graph g = Symbol2Graph(*s);
property->SetAttr("graph", g);
g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
8 changes: 5 additions & 3 deletions src/c_api/c_api_test.cc
@@ -41,9 +41,11 @@ int MXBuildSubgraphByOpNames(SymbolHandle sym_handle,
nnvm::Symbol* sym = static_cast<nnvm::Symbol*>(sym_handle);
*s = sym->Copy();
if (!op_name_set.empty()) {
std::vector<mxnet::op::SubgraphPropertyPtr> properties =
mxnet::op::SubgraphPropertyRegistry::Get()->CreateSubgraphProperty(prop_name);
for (auto property : properties) {
auto& backend =
mxnet::op::SubgraphBackendRegistry::Get()->GetSubgraphBackend(prop_name);
LOG(INFO) << "Subgraph backend " << backend->GetName() << " is activated.";
const auto& subgraph_prop_list = backend->GetSubgraphProperties();
for (auto property : subgraph_prop_list) {
nnvm::Graph g;
g.outputs = s->outputs;
property->SetAttr("graph", g);
