
Add quantization API doc and oneDNN to migration guide #20813

Merged · 8 commits · Feb 5, 2022
5 changes: 5 additions & 0 deletions docs/python_docs/python/api/contrib/index.rst
@@ -67,6 +67,11 @@ Contributed modules

Functions for manipulating text data.

.. card::
:title: contrib.quantization
:link: quantization/index.html

Functions for precision reduction.

.. toctree::
:hidden:
23 changes: 23 additions & 0 deletions docs/python_docs/python/api/contrib/quantization/index.rst
@@ -0,0 +1,23 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

contrib.quantization
====================

.. automodule:: mxnet.contrib.quantization
:members:
:autosummary:
@@ -432,6 +432,67 @@ A new module called `mxnet.gluon.probability` has been introduced in Gluon 2.0.

3. [Transformation](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon/probability/transformation): implement invertible transformation with computable log det jacobians.

## oneDNN Integration
### Operator Fusion
In MXNet 1.x, pattern fusion in the execution graph was enabled by default when MXNet was built with oneDNN library support and could be disabled by setting the `MXNET_SUBGRAPH_BACKEND` environment variable to `None`. MXNet 2.0 introduced changes in the forward inference flow which led to a refactor of the fusion mechanism. To fuse a model in MXNet 2.0, two requirements must be met:

- the model must be defined as a subclass of `HybridBlock` or `Symbol`

- the model must contain operator patterns that can be fused

Both the `HybridBlock` and `Symbol` classes provide an API to easily run operator fusion. All we have to do is add a single line of code which runs the fusion passes on our model:
```{.python}
# on HybridBlock
net.optimize_for(data, backend='ONEDNN')
# on Symbol
optimized_symbol = sym.optimize_for(backend='ONEDNN')
```
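
For a `HybridBlock`, a minimal end-to-end sketch could look as follows (the small convolutional network and its shapes are illustrative - any model containing fusable patterns, such as convolution followed by ReLU, would work):

```{.python}
import mxnet as mx
from mxnet.gluon import nn

# an illustrative model containing a fusable conv + ReLU pattern
net = nn.HybridSequential()
net.add(nn.Conv2D(channels=16, kernel_size=3),
        nn.Activation('relu'))
net.initialize()

data = mx.nd.random.uniform(shape=(1, 3, 224, 224))
net.optimize_for(data, backend='ONEDNN')  # run the oneDNN fusion passes
out = net(data)                           # inference uses the fused graph
```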

Controlling which patterns should be fused can still be done by setting the proper environment variables. See [**oneDNN Environment Variables**](#oneDNN-Environment-Variables).

### INT8 Quantization / Precision reduction
The quantization API was also refactored to be consistent with other new features and mechanisms. Compared to the MXNet 1.x releases, the `quantize_net_v2` function has been removed in MXNet 2.0 and development focused mainly on the `quantize_net` function, making it easier to use and ultimately giving the end user more flexibility.
Quantization can be performed either on a subclass of HybridBlock with `quantize_net`, or on a Symbol with the deprecated `quantize_model` (`quantize_model` is kept only for backward compatibility and its usage is strongly discouraged).

```{.python}
import mxnet as mx
from mxnet.contrib.quantization import quantize_net
from mxnet.gluon.model_zoo.vision import resnet50_v1

# load model
net = resnet50_v1(pretrained=True)

# prepare calibration data
batch_size = 16  # any reasonable batch size works here
dummy_data = mx.nd.random.uniform(-1.0, 1.0, (batch_size, 3, 224, 224))
calib_data_loader = mx.gluon.data.DataLoader(dummy_data, batch_size=batch_size)

# quantization
qnet = quantize_net(net, calib_mode='naive', calib_data=calib_data_loader)
```
`quantize_net` offers many more options - all of its arguments are described in the [API documentation](../../api/contrib/quantization/index.rst).
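
As a quick sanity check, the quantized network behaves like any other HybridBlock; a short sketch reusing `qnet` and `batch_size` from the example above (the expected output shape assumes `resnet50_v1`'s 1000 ImageNet classes):

```{.python}
# run INT8 inference with the quantized network
out = qnet(mx.nd.random.uniform(-1.0, 1.0, (batch_size, 3, 224, 224)))
print(out.shape)  # (batch_size, 1000)
```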

### oneDNN Environment Variables
In MXNet 2.0 all references to MKLDNN (the former name of oneDNN) were replaced with ONEDNN. The table below lists all of the renamed environment variables:

| MXNet 1.x | MXNet 2.0 |
| ------------------------------------ | ---------------------------------------|
| MXNET_MKLDNN_ENABLED | MXNET_ONEDNN_ENABLED |
| MXNET_MKLDNN_CACHE_NUM | MXNET_ONEDNN_CACHE_NUM |
| MXNET_MKLDNN_FORCE_FC_AB_FORMAT | MXNET_ONEDNN_FORCE_FC_AB_FORMAT |
| MXNET_MKLDNN_DEBUG | MXNET_ONEDNN_DEBUG |
| MXNET_USE_MKLDNN_RNN | MXNET_USE_ONEDNN_RNN |
| MXNET_DISABLE_MKLDNN_CONV_OPT | MXNET_DISABLE_ONEDNN_CONV_OPT |
| MXNET_DISABLE_MKLDNN_FUSE_CONV_BN | MXNET_DISABLE_ONEDNN_FUSE_CONV_BN |
| MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU | MXNET_DISABLE_ONEDNN_FUSE_CONV_RELU |
| MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM | MXNET_DISABLE_ONEDNN_FUSE_CONV_SUM |
| MXNET_DISABLE_MKLDNN_FC_OPT | MXNET_DISABLE_ONEDNN_FC_OPT |
| MXNET_DISABLE_MKLDNN_FUSE_FC_ELTWISE | MXNET_DISABLE_ONEDNN_FUSE_FC_ELTWISE |
| MXNET_DISABLE_MKLDNN_TRANSFORMER_OPT | MXNET_DISABLE_ONEDNN_TRANSFORMER_OPT |
| n/a | MXNET_DISABLE_ONEDNN_BATCH_DOT_FUSE |
| n/a | MXNET_ONEDNN_FUSE_REQUANTIZE |
| n/a | MXNET_ONEDNN_FUSE_DEQUANTIZE |
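
For example, to check how a model behaves without one of the fusion patterns, the corresponding flag can be set before the fusion passes run (a sketch; the value `1` follows the usual convention for MXNet boolean flags):

```{.python}
import os

# disable the conv + ReLU fusion pattern; set before MXNet runs its graph passes
os.environ['MXNET_DISABLE_ONEDNN_FUSE_CONV_RELU'] = '1'

import mxnet as mx
# ... build the model and call optimize_for(..., backend='ONEDNN') as shown above
```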

## Appendix
### NumPy Array Deprecated Attributes
| Deprecated Attributes | NumPy ndarray Equivalent |
51 changes: 27 additions & 24 deletions python/mxnet/contrib/quantization.py
@@ -46,7 +46,7 @@ def _quantize_params(qsym, params, min_max_dict):
qsym : Symbol
Quantized symbol from FP32 symbol.
params : dict of str->NDArray
min_max_dict: dict of min/max pairs of layers' output
min_max_dict : dict of min/max pairs of layers' output
"""
inputs_name = qsym.list_arguments()
quantized_params = {}
@@ -110,11 +110,11 @@ def _quantize_symbol(sym, device, excluded_symbols=None, excluded_operators=None
Names of the parameters that users want to quantize offline. It's always recommended to
quantize parameters offline so that quantizing parameters during the inference can be
avoided.
quantized_dtype: str
quantized_dtype : str
The quantized destination type for input data.
quantize_mode: str
quantize_mode : str
The mode that the quantization pass will apply.
quantize_granularity: str
quantize_granularity : str
The granularity of quantization, currently supports 'tensor-wise' and 'channel-wise'
quantization. The default value is 'tensor-wise'.
"""
@@ -174,15 +174,16 @@ def __init__(self):
def collect(self, name, op_name, arr):
"""Function which is registered to Block as monitor callback. Names of layers
requiring calibration are stored in `self.include_layers` variable.
Parameters
----------
name : str
Node name from which collected data comes from
op_name : str
Operator name from which collected data comes from. Single operator
can have multiple inputs/ouputs nodes - each should have different name
arr : NDArray
NDArray containing data of monitored node

Parameters
----------
name : str
Node name from which the collected data comes
op_name : str
Operator name from which the collected data comes. A single operator
can have multiple input/output nodes - each should have a different name
arr : NDArray
NDArray containing data of monitored node
"""

def post_collect(self):
@@ -227,8 +228,7 @@ def post_collect(self):

@staticmethod
def combine_histogram(old_hist, arr, new_min, new_max, new_th):
""" Collect layer histogram for arr and combine it with old histogram.
"""
"""Collect layer histogram for arr and combine it with old histogram."""
(old_hist, old_hist_edges, old_min, old_max, old_th) = old_hist
if new_th <= old_th:
hist, _ = np.histogram(arr, bins=len(old_hist), range=(-old_th, old_th))
@@ -392,21 +392,22 @@ def quantize_model(sym, arg_params, aux_params, data_names=('data',),
The backend quantized operators are only enabled for Linux systems. Please do not run
inference using the quantized models on Windows for now.
The quantization implementation adopts TensorFlow's approach:
https://www.tensorflow.org/performance/quantization.
https://www.tensorflow.org/lite/performance/post_training_quantization.
The calibration implementation borrows the idea of Nvidia's 8-bit Inference with TensorRT:
http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
and adapts the method to MXNet.

.. _`quantize_model_params`:

Parameters
----------
sym : str or Symbol
sym : Symbol
Defines the structure of a neural network for FP32 data types.
arg_params : dict
Dictionary of name to `NDArray`.
aux_params : dict
Dictionary of name to `NDArray`.
data_names : a list of strs
data_names : list of strings
Data names required for creating a Module object to run forward propagation on the
calibration dataset.
device : Device
@@ -441,15 +442,15 @@
The mode that the quantization pass will apply. Supports 'full' and 'smart'.
'full' means quantize all operators if possible.
'smart' means the quantization pass will smartly choose which operators should be quantized.
quantize_granularity: str
quantize_granularity : str
The granularity of quantization, currently supports 'tensor-wise' and 'channel-wise'
quantization. The default value is 'tensor-wise'.
logger : Object
A logging object for printing information during the process of quantization.

Returns
-------
quantized_model: tuple
quantized_model : tuple
A tuple of quantized symbol, quantized arg_params, and aux_params.
"""
warnings.warn('WARNING: This will be deprecated please use quantize_net with Gluon models')
@@ -582,9 +583,10 @@ def quantize_graph(sym, arg_params, aux_params, device=cpu(),
and a collector for naive or entropy calibration.
The backend quantized operators are only enabled for Linux systems. Please do not run
inference using the quantized models on Windows for now.

Parameters
----------
sym : str or Symbol
sym : Symbol
Defines the structure of a neural network for FP32 data types.
device : Device
Defines the device that users want to run forward propagation on the calibration
@@ -616,7 +618,7 @@ def quantize_graph(sym, arg_params, aux_params, device=cpu(),
The mode that the quantization pass will apply. Supports 'full' and 'smart'.
'full' means quantize all operators if possible.
'smart' means the quantization pass will smartly choose which operators should be quantized.
quantize_granularity: str
quantize_granularity : str
The granularity of quantization, currently supports 'tensor-wise' and 'channel-wise'
quantization. The default value is 'tensor-wise'.
LayerOutputCollector : subclass of CalibrationCollector
@@ -700,13 +702,14 @@ def quantize_graph(sym, arg_params, aux_params, device=cpu(),
return qsym, qarg_params, aux_params, collector, calib_layers

def calib_graph(qsym, arg_params, aux_params, collector,
calib_mode='entropy', logger=logging):
calib_mode='entropy', logger=None):
"""User-level API for calibrating a quantized model using a filled collector.
The backend quantized operators are only enabled for Linux systems. Please do not run
inference using the quantized models on Windows for now.

Parameters
----------
qsym : str or Symbol
qsym : Symbol
Defines the structure of a neural network for INT8 data types.
arg_params : dict
Dictionary of name to `NDArray`.