This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Improve quantization flow #15961

Merged: 15 commits on Aug 29, 2019
23 changes: 11 additions & 12 deletions example/quantization/README.md
@@ -86,21 +86,20 @@ Use the following command to install [Gluon-CV](https://gluon-cv.mxnet.io/):
pip install gluoncv
```

Below are some quantization demos. These models have been tested on Linux systems.
The following models have been tested on Linux systems. Accuracy was collected on an Intel Xeon Cascade Lake CPU; on CPUs with Skylake or earlier architectures, the accuracy may differ.

| Model | Source | Dataset | FP32 Accuracy (top-1/top-5)| INT8 Accuracy (top-1/top-5)|
|:---|:---|---|:---:|:---:|
| [ResNet18-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |70.15%/89.38%|69.92%/89.26%|
| [ResNet50-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.34%/93.13% | 75.91%/92.95% |
| [ResNet50-V1b](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.82%/93.38% | 76.39%/93.24% |
| [ResNet101-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 77.33%/93.59% | 77.05%/93.43% |
|[Squeezenet 1.0](#4)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|56.98%/79.20%|52.98%/77.21%|
|[MobileNet 1.0](#5)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.23%/90.64%|72.03%/90.42%|
|[MobileNetV2 1.0](#6)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|70.27%/89.62%|69.70%/89.26%|
|[Inception V3](#7)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|77.76%/93.83% |77.87%/93.78% |
|[ResNet152-V2](#8)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.65%/93.07%|76.36%/92.89%|
|[Inception-BN](#9)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.28%/90.63%|72.20%/90.56%|
| [SSD-VGG16](#10) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.8366 mAP | 0.8364 mAP |
| [ResNet18-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |70.15%/89.38%|69.92%/89.30%|
| [ResNet50-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.34%/93.13% | 76.06%/92.99% |
| [ResNet101-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 77.33%/93.59% | 77.07%/93.47% |
|[Squeezenet 1.0](#4)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|56.98%/79.20%|56.79%/79.47%|
|[MobileNet 1.0](#5)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.23%/90.64%|72.06%/90.53%|
|[MobileNetV2 1.0](#6)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|70.27%/89.62%|69.82%/89.35%|
|[Inception V3](#7)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|77.76%/93.83% |78.05%/93.91% |
|[ResNet152-V2](#8)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.65%/93.07%|76.19%/92.88%|
|[Inception-BN](#9)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.28%/90.63%|72.02%/90.53%|
| [SSD-VGG16](#10) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.8366 mAP | 0.8357 mAP |
| [SSD-VGG16](#10) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | COCO2014 | 0.2552 mAP | 0.2530 mAP |

<h3 id='3'>ResNetV1</h3>
17 changes: 3 additions & 14 deletions example/quantization/imagenet_gen_qsym.py
@@ -141,23 +141,12 @@ def save_params(fname, arg_params, aux_params, logger=None):
excluded_op_names = []
if args.model == 'imagenet1k-resnet-152':
rgb_mean = '0,0,0'
if args.ctx == 'gpu':
calib_layer = lambda name: name.endswith('_output') and (name.find('conv') != -1
or name.find('sc') != -1
or name.find('fc') != -1)
else:
calib_layer = lambda name: name.endswith('_output') and (name.find('conv') != -1
or name.find('sc') != -1)
excluded_sym_names += ['flatten0', 'fc1']
excluded_sym_names += ['flatten0', 'fc1']
if exclude_first_conv:
excluded_sym_names += ['conv0']
elif args.model == 'imagenet1k-inception-bn':
rgb_mean = '123.68,116.779,103.939'
if args.ctx == 'gpu':
calib_layer = lambda name: name.endswith('_output') and (name.find('conv') != -1
or name.find('fc') != -1)
else:
calib_layer = lambda name: name.endswith('_output') and (name.find('conv') != -1)
if args.ctx == 'cpu':
excluded_sym_names += ['flatten', 'fc1']
if exclude_first_conv:
excluded_sym_names += ['conv_1']
@@ -203,7 +192,7 @@ def save_params(fname, arg_params, aux_params, logger=None):
excluded_op_names=excluded_op_names,
calib_mode=calib_mode, calib_data=data,
num_calib_examples=num_calib_batches * batch_size,
calib_layer=calib_layer, quantized_dtype=args.quantized_dtype,
quantized_dtype=args.quantized_dtype,
logger=logger)
if calib_mode == 'entropy':
suffix = '-quantized-%dbatches-entropy' % num_calib_batches
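The `calib_layer` lambdas deleted above selected calibration nodes by name in the frontend; this PR drops that parameter because the backend quantization pass now reports the nodes to calibrate itself. For reference, a standalone sketch of the old name-based predicates for the resnet-152 branch (pure Python; the node names below are hypothetical examples in MXNet naming style):

```python
def calib_layer_gpu(name):
    # GPU path: calibrate conv, shortcut (sc), and fc outputs
    return name.endswith('_output') and (
        'conv' in name or 'sc' in name or 'fc' in name)

def calib_layer_cpu(name):
    # CPU path: fc outputs were left uncalibrated
    return name.endswith('_output') and ('conv' in name or 'sc' in name)

names = ['stage1_conv0_output', 'fc1_output', 'data']
print([n for n in names if calib_layer_gpu(n)])  # conv and fc outputs
print([n for n in names if calib_layer_cpu(n)])  # conv outputs only
```

Note the only behavioral difference between the two branches was whether `fc` outputs were calibrated, which is why removing the `ctx`-dependent lambdas simplifies the script.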
3 changes: 1 addition & 2 deletions example/quantization/imagenet_gen_qsym_mkldnn.py
@@ -193,7 +193,6 @@ def save_params(fname, arg_params, aux_params, logger=None):
# get image shape
image_shape = args.image_shape

calib_layer = lambda name: name.endswith('_output') or name == "data"
exclude_first_conv = args.exclude_first_conv
if args.quantized_dtype == "uint8":
logger.info('quantized dtype is set to uint8, will exclude first conv.')
@@ -295,7 +294,7 @@ def save_params(fname, arg_params, aux_params, logger=None):
ctx=ctx, excluded_sym_names=excluded_sym_names,
calib_mode=calib_mode, calib_data=data,
num_calib_examples=num_calib_batches * batch_size,
calib_layer=calib_layer, quantized_dtype=args.quantized_dtype,
quantized_dtype=args.quantized_dtype,
label_names=(label_name,), logger=logger)
if calib_mode == 'entropy':
suffix = '-quantized-%dbatches-entropy' % num_calib_batches
11 changes: 1 addition & 10 deletions example/ssd/quantization.py
@@ -115,19 +115,10 @@ def save_params(fname, arg_params, aux_params, logger=None):
# get image shape
image_shape = '3,300,300'

def calib_layer(name): return not (name.endswith('_data') or
name.endswith('_weight') or
name.endswith('_bias') or
name.endswith('_workspace'))
# Quantization layer configs
exclude_first_conv = args.exclude_first_conv
excluded_sym_names = []
rgb_mean = '123,117,104'
for i in range(1,19):
excluded_sym_names += ['flatten'+str(i)]
excluded_sym_names += ['multibox_loc_pred',
'concat0',
'concat1']
if exclude_first_conv:
excluded_sym_names += ['conv1_1']

@@ -159,7 +150,7 @@ def calib_layer(name): return not (name.endswith('_data') or
ctx=ctx, excluded_sym_names=excluded_sym_names,
calib_mode=calib_mode, calib_data=eval_iter,
num_calib_examples=num_calib_batches * batch_size,
calib_layer=calib_layer, quantized_dtype=args.quantized_dtype,
quantized_dtype=args.quantized_dtype,
label_names=(label_name,), logger=logger)
sym_name = '%s-symbol.json' % ('./model/cqssd_vgg16_reduced_300')
param_name = '%s-%04d.params' % ('./model/cqssd_vgg16_reduced_300', epoch)
7 changes: 6 additions & 1 deletion include/mxnet/c_api.h
@@ -1907,6 +1907,9 @@ MXNET_DLL int MXSymbolInferTypePartial(SymbolHandle sym,
* \param offline_params array of c strings representing the names of params quantized offline
* \param quantized_dtype the quantized destination type for input data
* \param calib_quantize **Deprecated**. The quantize op is always calibrated when possible
* \param quantize_mode quantize mode to be used in quantize pass
* \param out_num_calib_names return the number of nodes to be calibrated
* \param out_calib_names return the node names to be calibrated
*/
MXNET_DLL int MXQuantizeSymbol(SymbolHandle sym_handle, SymbolHandle *ret_sym_handle,
const int* dev_type,
@@ -1915,7 +1918,9 @@ MXNET_DLL int MXQuantizeSymbol(SymbolHandle sym_handle, SymbolHandle *ret_sym_handle,
const mx_uint num_excluded_op_names,
const char **excluded_op_names,
const mx_uint num_offline, const char **offline_params,
const char *quantized_dtype, const bool calib_quantize);
const char *quantized_dtype, const bool calib_quantize,
const char *quantize_mode, mx_uint* out_num_calib_names,
const char ***out_calib_names);
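The two new output parameters (`out_num_calib_names`, `out_calib_names`) mean the pass itself now tells callers which nodes need calibration, which is what allows the Python scripts in this PR to drop their `calib_layer` callbacks. A pure-Python sketch of that contract (the op table and node names are hypothetical, not the real pass):

```python
QUANTIZED_OPS = {'Convolution', 'FullyConnected'}  # hypothetical subset

def quantize_symbol(nodes, excluded_syms):
    """Return (quantized node list, names whose outputs need calibration),
    mirroring the new out_num_calib_names/out_calib_names outputs."""
    out_nodes, calib_names = [], []
    for name, op in nodes:
        if name in excluded_syms or op not in QUANTIZED_OPS:
            out_nodes.append((name, op))       # left in FP32
        else:
            out_nodes.append((name, 'quantized_' + op))
            calib_names.append(name)           # min/max collected during calibration
    return out_nodes, calib_names

nodes = [('data', 'null'), ('conv0', 'Convolution'), ('fc1', 'FullyConnected')]
_, calib = quantize_symbol(nodes, excluded_syms={'fc1'})
print(calib)  # ['conv0']
```

With this shape, the frontend only has to run the calibration data through the network and record min/max for the returned names.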

/*!
* \brief Convert a symbol into a mixed precision symbol with cast operators for target dtype casting
30 changes: 30 additions & 0 deletions include/mxnet/op_attr_types.h
@@ -131,6 +131,16 @@ enum class DispatchMode {
kVariable,
};

/*! \brief the quantization type of the operator */
enum class QuantizeType {
// This operator does not support quantization
kNone = 0,
// This operator benefits greatly from quantization and thus must be quantized
kMust,
// This operator supports quantization; whether it is quantized depends on its connections
kSupport,
};
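The three-way split lets the pass force quantization of compute-heavy operators while quantizing lighter ones only when their neighbors already run in int8. A minimal Python sketch of that decision logic (the op table is a made-up example, not MXNet's actual registry):

```python
from enum import Enum

class QuantizeType(Enum):
    NONE = 0     # no quantization support
    MUST = 1     # large benefit: always quantize
    SUPPORT = 2  # quantize only when the connected producer is quantized

OP_TYPE = {'conv': QuantizeType.MUST,
           'relu': QuantizeType.SUPPORT,
           'softmax': QuantizeType.NONE}

def should_quantize(op, producer_quantized):
    t = OP_TYPE.get(op, QuantizeType.NONE)
    if t is QuantizeType.MUST:
        return True
    if t is QuantizeType.SUPPORT:
        return producer_quantized  # decided by the connection
    return False

print(should_quantize('relu', producer_quantized=True))   # True
print(should_quantize('relu', producer_quantized=False))  # False
```

Treating `SUPPORT` ops this way avoids inserting quantize/dequantize pairs around an op whose neighbors stay in FP32, which would only add conversion overhead.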

/*!
* \brief Operator state. This is a pointer type, its content is mutable
* even if OpStatePtr is const.
@@ -297,6 +307,12 @@ using FInferStorageType = std::function<bool (const NodeAttrs& attrs,
std::vector<int>* in_attrs,
std::vector<int>* out_attrs)>;

/*!
* \brief Register a function to query an operator's QuantizeType from its node attrs
* \note Register under "FQuantizable" for operators that support quantization
*/
using FQuantizable = std::function<QuantizeType (const NodeAttrs& attrs)>;

/*!
* \brief Register a quantized node creation function based on the attrs of the node
* \note Register under "FQuantizedOp" for non-quantized operators
Expand All @@ -319,6 +335,20 @@ using FNeedRequantize = std::function<bool (const NodeAttrs& attrs)>;
using FAvoidQuantizeInput = std::function<bool (const NodeAttrs& attrs,
size_t index)>;

/*!
* \brief Register a function to determine which inputs of a quantized operator
* need to be calibrated. This is usually used for quantized operators
* that require calibration on their inputs.
*/
using FNeedCalibrateInput = std::function<std::vector<int> (const NodeAttrs& attrs)>;

/*!
* \brief Register a function to determine which outputs of a quantized operator
* need to be calibrated. This is usually used for quantized operators
* that require calibration on their outputs.
*/
using FNeedCalibrateOutput = std::function<std::vector<int> (const NodeAttrs& attrs)>;
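Both hooks return index lists rather than a single boolean, so an operator can request calibration for only some of its tensors. An illustrative sketch (the op names and index choices are hypothetical):

```python
def need_calibrate_input(op_name):
    # e.g. a hypothetical quantized concat wants min/max for both of its inputs
    if op_name == 'quantized_concat':
        return [0, 1]
    return []

def need_calibrate_output(op_name):
    # most quantized ops calibrate their first output only
    return [0] if op_name.startswith('quantized_') else []

print(need_calibrate_input('quantized_concat'))  # [0, 1]
print(need_calibrate_output('quantized_conv'))   # [0]
print(need_calibrate_output('relu'))             # []
```

The quantize pass can then union the returned indices across all nodes to build the calibration-node list it hands back through the C API.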

} // namespace mxnet

#endif // MXNET_OP_ATTR_TYPES_H_