add int8 bn mkldnn implementation and test #15664
Conversation
@mxnet-label-bot add [mkldnn, Backend, pr-awaiting-review]
@ElaineBao Can you try this on resnetv2? Theoretically, the performance will be better since there are lots of bn-relu-conv patterns in this model.
@ZhennanQin @ciyongch, please help take a review.
@ElaineBao could you elaborate a bit more on why standalone BN leads to more accuracy drop?
That's good advice, I'll try it and update the performance numbers, thank you.
Basically, the accuracy drop is not because the BN is fused or standalone; it's because the BN is converted from fp32 to int8.
@ElaineBao unfusing bn will also introduce a standalone quantized_activation along with quantized_bn.
Hi all, I've looked into the performance issue and concluded that, as an operator, int8 bn itself has no performance regression. The accuracy drop that happens in some models is due to the combination of int8 bn and other operators, which may cause a poor weight distribution.
LGTM
@ZhennanQin please take a review too.
src/operator/nn/batch_norm.cc
Outdated
@@ -396,7 +396,7 @@ void BatchNormComputeExCPU(const nnvm::NodeAttrs &attrs,
   CHECK_EQ(inputs.size(), 5U);
   const BatchNormParam &param = nnvm::get<BatchNormParam>(attrs.parsed);
   // MKLDNN batchnorm only works well on the special MKLDNN layout.
-  if (SupportMKLDNNBN(inputs[0], param) && inputs[0].IsMKLDNNData()) {
+  if (SupportMKLDNNBN(inputs[0], param) /*&& inputs[0].IsMKLDNNData() */) {
Not sure if we can remove this. @TaoLv, please double check.
if (param.min_calib_range.has_value() && param.max_calib_range.has_value()) {
  *max_output_ptr =
      std::max(std::abs(param.min_calib_range.value()), std::abs(param.max_calib_range.value()));
*min_output_ptr = -*max_output_ptr; |
why not *min_output_ptr = param.min_calib_range.value()?
For example, if min_calib_range=-5 and max_calib_range=10, then *max_output_ptr=10 and *min_output_ptr=-10; it's symmetric.
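For illustration, a minimal standalone sketch of that arithmetic using the example values above (the variable names here are illustrative, not the operator's actual code):

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  // Example calibration range from the discussion above.
  float min_calib_range = -5.0f;
  float max_calib_range = 10.0f;
  // Take the larger magnitude and mirror it, so the int8 output range is symmetric around zero.
  float max_output = std::max(std::abs(min_calib_range), std::abs(max_calib_range));
  float min_output = -max_output;
  std::printf("output range: [%.1f, %.1f]\n", min_output, max_output);  // prints [-10.0, 10.0]
  return 0;
}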
CHECK_EQ(weight_mem.get_primitive_desc().get_size(), channel_count * sizeof(float) * 2);
float *weight_buf = reinterpret_cast<float *>(weight_mem.get_data_handle());

NDArray gamma = in_data[quantized_batchnorm::kGamma]; |
change to const NDArray& gamma
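A one-line sketch of the suggested change (same index, just bound by const reference so the NDArray handle isn't copied):

// Bind by const reference instead of taking a copy of the NDArray handle.
const NDArray &gamma = in_data[quantized_batchnorm::kGamma];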
float *weight_buf = reinterpret_cast<float *>(weight_mem.get_data_handle());

NDArray gamma = in_data[quantized_batchnorm::kGamma];
NDArray beta = in_data[quantized_batchnorm::kBeta]; |
As above.
NDArray gamma = in_data[quantized_batchnorm::kGamma];
NDArray beta = in_data[quantized_batchnorm::kBeta];
float *gamma_ptr = gamma.data().dptr<float>(); |
Seems gamma is only used here. Perhaps you can remove it and use:
float *gamma_ptr = in_data[quantized_batchnorm::kGamma].data().dptr<float>();
float *moving_var_ptr = moving_var.data().dptr<float>();

// rescale gamma and beta, to make mean=0 and var=1
NDArray rescaled_mean = NDArray(moving_mean.storage_type(), moving_mean.shape(), |
Create temp memory from TmpMemMgr
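A rough sketch of what that could look like, assuming the TmpMemMgr::Get()->Alloc(pd) helper from mkldnn_base-inl.h is available; the primitive-descriptor name below is illustrative, not taken from this PR:

// Sketch only: grab scratch memory for the rescaled mean/var from the
// per-thread temp-space manager instead of allocating full NDArrays.
// `mean_pd` stands for an mkldnn::memory::primitive_desc describing a
// float buffer of channel_count elements (hypothetical name).
mkldnn::memory *rescaled_mean_mem = TmpMemMgr::Get()->Alloc(mean_pd);
mkldnn::memory *rescaled_var_mem = TmpMemMgr::Get()->Alloc(mean_pd);
float *rescaled_mean_ptr = reinterpret_cast<float *>(rescaled_mean_mem->get_data_handle());
float *rescaled_var_ptr = reinterpret_cast<float *>(rescaled_var_mem->get_data_handle());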
}

const NDArray &out = outputs[batchnorm::kOut];
auto out_mem = const_cast<NDArray &>(out).CreateMKLDNNData(fwd.GetPd().dst_primitive_desc()); |
Use CreateMKLDNNMem instead. Avoid using const_cast unless you have to.
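Something along these lines, assuming the CreateMKLDNNMem/CommitOutput helpers from mkldnn_base-inl.h and that `req` is the OpReqType vector passed into the compute function:

// Sketch only: let the helper decide whether the primitive can write the
// output in place or needs a temporary buffer, then commit the result.
const NDArray &out = outputs[batchnorm::kOut];
auto out_mem = CreateMKLDNNMem(out, fwd.GetPd().dst_primitive_desc(), req[batchnorm::kOut]);
// ... execute the batch-norm primitive, writing into out_mem.second ...
CommitOutput(out, out_mem);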
if (!dispatched) {
  dispatched = MKLDNNStorageType(attrs, dev_mask, true, dispatch_mode, in_attrs, out_attrs);
}
if (!MKLDNNEnvSet()) { |
Not necessary, MKLDNNStorageType will check this.
@@ -175,6 +175,47 @@ class MKLDNNBNForward {
    }
  }

void SetDataHandle(const NDArray &data, const mkldnn::memory *mean, |
Don't duplicate code. Make the old version a wrapper on top of this one.
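For example, the existing NDArray-based overload could simply forward to the new one; a rough sketch (the old signature is hypothetical and the trailing parameters are elided, since the new signature is only partially shown above; NDArray::GetMKLDNNData() is assumed to be available):

// Sketch only: keep the binding logic in one place and let the old
// overload delegate to the new pointer-based SetDataHandle.
void SetDataHandle(const NDArray &data, const NDArray &mean /*, ... */) {
  SetDataHandle(data, mean.GetMKLDNNData() /*, ... */);
}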
LGTM
Thanks for the great work :) Merging now.
* add int8 bn mkldnn implementation and test
* fix lint
* fix ci
* enable int8 bn test only in mkldnn backend
* disable int8 bn forward test with gpu backend
* update int8 bn with reference to comments
* fix lint
* disable int8 bn gluon forward test with gpu backend
* disable uint8 bn forward test with mkldnn backend
* restore support mkldnn bn condition
* rm duplicate code
Description
Add a new operator - int8 batch norm, mkldnn implementation and test
@pengzhao-intel @ZhennanQin
Details
Usage
Set export MXNET_DISABLE_MKLDNN_FUSE_CONV_BN=1 before using imagenet_gen_qsym_mkldnn.py to quantize the model.
Limitation
Int8 BN is not recommended with calib_mode=none, since calculating the thresholds on the fly with s8 input introduces large errors. One can run with calib_mode=naive or calib_mode=entropy, which should give accuracy similar to the fp32 model.
Performance
I tested several models on Skylake, which can be used for reference.