
add int8 bn mkldnn implementation and test #15664

Merged
merged 12 commits on Aug 8, 2019

Conversation

@ElaineBao (Contributor) commented Jul 26, 2019

Description

Add a new operator - int8 batch norm, mkldnn implementation and test
@pengzhao-intel @ZhennanQin

Details

Usage

  1. Follow the doc at https://github.com/apache/incubator-mxnet/tree/master/example/quantization/README.md to quantize models and run inference.
  2. To use standalone int8 batch norm instead of fused fp32 batch norm, export MXNET_DISABLE_MKLDNN_FUSE_CONV_BN=1 before running imagenet_gen_qsym_mkldnn.py to quantize the model.
  3. We suggest using fused BN whenever it can be fused, since it is faster and more accurate; otherwise use standalone int8 BN.

Limitation

  1. Currently int8 BN only supports s8 input, since MKL-DNN batch norm only supports s8 input.
  2. Currently int8 BN cannot support calib_mode=none: calculating the thresholds on the fly with s8 input produces large errors. Run with calib_mode=naive or calib_mode=entropy instead; accuracy should be similar to the fp32 model (see the calibration sketch below).
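
For context, here is a minimal sketch of the idea behind calibrated thresholds, with hypothetical names (this is not MXNet's actual calibration code): calib_mode=naive accumulates a layer's output range once over a calibration dataset and then freezes it, whereas calib_mode=none would have to derive thresholds from each inference batch on the fly.

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: accumulate a per-layer output range over a calibration
// set (the spirit of calib_mode=naive). Freezing this range avoids the noisy
// per-batch thresholds that make calib_mode=none inaccurate for s8 input.
struct CalibRange {
  float min_range = 0.0f;
  float max_range = 0.0f;

  void Update(const std::vector<float>& layer_output) {
    for (float v : layer_output) {
      min_range = std::min(min_range, v);
      max_range = std::max(max_range, v);
    }
  }
};
```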

Performance

I tested several models on Skylake; the results below can be used for reference.

| Models | FP32 Acc | INT8 (fused fp32 BN) Acc | INT8 (standalone int8 BN) Acc |
| --- | --- | --- | --- |
| Resnet50 V1 | 0.765 | 0.760 | 0.751 |
| Mobilenet1.0 | 0.722 | 0.720 | 0.677 |
| Inception V3 | 0.782 | 0.782 | 0.772 |

@ElaineBao requested a review from nswamy as a code owner, July 26, 2019 12:33
@abhinavs95 (Contributor)

@mxnet-label-bot add [mkldnn, Backend, pr-awaiting-review]

@marcoabreu added the Backend, MKLDNN, and pr-awaiting-review labels, Jul 26, 2019
@xinyu-intel (Contributor)

@ElaineBao Can you try this on resnetv2? Theoretically, the performance will be better since there are lots of bn-relu-conv patterns in this model.

@pengzhao-intel (Contributor)

@ZhennanQin @ciyongch, please help review.

@pengzhao-intel (Contributor)

@ElaineBao could you elaborate on why standalone BN leads to a bit more accuracy drop?

@ElaineBao (Contributor, Author)

> @ElaineBao Can you try this on resnetv2? Theoretically, the performance will be better since there are lots of bn-relu-conv patterns in this model.

That's good advice, I'll try it and update the performance numbers, thank you.

@ElaineBao (Contributor, Author)

> @ElaineBao could you elaborate on why standalone BN leads to a bit more accuracy drop?

Basically the accuracy drop is not because the BN is fused or standalone; it's because the BN is converted from fp32 to int8.
A small accuracy drop is acceptable when an op is converted from fp32 to int8, but for int8 BN the drop may be too large, especially for Mobilenet1.0. I guess it has something to do with the distribution of parameter values in Mobilenet1.0.
I'll look into it.

@xinyu-intel (Contributor)

@ElaineBao unfusing BN will also introduce a standalone quantized_activation along with quantized_bn.

@ElaineBao (Contributor, Author)

Hi all, I've looked into the issue and concluded that, as an operator, int8 BN itself has no performance regression. The accuracy drop seen in some models is due to the combination of int8 BN and other operators, which may cause a poor weight distribution.
One should analyze and decide which situations are suitable for int8 BN. I'll provide a doc on how to use it later.

@pengzhao-intel (Contributor) left a comment

LGTM

@pengzhao-intel (Contributor)

@ZhennanQin please take a review too.
@ElaineBao please update the doc here and mark int8 BN as Y:
https://mxnet.incubator.apache.org/versions/master/tutorials/mkldnn/operator_list.html

@@ -396,7 +396,7 @@ void BatchNormComputeExCPU(const nnvm::NodeAttrs &attrs,
CHECK_EQ(inputs.size(), 5U);
const BatchNormParam &param = nnvm::get<BatchNormParam>(attrs.parsed);
// MKLDNN batchnorm only works well on the special MKLDNN layout.
-  if (SupportMKLDNNBN(inputs[0], param) && inputs[0].IsMKLDNNData()) {
+  if (SupportMKLDNNBN(inputs[0], param) /*&& inputs[0].IsMKLDNNData() */) {
Contributor

Not sure if we can remove this. @TaoLv, please double check.

if (param.min_calib_range.has_value() && param.max_calib_range.has_value()) {
*max_output_ptr =
std::max(std::abs(param.min_calib_range.value()), std::abs(param.max_calib_range.value()));
*min_output_ptr = -*max_output_ptr;
Contributor

why not *min_output_ptr = param.min_calib_range.value()?

Contributor Author

For example, if min_calib_range=-5 and max_calib_range=10, then *max_output_ptr=10 and *min_output_ptr=-10, so the range is symmetric.
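
To make the reasoning concrete, here is a small hedged sketch (illustrative code, not the actual operator implementation) of how the calibrated range becomes a symmetric s8 output range:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  // Values from the example above.
  float min_calib_range = -5.0f;
  float max_calib_range = 10.0f;

  // The s8 output uses a single scale for both positive and negative values,
  // so the representable range must be symmetric around zero.
  float max_output = std::max(std::abs(min_calib_range), std::abs(max_calib_range));  // 10
  float min_output = -max_output;                                                     // -10
  float scale = 127.0f / max_output;  // one scale maps [-10, 10] onto [-127, 127]

  std::printf("output range = [%g, %g], scale = %g\n", min_output, max_output, scale);
  return 0;
}
```

Keeping *min_output_ptr at -5 would require different scales for the negative and positive halves, which this symmetric s8 scheme does not allow.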

CHECK_EQ(weight_mem.get_primitive_desc().get_size(), channel_count * sizeof(float) * 2);
float *weight_buf = reinterpret_cast<float *>(weight_mem.get_data_handle());

NDArray gamma = in_data[quantized_batchnorm::kGamma];
Contributor

change to const NDArray& gamma

float *weight_buf = reinterpret_cast<float *>(weight_mem.get_data_handle());

NDArray gamma = in_data[quantized_batchnorm::kGamma];
NDArray beta = in_data[quantized_batchnorm::kBeta];
Contributor

As above.


NDArray gamma = in_data[quantized_batchnorm::kGamma];
NDArray beta = in_data[quantized_batchnorm::kBeta];
float *gamma_ptr = gamma.data().dptr<float>();
Contributor

Seems gamma is only used here. Perhaps you can remove it and write:
float *gamma_ptr = in_data[quantized_batchnorm::kGamma].data().dptr<float>();

float *moving_var_ptr = moving_var.data().dptr<float>();

// rescale gamma and beta, to make mean=0 and var=1
NDArray rescaled_mean = NDArray(moving_mean.storage_type(), moving_mean.shape(),
Contributor

Create temp memory from TmpMemMgr
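
For reference, here is a minimal sketch of the folding math behind the "rescale gamma and beta" comment above, with hypothetical buffer names (not the actual MXNet/MKL-DNN data handles):

```cpp
#include <cmath>
#include <cstddef>

// Illustrative only: fold the running mean/variance into gamma and beta so the
// batch norm behaves as if mean = 0 and var = 1:
//   y = gamma * (x - mean) / sqrt(var + eps) + beta = gamma' * x + beta'
void RescaleGammaBeta(const float* gamma, const float* beta,
                      const float* moving_mean, const float* moving_var,
                      float eps, std::size_t channel_count,
                      float* rescaled_gamma, float* rescaled_beta) {
  for (std::size_t c = 0; c < channel_count; ++c) {
    const float inv_std = 1.0f / std::sqrt(moving_var[c] + eps);
    rescaled_gamma[c] = gamma[c] * inv_std;
    rescaled_beta[c] = beta[c] - gamma[c] * moving_mean[c] * inv_std;
  }
}
```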

}

const NDArray &out = outputs[batchnorm::kOut];
auto out_mem = const_cast<NDArray &>(out).CreateMKLDNNData(fwd.GetPd().dst_primitive_desc());
Contributor

Use CreateMKLDNNMem instead. Avoid using const_cast unless you have to.

if (!dispatched) {
dispatched = MKLDNNStorageType(attrs, dev_mask, true, dispatch_mode, in_attrs, out_attrs);
}
if (!MKLDNNEnvSet()) {
Contributor

Not necessary, MKLDNNStorageType will check this.

@ElaineBao reopened this Aug 7, 2019
@@ -175,6 +175,47 @@ class MKLDNNBNForward {
}
}

void SetDataHandle(const NDArray &data, const mkldnn::memory *mean,
Contributor

Don't duplicate code. Implement the old version on top of this one.

@ZhennanQin (Contributor) left a comment

LGTM

@pengzhao-intel (Contributor)

Thanks for the great work :) Merging now.

@pengzhao-intel merged commit b6972bb into apache:master Aug 8, 2019
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
* add int8 bn mkldnn implementation and test

* fix lint

* fix ci

* enable int8 bn test only in mkldnn backend

* disable int8 bn forward test with gpu backend

* update int8 bn with reference to comments

* fix lint

* disable int8 bn gluon forward test with gpu backend

* disable uint8 bn forward test with mkldnn backend

* restore support mkldnn bn condition

* rm duplicate code