[MKLDNN] Support channel wise quantization for FullyConnected #17187
Conversation
@ZhennanQin @xinyu-intel to review as well
python/mxnet/contrib/quantization.py
Outdated
@@ -459,7 +464,8 @@ def quantize_model(sym, arg_params, aux_params,
                    data_names=('data',), label_names=('softmax_label',),
                    ctx=cpu(), excluded_sym_names=None, excluded_op_names=None, calib_mode='entropy',
                    calib_data=None, num_calib_examples=None,
-                   quantized_dtype='int8', quantize_mode='smart', logger=None):
+                   quantized_dtype='int8', quantize_mode='smart',
+                   quantize_granularity='channel-wise', logger=None):
Is the default value "channel-wise"?
Good catch, forgot to change this back to tensor-wise as the default value after validation :) Will change the default value to tensor-wise.
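In other words, the fix makes the signature in the hunk above default to tensor-wise, roughly like this sketch (parameter list copied from the diff; existing callers see no behavior change):

```python
from mxnet import cpu

def quantize_model(sym, arg_params, aux_params,
                   data_names=('data',), label_names=('softmax_label',),
                   ctx=cpu(), excluded_sym_names=None, excluded_op_names=None,
                   calib_mode='entropy', calib_data=None, num_calib_examples=None,
                   quantized_dtype='int8', quantize_mode='smart',
                   quantize_granularity='tensor-wise', logger=None):
    # 'tensor-wise' keeps the pre-patch default; 'channel-wise' stays opt-in.
    ...
```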
Also add quantize_granularity to the quantize_model_mkldnn and quantize_net_v2 APIs, and keep the original signature of the quantize_net API. GluonNLP will use the quantize_net_v2 API.
By the way, for the previous quantized models, will they still work with this patch?
Thanks @xinyu-intel, the new patch was updated to include those changes to support the broader user-level quantization APIs. This patch is backward compatible, so no worry for the existing quantized models.
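For reference, a GluonNLP-style caller could then opt in through quantize_net_v2, roughly like this sketch (net and calib_data are assumed to be a trained Gluon model and a calibration DataLoader defined elsewhere):

```python
from mxnet.contrib.quantization import quantize_net_v2

# Sketch: request per-channel scales for FullyConnected via the v2 API.
quantized_net = quantize_net_v2(net,
                                quantized_dtype='auto',
                                quantize_mode='full',
                                quantize_granularity='channel-wise',
                                calib_mode='entropy',
                                calib_data=calib_data)
```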
-                               const char *quantize_mode, uint32_t* out_num_calib_names,
-                               const char ***out_calib_names);
+                               const char *quantize_mode, const char *quantize_granularity,
+                               uint32_t* out_num_calib_names, const char ***out_calib_names);
Break backward compatibility?
This API is called by _quantize_symbol() only, and the new argument quantize_granularity carries the default value of tensor-wise from the Python side down to C++. For users doing quantization with quantization.py, set quantize_granularity to channel-wise explicitly to get channel-wise quantization for FullyConnected; otherwise it does tensor-wise quantization, the same as before (the default behavior).
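For example, a pre-patch call site keeps working as-is, since the new argument defaults to tensor-wise (a sketch; sym, arg_params, aux_params, and calib_iter stand in for a real model and calibration iterator):

```python
from mxnet import cpu
from mxnet.contrib.quantization import quantize_model

# No quantize_granularity passed: it defaults to 'tensor-wise',
# so the quantized graph matches the pre-patch behavior.
qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params, ctx=cpu(),
    calib_mode='naive', calib_data=calib_iter,
    quantized_dtype='int8')
```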
python/mxnet/contrib/quantization.py
Outdated
@@ -978,7 +1004,7 @@ def __exit__(self, exc_type, exc_value, traceback):
         net.collect_params().reset_ctx(ctx)
         return net

-def quantize_net(network, quantized_dtype='auto', quantize_mode='full',
+def quantize_net(network, quantized_dtype='auto', quantize_mode='full', quantize_granularity='tensor-wise',
How about keeping this API as is and encouraging users to call quantize_net_v2, which contains the new parameters? quantize_net will call quantize_net_v2 with default parameters. @xinyu-intel
It's better to keep it.
Looks like the only difference between quantize_net_v2 and quantize_net is the new param LayerOutputCollector so far, and quantize_granularity will be another new param introduced by this PR. As there's no other difference, how about just combining these two APIs into one? Otherwise, there could be v3, v4 versions in the future whenever a new param is needed.
I don't know if there is a better way to handle this kind of change. Since quantize_net is a user interface and has been released, adding new parameters to it might break user workloads. One convention is to add a v2 implementation, deprecate the original one, and replace it in the next major release. The v2 implementation is not released yet, so I think it's safe to change it to add quantize_granularity. If it were already released, yes, we might need a v3 implementation.
That makes sense, so let's keep the quantize_net API as is and only add the new params in the v2 version.
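A minimal sketch of that arrangement (the parameter list here is assumed to mirror the released quantize_net signature):

```python
from mxnet import cpu
from mxnet.contrib.quantization import quantize_net_v2

def quantize_net(network, quantized_dtype='auto', quantize_mode='full',
                 exclude_layers=None, exclude_layers_match=None,
                 exclude_operators=None, calib_data=None, data_shapes=None,
                 calib_mode='none', num_calib_examples=None, ctx=cpu(),
                 logger=None):
    # The released signature stays frozen; the new knobs (quantize_granularity,
    # LayerOutputCollector) exist only in quantize_net_v2 and take their
    # defaults when coming through this wrapper.
    return quantize_net_v2(network, quantized_dtype=quantized_dtype,
                           quantize_mode=quantize_mode,
                           exclude_layers=exclude_layers,
                           exclude_layers_match=exclude_layers_match,
                           exclude_operators=exclude_operators,
                           calib_data=calib_data, data_shapes=data_shapes,
                           calib_mode=calib_mode,
                           num_calib_examples=num_calib_examples,
                           ctx=ctx, logger=logger)
```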
    return true;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
    const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
  return (index == 0) ? false : true;
How about return (index != 0); instead?
good.
    return false;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
    const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
  return (index == 0) ? true : false;
How about return (index == 0); instead?
good.
  // False        True/False        False
  if (channel_wise && !support_channelwise_scale) {
    LOG(FATAL)
        << "Currently, channel-wise quantization requires fuse requantize or dequantize.";
Is it something that users may encounter? If so, what kind of action is suggested?
This is not the common case, but it would happen when MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT or MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL is set to true, which means quantized FullyConnected will not be fused with either requantize or dequantize. So this message gives the user a hint to enable fusion in order to use this feature.
Thanks for the explanation. How about suggesting in the error message to check these two env variables and set correct values?
No problem :)
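For instance, a caller could sanity-check those two switches before quantizing, roughly like this sketch (how MXNet parses the flag values is an assumption here; adjust to the documented semantics):

```python
import os

# Channel-wise quantization needs quantized FullyConnected fused with
# requantize or dequantize, so these fusions must not be disabled.
for flag in ('MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT',
             'MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL'):
    if os.environ.get(flag, '0').lower() in ('1', 'true'):
        raise RuntimeError('%s is set; unset it to allow channel-wise '
                           'quantization of FullyConnected.' % flag)
```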
@@ -767,7 +773,7 @@ def test_pos_conv_bn_sum_act():
                   "softrelu": True,
                   "relu6": False,
                   "leakyrelu": True,
-                  "gelu": True}
+                  "gelu": False}
Why change it to false?
Convolution with post-op sum and activation only supports "relu" currently; previously the UT didn't catch such a failure.
@TaoLv @ZhennanQin @xinyu-intel please help to review the latest changes :)
LGTM
Thank you for the contribution. Merging now~
Description
Add channel-wise quantization support for FullyConnected, which will bring an accuracy benefit for some models such as BERT SQuAD.
The default quantization behavior stays the same as before, i.e. tensor-wise quantization; users can switch to channel-wise quantization by setting the argument quantize_granularity="channel-wise" when calling quantize_model(), as shown below.
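For example (a sketch; sym, arg_params, aux_params, and calib_iter are placeholders for a real model and calibration iterator):

```python
from mxnet import cpu
from mxnet.contrib.quantization import quantize_model

# Opt in to per-output-channel scales for FullyConnected weights.
qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params, ctx=cpu(),
    calib_mode='naive', calib_data=calib_iter,
    quantized_dtype='int8',
    quantize_granularity='channel-wise')
```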
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments
@pengzhao-intel @TaoLv