[Bug fix] Fixed immutable op quantization when non-quantizable op exists in multiple followed ops #39342
Conversation
Thanks for your contribution!
The condition in line 39 of paddle/fluid/framework/ir/mkldnn/cpu_quantize_pass_tester.cc seems wrong: if type is "quantize", the first test type != "dropout" is already true and, because || short-circuits, the other comparisons are never evaluated, so the whole condition is always true.
if (type != "dropout" || type != "quantize" || type != "dequantize") {
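If the intent is that the block should run only for op types other than dropout, quantize, and dequantize, the fix is presumably to use && instead of || (a minimal sketch; IsRegularTestOp is a hypothetical helper name used only for illustration):

#include <string>

// With &&, the condition is false exactly for "dropout", "quantize" and
// "dequantize"; with ||, it is true for every type, because at most one of
// the three inequalities can be false at a time.
bool IsRegularTestOp(const std::string& type) {
  return type != "dropout" && type != "quantize" && type != "dequantize";
}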
@@ -417,36 +417,88 @@ void TestImmutableOpBetweenNonQuantizedOp(const std::string tested_op) {
                     SCALE * S8_MAX);
 }

 // a->Dropout1->b
 // b->TestedOp1(not quantized)->c
The comment says "not quantized", but in line 434 it is set to int8.
Yes, because in cpu_quantize_placement_pass we set mkldnn_data_type to int8 for all supported operators, and then in cpu_quantize_pass we decide whether it will be quantized or not. Maybe I should change this comment to (will be quantized) and (won't be quantized).
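For context, "setting mkldnn_data_type to int8" in the placement pass roughly amounts to marking an attribute on the operator, which platform::HasOpINT8DataType can later read in cpu_quantize_pass (a minimal sketch, assuming an OpDesc* op is at hand):

// Placement pass: mark the operator as an int8 candidate.
op->SetAttr("mkldnn_data_type", std::string("int8"));
// cpu_quantize_pass later checks this marker, e.g. via platform::HasOpINT8DataType(op).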
Sounds good. Or "should/shouldn't be quantized" also sounds good.
for (auto output : node->outputs) {
  if (!output->IsOp() ||
      !(output->Op()->Type() == "quantize" ||
        platform::HasOpINT8DataType(output->Op())))
    return false;
}
return true;
I can suggest another approach that might be more readable because it has fewer ! and || operators and a single return:

// return true only if all outputs are ops and each is either a quantize op or has int8 data type
return std::all_of(node->outputs.begin(), node->outputs.end(), [](Node* output) {
  return output->IsOp() && (output->Op()->Type() == "quantize" ||
                            platform::HasOpINT8DataType(output->Op()));
});
Oh thank you, this is a more elegant approach.
LGTM
LGTM
LGTM
// c->TestedOp2(will be quantized)->e
// e->Pool2d1(will be quantized)->f
// e->Pool2d2(will be quantized)->g
void TestImmutableOpWithManyOutputs(const std::string tested_op) {
/*
                 a
                 |
             Dropout1
                 |
                 b
                 |
   TestedOp1(won't be quantized)
                 |
                 c
                / \
        Dropout2   TestedOp2(will be quantized)
                        |
                        e
                       / \
  Pool2d1(will be quantized)   Pool2d2(will be quantized)
             |                            |
             f                            g
*/
- If one immutable op has many next ops and more than one of them is not quantizable, then this immutable op is not quantized? Is that correct?
- But why should TestedOp2, Pool2d1 and Pool2d2 not be quantized? TestedOp2 has two followed ops, both quantizable. What about changing the condition
if (!(IsOpDequantized(prev_op)) && !(IsOpQuantized(nearest_interp_out))) {
  return;
}
to
if (!(IsOpDequantized(prev_op) && IsOpQuantized(nearest_interp_out))) {
  return;
}
Is this quantization slow? Or is it difficult to get the input scale for TestedOp2?
Just had some doubts, thanks!
If one of the next ops is not quantizable, we don't want to quantize TestedOp2, because we would have to put a quantize before TestedOp2 and a dequantize after it.
Immutable ops usually have about the same performance in FP32 and INT8, which is why we should be careful about adding quantize and dequantize ops.
So in this situation, we skip quantization of this operator only when prev_op won't be quantized and not all of the next ops will be quantized:
if (!(IsOpDequantized(prev_op)) && !(IsOpQuantized(nearest_interp_out))) {
  return;
}
In any other situation, we will quantize this op.
PR types
Bug fixes
PR changes
Others
Describe
This PR fixes cpu_quantize_pass.
There was a problem in quantizing the picodet_m_416_coco model. For operators such as reshape2, transpose2, slice, and nearest_interp/v2, it is necessary to check whether there are quantized operators before or after the operator, because quantizing these operators alone does not give any speed up. So in cpu_quantize_pass we look for the pattern prev_op->op->next_op to check this. It turned out that this solution did not support the situation where there is not one but many operators after the op. This PR fixes that.
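A rough sketch of the idea behind the fix, using hypothetical variable names (op_outputs standing for the output nodes of the matched op) together with the IsOpDequantized/IsOpQuantized helpers mentioned above; this illustrates the generalized check rather than the exact code in the pass:

// Requires <algorithm> for std::all_of.
// Skip quantization of the immutable op only when its input is not dequantized
// and not every one of its (possibly many) outputs will be quantized.
bool all_outputs_quantized =
    std::all_of(op_outputs.begin(), op_outputs.end(),
                [&](Node* out) { return IsOpQuantized(out); });
if (!IsOpDequantized(prev_op) && !all_outputs_quantized) {
  return;  // leave the immutable op in FP32
}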