[QNN][Hexagon] Disable QNN canonicalization pass #12398
Conversation
This commit enables TVM to work without the QNN canonicalization pass. It adds new TOPI ops for QNN plus simple compute/schedules.
src/relay/backend/utils.cc
Outdated
pass_seqs.push_back(relay::qnn::transform::Legalize());
// Skip these passes for Hexagon target.
if ((is_homogeneous && homogeneous_target->GetTargetDeviceType() != kDLHexagon) ||
    !is_homogeneous)
Always disabling qnn legalize for Hexagon could be problematic, since we want it to be applied for vrmpy tensorization (like #12911).
So instead, I suggest applying qnn legalize by default and letting users disable it via disabled_pass=["qnn.Legalize"]. To do that, we can refactor
tvm/src/relay/qnn/pass/legalize.cc
Lines 33 to 39 in 8131364
Pass Legalize() {
  Array<Pass> pass_seqs;
  pass_seqs.push_back(relay::transform::Legalize("FTVMQnnLegalize"));
  pass_seqs.push_back(relay::transform::Legalize("FTVMQnnCanonicalize"));
  relay::transform::Pass seq = relay::transform::Sequential(pass_seqs);
  return seq;
}
and formally define qnn.Legalize pass, by CreateFunctionPass(..., "qnn.Legalize", ...).
This should also remove the need for the GetPassPrefix API and other changes.
Done
python/tvm/topi/hexagon/qnn.py
Outdated
def qnn_quantize(data, output_scale, output_zero_point, axis, out_dtype):
    """Compute for qnn.quantize
    Note! This is POC code. There was no goal to implement high performance compute function.
Do we want to leave this comment?
Removed. It was a reminder for QC people.
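For reference, the per-tensor semantics of qnn.quantize discussed in this thread can be sketched in plain NumPy. This is an illustrative reference only, not the TOPI compute from the PR, and the function name and signature here are hypothetical:

```python
import numpy as np

def qnn_quantize_ref(data, output_scale, output_zero_point, out_dtype=np.int8):
    """Reference semantics of qnn.quantize (per-tensor case):
    Q = clip(round(data / scale) + zero_point, qmin, qmax)."""
    info = np.iinfo(out_dtype)
    q = np.round(data / output_scale) + output_zero_point
    return np.clip(q, info.min, info.max).astype(out_dtype)
```

Per-channel quantization (the `axis` parameter in the actual compute) generalizes this by broadcasting a scale/zero-point vector along the chosen axis.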
    """Compute for qnn.conv2d with NCHW layout
    Note! This is POC code. There was no goal to implement high performance compute function.
Document that the output can be int32 or odtype depending on the presence of rq parameters.
The same comment applies to depthwise conv2d, dense, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
python/tvm/topi/hexagon/qnn.py
Outdated
    """Compute for qnn.dense
    Note! This is POC code. There was no goal to implement high performance compute function.
Can requantize support be added for bmm as well? If not, I think it is ok to drop support for qnn.batch_matmul entirely for now, since this PR already is very big. We also need to update the pattern matcher.
Yes, it is possible. There is no limitation. I can add it.
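The requantize step that would follow qnn.batch_matmul is the same fixed-point rescaling used after conv2d and dense. A NumPy sketch of its per-tensor semantics (illustrative only, not the TOPI implementation; names are hypothetical):

```python
import numpy as np

def requantize_ref(x_int32, input_scale, input_zp, output_scale, output_zp,
                   out_dtype=np.int8):
    """Reference semantics of qnn.requantize: rescale an int32 value from
    the input quantization domain into a narrower output domain:
    Q = clip(round((x - in_zp) * in_scale / out_scale) + out_zp, qmin, qmax)."""
    info = np.iinfo(out_dtype)
    real = (x_int32.astype(np.float64) - input_zp) * input_scale
    q = np.round(real / output_scale) + output_zp
    return np.clip(q, info.min, info.max).astype(out_dtype)
```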
static auto flower_call = tvm::runtime::Registry::Get("relay.backend.lower_call");
ICHECK(flower_call) << "relay.backend.lower_call is not registered.";

if (target_->GetTargetDeviceType() == kDLHexagon) pattern_matcher_.Register(call_node);
We probably don't need this check.
Yes, we can remove this check. Done.
}

// Helper class that is used during lowering to TE.
// It matches sequence of Ops and lower them into single TOPI operation. Has sense for Hexagon only.
Remove "Has sense for Hexagon only."
Done
// Helper class that is used during lowering to TE.
// It matches sequence of Ops and lower them into single TOPI operation. Has sense for Hexagon only.
// All supported patterns are enumerated in "supported_patterns_"
class PatternMatcher {
Although this class can be used in general settings, currently it is only used for QNN ops and the class implementation is hardcoded for them. So how about we change the class name to QNNPatternMatcher?
Ok, no objections. Done.
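The matching idea behind the class can be illustrated with a toy sketch. This is a simplification with hypothetical names: the real matcher works on Relay call nodes during TE lowering, not on strings, but the shape of the logic is the same, checking whether a visited op sequence ends with one of the supported fusable patterns:

```python
# Supported suffix patterns, longest first so conv2d+bias+requantize wins
# over plain conv2d+requantize. Names mirror Relay op names for clarity.
SUPPORTED_PATTERNS = [
    ["qnn.conv2d", "add", "qnn.requantize"],   # conv2d + bias + requantize
    ["qnn.conv2d", "qnn.requantize"],
    ["qnn.dense", "add", "qnn.requantize"],
    ["qnn.dense", "qnn.requantize"],
]

def match(op_seq):
    """Return the first supported pattern that op_seq ends with, or None."""
    for pat in SUPPORTED_PATTERNS:
        if op_seq[-len(pat):] == pat:
            return pat
    return None
```

On a hit, the whole matched sequence would be lowered into a single TOPI call instead of lowering each op separately.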
QNN passes are enabled by default. To disable them, use disabled_pass=["qnn.Legalize"] in the pass config.
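In TVM the opt-out goes through `tvm.transform.PassContext(disabled_pass=[...])`. The dispatch itself can be sketched as a toy filter over a default pass list (the pass names other than `qnn.Legalize` are placeholders, and this is not the actual PassContext machinery):

```python
def build_pass_list(disabled_pass=()):
    """Sketch: qnn.Legalize runs by default and is dropped only when the
    user lists it in disabled_pass. Other names are illustrative."""
    default_passes = ["qnn.Legalize", "FoldConstant", "FuseOps"]
    disabled = set(disabled_pass)
    return [p for p in default_passes if p not in disabled]
```

With real TVM, the equivalent user-facing form is `with tvm.transform.PassContext(opt_level=3, disabled_pass=["qnn.Legalize"]): ...`.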
* [QNN] Disable QNN canonicalization pass. This commit enables TVM to work without the QNN canonicalization pass. It adds new TOPI ops for QNN + simple compute/schedules.
* Added dependence of the qnn::transform::Legalize pass launch on target.
* Added new dense TOPI operator for the pattern qnn.dense+bias+requantize.
* Added support of axis attribute for QNN TOPI ops.
* Fixed TOPI compute implementation for qnn.add.
* Fixed issue with non-zero padding value for qnn.conv2d.
* Fixed bias add for qnn.conv2d.
* Added support of depthwise qnn.conv2d TOPI operator.
* Added support of 1D quantization params in qnn.dequantize.
* Added support of qnn.concatenate.
* Fixed out-of-range array access.
* Added meta_schedule_original_shape attribute in QDenseAttrs and QConv2DAttrs.
* Added support of qnn.batch_matmul as a standalone op.
* Added per-channel zero point in qnn.dense and qnn.conv2d.
* Fixed corner cases like dense+bias+bias+rq.
* Added unit test.
* Removed rq_out_dtype and axis attribute declarations in QConv2DAttrs and QDenseAttrs.
* Changed target x86 -> Hexagon to disable QNN passes.
* Fixed issue with QDenseAttrs and QConv2dAttrs.
* Fixed build for Cortex-M.
* Removed QDenseAttrs and QConv2dAttrs.
* Fixed tests after rebase.
* Addressed code review comments.
* [QNN] Add option to disable QNN passes. QNN passes are enabled by default. To disable them, use disabled_pass=["qnn.Legalize"] in the pass config.
* Reverted changes of the GetPassPrefix interface.
_input_scale,
_kernel_scale,
@ibsidorenko, @supersat and I were working through the flow you introduced in this PR and found that `_kernel_scale` and `_input_scale` are not used. Are these being folded into the requantize scaling params?
@csullivan yes, you are right, input/kernel scales should be taken into account in subsequent requantize op.
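The folding described here follows from the quantization algebra: the int32 accumulator of a conv/dense represents real values `acc * input_scale * kernel_scale`, so the subsequent requantize can absorb both scales into one effective multiplier. An illustrative NumPy sketch for the per-tensor case (hypothetical names, not the PR's code):

```python
import numpy as np

def fold_scales_into_requantize(acc_int32, input_scale, kernel_scale,
                                output_scale, output_zp, out_dtype=np.int8):
    """The accumulator represents real values acc * input_scale * kernel_scale,
    so requantize uses the single effective multiplier
    (input_scale * kernel_scale) / output_scale."""
    info = np.iinfo(out_dtype)
    multiplier = (input_scale * kernel_scale) / output_scale
    q = np.round(acc_int32.astype(np.float64) * multiplier) + output_zp
    return np.clip(q, info.min, info.max).astype(out_dtype)
```

This is why the standalone `_input_scale` and `_kernel_scale` arguments can go unused in the fused compute: their effect appears only through the requantize multiplier.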
Main goals for this PR are the following:
What was done:
There are two ways how to implement this:
A) Implement a new IRModule -> IRModule pass that pattern-matches combinations such as conv2d+requantize, conv2d+bias+requantize, etc.
B) Change TECompiler to lower a sequence of QNN operations into one TOPI op.
Possibly, A) is more natural, but it implies adding dozens of new Relay ops that make sense only for the Hexagon target, are not generic, and are overloaded with a huge number of arguments (inputs, quantization parameters, bias, requantization parameters, etc.).
That is why approach B) was implemented in this PR.
cc @mehrdadh