
Conversation


@ibsidorenko ibsidorenko commented Aug 12, 2022

Main goals for this PR are the following:

  1. E2E compilation should work without QNN canonicalization on the Hexagon target.
  2. Enable int8 -> int8 computation for the dense/conv2d operations.

What was done:

  1. Disabled qnn::Legalize for the Hexagon target. For other targets the QNN passes keep working. By default, QNN legalize and QNN canonicalize are enabled, but they can be disabled through the pass config. Example:
    # Disable QNN legalization and canonicalization passes
    with tvm.transform.PassContext(opt_level=3, disabled_pass=["qnn.Legalize"]):
        mod, _ = relay.optimize(mod, tvm.target.Target(target_hexagon, host=target_hexagon))
  2. Implemented new strategies (compute and schedules) for QNN ops. This is POC code; there was no goal to implement high-performance compute/schedule functions. That will be done in the future.
  3. To enable int8 -> int8 computation for the dense/conv2d operations, we need to merge dense|conv2d + [bias] + requantize.
    There are two ways to implement this:
    A) Implement a new IRModule -> IRModule pass that pattern matches combinations such as conv2d+requantize, conv2d+bias+requantize, etc.
    B) Change TECompiler to lower a sequence of QNN operations into one TOPI op.
    A) is arguably more natural, but it implies adding dozens of new Relay ops that make sense only for the Hexagon target, are not generic, and are overloaded with a huge number of arguments (inputs, quantization parameters, bias, requantization parameters, etc.).
    That's why approach B) was implemented in this PR.
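As a rough illustration of the merged int8 -> int8 computation described in item 3 (a simplified sketch with made-up values, not the actual TOPI compute): the dense part accumulates in int32 with zero-point correction, and the fused requantize scales the accumulator back into the int8 range.

```python
def qnn_dense_requantize(data, weight, bias, in_zp, w_zp, rq_scale, rq_zp):
    # int32-style accumulation with zero-point correction (the dense part)
    acc = sum((d - in_zp) * (w - w_zp) for d, w in zip(data, weight)) + bias
    # fused requantize: scale the accumulator back into the int8 range
    out = round(acc * rq_scale) + rq_zp
    return max(-128, min(127, out))

# made-up example values
result = qnn_dense_requantize([10, -5], [3, 4], bias=100,
                              in_zp=0, w_zp=0, rq_scale=0.0625, rq_zp=0)
# acc = 10*3 + (-5)*4 + 100 = 110; 110 * 0.0625 = 6.875 -> 7
```

Without the fusion, the intermediate dense result would have to be materialized in int32 and requantized by a separate op, which is what merging dense|conv2d + [bias] + requantize avoids.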

cc @mehrdadh

@ibsidorenko ibsidorenko marked this pull request as draft August 12, 2022 06:57
@masahi masahi self-assigned this Aug 12, 2022
@github-actions github-actions bot requested a review from mehrdadh August 12, 2022 09:39
@masahi
Member

masahi commented Oct 14, 2022

@tvm-bot rerun

@masahi masahi marked this pull request as ready for review October 17, 2022 07:16
@masahi
Member

masahi commented Oct 17, 2022

@tvm-bot rerun

pass_seqs.push_back(relay::qnn::transform::Legalize());
// Skip these passes for Hexagon target.
if ((is_homogeneous && homogeneous_target->GetTargetDeviceType() != kDLHexagon) ||
!is_homogeneous)
Member

Always disabling qnn legalize for Hexagon could be problematic, since we want it to be applied for vrmpy tensorization (like #12911).

So instead, I suggest using qnn legalize by default, and let users disable it via disabled_pass=["qnn.Legalize"]. To do that, we can refactor

Pass Legalize() {
Array<Pass> pass_seqs;
pass_seqs.push_back(relay::transform::Legalize("FTVMQnnLegalize"));
pass_seqs.push_back(relay::transform::Legalize("FTVMQnnCanonicalize"));
relay::transform::Pass seq = relay::transform::Sequential(pass_seqs);
return seq;
}

and formally define qnn.Legalize pass, by CreateFunctionPass(..., "qnn.Legalize", ...).

This should also remove the need for the GetPassPrefix API and other changes.

Contributor Author

Done

def qnn_quantize(data, output_scale, output_zero_point, axis, out_dtype):
"""Compute for qnn.quantize
Note! This is POC code. There was no goal to implement high performance compute function.
Member

Do we want to leave this comment?

Contributor Author

Removed. It was a reminder for QC people.

"""Compute for qnn.conv2d with NCHW layout
Note! This is POC code. There was no goal to implement high performance compute function.
Member

Document that the output can be int32 or odtype depending on the presence of rq parameters.

The same comment for depthwise conv2d, dense etc.
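A tiny sketch of the behavior being documented here (a hypothetical helper with made-up names, not the actual TOPI code): the output stays int32 when no requantize parameters are given, and is requantized into the output dtype's range when they are.

```python
def qnn_conv2d_out(acc, rq_scale=None, rq_zp=0):
    """Hypothetical sketch: without requantize params the int32 accumulator
    is returned as-is; with them, the result is requantized to int8 range."""
    if rq_scale is None:
        return acc                       # int32 output
    out = round(acc * rq_scale) + rq_zp  # requantized output (odtype, int8 here)
    return max(-128, min(127, out))
```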

Contributor Author

Done

"""Compute for qnn.dense
Note! This is POC code. There was no goal to implement high performance compute function.
Member

Can requantize support be added for bmm as well? If not, I think it is ok to drop support for qnn.batch_matmul entirely for now, since this PR already is very big. We also need to update the pattern matcher.

Contributor Author

Yes, it is possible to do; there is no limitation. I can add it.

static auto flower_call = tvm::runtime::Registry::Get("relay.backend.lower_call");
ICHECK(flower_call) << "relay.backend.lower_call is not registered.";

if (target_->GetTargetDeviceType() == kDLHexagon) pattern_matcher_.Register(call_node);
Member

Probably don't need this check

Contributor Author

Yes, we can remove this check. Done.

}

// Helper class that is used during lowering to TE.
// It matches sequence of Ops and lower them into single TOPI operation. Has sense for Hexagon only.
Member

Remove "Has sense for Hexagon only."

Contributor Author

Done

// Helper class that is used during lowering to TE.
// It matches sequence of Ops and lower them into single TOPI operation. Has sense for Hexagon only.
// All supported patterns are enumerated in "supported_patterns_"
class PatternMatcher {
Member

Although this class can be used in general settings, currently it is only used for QNN ops and the class implementation is hardcoded for them. So how about we change the class name to QNNPatternMatcher?

Contributor Author

Ok, no objections. Done.

@masahi
Member

masahi commented Oct 17, 2022

@tvm-bot rerun

@tvm-bot
Collaborator

tvm-bot commented Oct 18, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

ibsidorenko added 2 commits October 18, 2022 12:32
QNN passes are enabled by default. To disable use
disabled_pass=["qnn.Legalize"] in pass config.
@masahi masahi merged commit 010d05c into apache:main Oct 18, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 10, 2022
* [QNN] Disable QNN canonicalization pass.

This commit enables TVM to work without the QNN canonicalization pass.
It adds new TOPI ops for QNN + simple compute/schedules.

* added dependence of the qnn::transform::Legalize pass launch on target.

* Added new dense topi operator for the pattern qnn.dense+bias+requantize

* Added support of axis attribute for QNN TOPI ops

* Fixed TOPI compute implementation for qnn.add

* Fixed issue with non zero padding value for qnn.conv2d

* Fixed Bias.add for qnn.conv2d

* Added support of depthwise qnn.conv2d topi operator

* Added support of 1D quantization params in qnn.dequantize

* Added support of qnn.concatenate

* Fixed out of range array access

* Added meta_schedule_original_shape attribute in QDenseAttr and
QConv2DAttr

* Added support of qnn.batch_matmul as a standalone op.

* Added per channel zp in qnn.dense and qnn.conv2d.

* Fixed corner cases like dense+bias+bias+rq.

* Added unit test.

* Removed rq_out_dtype and axis attributes declaration in QConv2DAttrs and
QDenseAttrs.

* Changed target x86->Hexagon to disable QNN passes.

* Fixed issue with QDenseAttrs and QConv2dAttrs.

* Fixed build for Cortex-M.

* Removed QDenseAttrs and QConv2dAttrs

* Fix tests after rebase

* Address code review comments.

* [QNN] Add option to disable QNN passes.

QNN passes are enabled by default. To disable use
disabled_pass=["qnn.Legalize"] in pass config.

* Revert changes of GetPassPrefix interface.
Comment on lines +314 to +315
_input_scale,
_kernel_scale,
Contributor

@ibsidorenko, @supersat and I were working through the flow you introduced in this PR and found that `_kernel_scale` and `_input_scale` are not used. Are these being folded into the requantize scaling params?

Contributor Author

@csullivan yes, you are right: the input/kernel scales should be taken into account in the subsequent requantize op.
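For reference, the folding works out as follows (made-up numbers, a sketch rather than the PR's code): the input and kernel scales combine with the output scale into the single multiplier that requantize applies to the int32 accumulator, so they never need to be applied inside the fused dense/conv2d compute itself.

```python
# Hypothetical values illustrating how the per-op scales fold into one
# requantize multiplier.
input_scale, kernel_scale, output_scale = 0.25, 0.5, 0.5

# effective multiplier applied to the int32 accumulator by requantize
requantize_multiplier = input_scale * kernel_scale / output_scale  # 0.25

acc = 100                                 # int32 accumulator from qnn.dense
out = round(acc * requantize_multiplier)  # 100 * 0.25 = 25
```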

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
@ibsidorenko ibsidorenko deleted the disable-qnn-canon-v3 branch March 29, 2023 06:26