[QNN][Legalize] Specialize for Platforms w/o fast Int8 support #4307
Conversation
Force-pushed from 7025a08 to 5014e6f (compare)
@qnn_conv2d_legalize.register('cpu')
def _qnn_conv2d_legalize_intel_cpu(attrs, inputs, types):
    # The VNNI transformations prefer uint8 x int8 datatypes.
    if is_fast_int8_hw_present():
Since we already know we are on an Intel CPU here, I think the HW feature check can target Intel CPU directly.
This function is used twice - for conv2d and for dense - even for Intel CPU, so I decided to factor it out into a helper. I think this should be OK: it gives us one place where we filter out the targets that have fast int8 HW.
Thanks for the reply. Yes, I have seen it being used by both dense and conv2d. What I mean is whether we could split it per target rather than merging ARM and x86 into one function. It feels a bit odd to run target-dependent logic when we already know the target, even though the code here guarantees correctness. What do you say?
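For illustration, a per-target split along those lines might look like the following minimal sketch. It reuses the registration hook from the diff; the helper names (helper_change_dtypes_to_uint8_int8, helper_no_fast_int8_hw_legalization) and the exact -mcpu strings are assumptions, not necessarily the PR's final code.

import tvm
from tvm import relay

def _is_fast_int8_on_intel():
    # Assumed check: look for a VNNI/AVX512-capable -mcpu in the target options.
    target = tvm.target.current_target(allow_none=False)
    opts = ' '.join(target.options)
    return '-mcpu=cascadelake' in opts or '-mcpu=skylake-avx512' in opts

@qnn_conv2d_legalize.register('cpu')
def _qnn_conv2d_legalize_intel_cpu(attrs, inputs, types):
    # The VNNI transformations prefer uint8 x int8 datatypes.
    if _is_fast_int8_on_intel():
        return helper_change_dtypes_to_uint8_int8(attrs, inputs, types,
                                                  relay.qnn.op.conv2d)
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types,
                                               relay.nn.conv2d)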
@qnn_conv2d_legalize.register('arm_cpu')
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    # ARM prefers the dtypes to be the same.
    if is_fast_int8_hw_present():
Similar to Intel CPU.
Force-pushed from 5014e6f to f3069b1 (compare)
    new_attrs['input_zero_point'] = input_zp
    return relay_op(data, kernel, **new_attrs)


def is_fast_int8_hw_present():
Maybe we could break this into isolated functions for Intel and ARM, which would make the code cleaner. For example, if we add PowerPC support in the future, I would like to have one isolated function such as ppc_int8_hw_support. However, the current way is acceptable too.
Thanks for the ping @anijain2305, several minor comments :)
I didn't check the tests in detail - are they strong enough?
    is_present_arm = False
    for opt in target.options:
        if arm_supported_attr in opt:
            is_present_arm = True
A break here may help :)
Or rewrite this to be something like:

is_present_arm = '+v8.2a,+dotprod' in ' '.join(target.options)
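Spelled out, the two variants being suggested would look roughly like this (assuming target.options is a list of option strings and arm_supported_attr holds the '+v8.2a,+dotprod' attribute):

# Variant 1: keep the loop but exit early once the attribute is found.
is_present_arm = False
for opt in target.options:
    if arm_supported_attr in opt:
        is_present_arm = True
        break

# Variant 2: collapse the loop into a single membership test.
is_present_arm = '+v8.2a,+dotprod' in ' '.join(target.options)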
    new_attrs['input_zero_point'] = input_zp
    return relay_op(data, kernel, **new_attrs)


def is_fast_int8_hw_present():
In https://github.com/apache/incubator-tvm/pull/4307/files#r345009110, I meant rewriting this function into something like is_fast_int8_on_arm and is_fast_int8_on_x86. Or maybe

def is_fast_int8_hw_present(key):
    if key == 'arm':    # something
    elif key == 'x86':  # some other
    else:               # fall through

to unify the check.
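Filled in, that unified variant might look like the sketch below; the target keys and the exact -mcpu/ISA attribute strings are assumptions for illustration only:

import tvm

def is_fast_int8_hw_present(key):
    # Unified check, keyed by target family.
    target = tvm.target.current_target(allow_none=False)
    opts = ' '.join(target.options)
    if key == 'arm':
        return '+v8.2a,+dotprod' in opts
    if key == 'x86':
        return '-mcpu=cascadelake' in opts or '-mcpu=skylake-avx512' in opts
    return False  # fall through: assume no fast int8 units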
    assert 'int8' in data_dtype and 'int8' in kernel_dtype, \
        "Qnn Conv2D only accepts uint8 or int8 inputs"
Is this assertion consistent with its description?
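The substring test does admit both int8 and uint8 (since 'int8' is a substring of 'uint8'), but an explicit membership check would match the message more literally; for illustration:

assert data_dtype in ('int8', 'uint8') and kernel_dtype in ('int8', 'uint8'), \
    "Qnn Conv2D only accepts uint8 or int8 inputs"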
""" | ||
|
||
def _shift(data, out_dtype): | ||
"""Shifts (add/subtracts) the qnn tensor with +/-128)""" |
I guess this is add or subtract with 128 :)
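As a reference point, a minimal sketch of what _shift is doing; the cast-through-int32 detail is an assumption made to avoid overflow, not a statement about the PR's exact code:

from tvm import relay

def _shift(data, out_dtype):
    """Add or subtract 128 so the tensor flips between int8 and uint8."""
    if out_dtype == 'uint8':
        shift = 128      # int8 -> uint8
    elif out_dtype == 'int8':
        shift = -128     # uint8 -> int8
    else:
        raise RuntimeError("Unsupported out dtype " + out_dtype)
    data = relay.cast(data, 'int32')                    # widen before shifting
    data = relay.add(data, relay.const(shift, 'int32'))
    return relay.cast(data, out_dtype)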
    input_zp = attrs['input_zero_point']
    data = _shift(data, kernel_dtype)
    if data_dtype == 'int8':
        input_zp = input_zp + 128
    elif data_dtype == 'uint8':
        input_zp = input_zp - 128
    else:
        raise RuntimeError("Qnn Conv2D only accepts uint8 or int8 inputs")
What about rewriting _shift() so that it also gets the zero point shifting done?
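One hypothetical shape for that rewrite, returning both the shifted tensor and the adjusted zero point in one place (names and structure are assumptions, not the PR's final code):

from tvm import relay

def _shift(data, zero_point, out_dtype):
    """Shift the tensor by +/-128 and adjust its zero point together."""
    if out_dtype == 'uint8':
        shift = 128
    elif out_dtype == 'int8':
        shift = -128
    else:
        raise RuntimeError("Unsupported out dtype " + out_dtype)
    data = relay.cast(data, 'int32')
    data = relay.add(data, relay.const(shift, 'int32'))
    return relay.cast(data, out_dtype), zero_point + shift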
Force-pushed from f3069b1 to 7ce3287 (compare)
Force-pushed from 7ce3287 to 9fe7acf (compare)
Addressed the comments :)
LGTM. Thank you!
Overall LGTM.
I will merge since everyone has approved.
Great stuff.
* [TOPI][OP] Support Faster-RCNN Proposal OP on CPU (apache#4297)
* Support Proposal operator on CPU.
* PyLint space issue
* PyLint space issue
* Pylint singleton-comparison issue
* [QNN][Legalize] Specialize for Platforms without any fast Int8 arithmetic units. (apache#4307)
* fix error when memory_id is VTA_MEM_ID_OUT (apache#4330)
* [CI][DOCKER] Add ONNX runtime dep (apache#4314)
* [DOCKER] Add ONNX runtime dep
* Improve ci script
* [QNN] Quantize - Fixing the sequence of lowering. (apache#4316)
* [QNN] Use Int16 upcast in Fallback Conv2D. Fix test names. (apache#4329)
* [doc][fix] fix sphinx parsing for pass infra tutorial (apache#4337)
* change ci image version (apache#4313)
* [Codegen] remove fp16 function override for cuda (apache#4331)
* add volatile override back
* [codegen] remove fp16 function override for cuda
* [CI] Set workspace to be per executor (apache#4336)
* [Build][Windows] Fix Windows build by including cctype (apache#4319)
* Fix build
* dummy change to retrigger CI
* dummy change to retrigger ci
* dummy change to retrigger ci
* Enable hipModuleGetGlobal() (apache#4321)
* [Relay][Pass] Add pass to remove unused functions in relay module (apache#4334)
* [Relay][Pass] Add pass to remove unused functions in relay module
* Add tests
* Fix lint
* Fix visit order
* Add pass argument
* Fix
* Add support for quant. mul operator in tflite frontend (apache#4283). A test for qnn_mul has to be added when the qnn elemwise tests (apache#4282) get merged.
* Add topi.nn.fifo_buffer to TVM doc (apache#4343)
* Solve custom model of prelu (apache#4326)
* Deprecate NNVM warning msg (apache#4333)
* [Contrib] Add MKL DNN option (apache#4323)
* [Contrib] Add MKL DNN
* update
* update
* [Relay][Frontend][TF] Fix transpose when axes is not a param (apache#4327)
* [Relay][Frontend][TF] Use _infer_value_simulated when axes is not a const to Transpose
* uncomment tests
* dummy change to retrigger ci
* [RUNTIME] Add device query for AMD GcnArch (apache#4341)
* add gcnArch query
* kGcnArch query for cuda is a no-op
* [Test][Relay][Pass] Add test case for lambda lift (apache#4317)
* [Relay][Frontend][ONNX] operator support: DepthToSpace, SpaceToDepth (apache#4271)
* imp module is deprecated (apache#4275)
* [VTA] Bug fix for padded load with large inputs (apache#4293)
* bug fix for padded load with large inputs
* Update TensorLoad.scala
* Update test_vta_insn.py
* fix inconsistent tag name (apache#4134)
* [CodeGen] Add build config option disable_assert to control whether to generate assert (apache#4340)
* Bump up CUDA log version in tophub.py (apache#4347)
* Add check to ensure input file was successfully opened in NNVM deploy code demo (apache#4315)
* [COMMUNITY] Add DISCLAIMER, KEYS for ASF release (apache#4345)
* [COMMUNITY] Add DISCLAIMER, KEYS for ASF release
* Add file name spec
* [Relay][VM][Interpreter] Enable first-class constructors in VM and interpreter via eta expansion (apache#4218)
* Fix constructor pretty printing
* Make Module::HasDef name consistent with API
* Add VM constructor compilation via eta expansion
* Lint
* Fix CI
* Fix failing test
* Address comment
* Retrigger CI
* Retrigger CI
* Update dmlc_tvm_commit_id.txt
More details at https://discuss.tvm.ai/t/qnn-conv2d-dense-legalize-for-platforms-with-no-fast-int8-units/4698
QNN op lowering is currently optimized for HW platforms that have fast Int8 instructions. This PR adds a different lowering for platforms without any fast Int8 units, which helps Raspberry Pi and older Intel servers.
@jackwish @FrozenGene @yzhliu @tqchen @ajtulloch
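For readers skimming the thread, a minimal sketch of the fallback idea under discussion; the helper name, the attribute handling, and the int16 upcast are assumptions drawn from the comments above, so see the linked discuss thread for the actual design:

from tvm import relay

def helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay_op):
    # Upcast both quantized operands to int16 and fold the zero points in,
    # so platforms without VNNI/DOT instructions take the plain integer path.
    data, kernel = inputs
    data = relay.subtract(relay.cast(data, dtype='int16'),
                          relay.const(attrs['input_zero_point'], 'int16'))
    kernel = relay.subtract(relay.cast(kernel, dtype='int16'),
                            relay.const(attrs['kernel_zero_point'], 'int16'))
    # With the zero points folded in, the non-QNN op can be emitted directly;
    # the remaining conv2d/dense attributes are forwarded unchanged.
    new_attrs = {k: attrs[k] for k in attrs.keys() if 'zero_point' not in k}
    return relay_op(data, kernel, **new_attrs)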