Conversation

@masahi (Member) commented Sep 26, 2022

This PR adds TE compute and schedule definitions for int8 conv2d and dense using vrmpy tensorization, and Relay alter-layout / legalize passes to enable them in e2e settings. Since vrmpy is very similar to the x86 VNNI and Arm sdot/udot instructions, much of the code is shared with the existing x86 / Arm backend implementations.
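
For context, vrmpy reduces four byte-wide products into each 32-bit accumulator lane, much like VNNI's vpdpbusd. A scalar numpy sketch of the per-lane semantics follows; the helper name and the u8 x i8 operand mix are illustrative assumptions, not the actual intrinsic signature:

```python
import numpy as np

def vrmpy_lane(acc, x4, w4):
    # One 32-bit lane: acc += dot product of four u8 activations with
    # four i8 weights, widened to int32 before multiplying.
    return acc + int(np.dot(x4.astype(np.int32), w4.astype(np.int32)))

x = np.array([1, 2, 3, 4], dtype=np.uint8)
w = np.array([-1, 2, -3, 4], dtype=np.int8)
print(vrmpy_lane(0, x, w))  # 1*(-1) + 2*2 + 3*(-3) + 4*4 = 10
```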

This lets us run int8 resnet50 in 146 msec on SD888. All convolutions and the final dense op are tensorized. The current bottleneck is requantize-related operations. The test script and model files to run int8 resnet50 are attached below.

test_qresnet50.zip

@kparzysz-quic @tkonolige @nverke @ibsidorenko

@masahi masahi force-pushed the hex-conv2d-dense-vrmpy branch from 5cc80ab to e972136 Compare September 27, 2022 03:14
@masahi masahi force-pushed the hex-conv2d-dense-vrmpy branch from e972136 to 17bde45 Compare September 27, 2022 03:38
@masahi masahi marked this pull request as ready for review September 27, 2022 05:35
Unlike the nn.dense case (see dense_alter_op.py), we do not convert (uint8, int8) to
(uint8, uint8). That would introduce another convolution by a constant (128 or 1) filter
to compensate for the dtype legalization. In the nn.dense case, such a compensation factor is
just a sum over the K axis.
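
A minimal numpy sketch of the dense-case compensation described above, using hypothetical small shapes (this is an illustration of the arithmetic, not the actual legalization pass):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 8, 16
X = rng.integers(0, 256, size=(M, K), dtype=np.uint8).astype(np.int32)
W = rng.integers(-128, 128, size=(N, K), dtype=np.int8).astype(np.int32)

# Legalize i8 weights to u8 by adding 128 ...
W_u8 = W + 128
# ... and compensate with 128 * (row sums of X over the K axis).
Y = X @ W.T
Y_legalized = X @ W_u8.T - 128 * X.sum(axis=1, keepdims=True)
assert np.array_equal(Y, Y_legalized)
```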
@masahi (Member, Author):

cc @ibsidorenko @tkonolige @nverke on this. We can convert a u8 * s8 convolution to u8 * u8 as below:

W'_u8 = W_s8 + 128
X_u8 * W_s8 = X_u8 * (W'_u8 - 128)
            = X_u8 * W'_u8 - X_u8 * 128

Here, X_u8 * 128 is a convolution of X_u8 by a constant filter. We can factor out 128 to end up with a filter where all elements are 1. So what we need is a windowed sum, or "sum pooling" op; without it, I think we need to do a full-blown convolution. This is why I don't use legalization for conv2d. Let me know if you have a better idea.
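
The identity above can be checked numerically. A 1-D numpy sketch, where the compensation term is exactly a windowed sum of X scaled by 128 (the `correlate` helper is a hypothetical stand-in for the conv op):

```python
import numpy as np

def correlate(x, w):
    # "Valid" correlation (no filter flip), as in NN convolutions.
    return np.array([np.dot(x[i:i + len(w)], w)
                     for i in range(len(x) - len(w) + 1)])

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=32, dtype=np.uint8).astype(np.int32)
W = rng.integers(-128, 128, size=5, dtype=np.int8).astype(np.int32)

W_u8 = W + 128
ones = np.ones_like(W)
# Compensation is 128 * (windowed sum of X), i.e. a "sum pooling" op.
lhs = correlate(X, W)
rhs = correlate(X, W_u8) - 128 * correlate(X, ones)
assert np.array_equal(lhs, rhs)
```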

outer, inner = s[x].split(fused, factor=128 // np.dtype(x.dtype).itemsize)
s[x].vectorize(inner)
s[x].parallel(outer)
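
The split factor `128 // np.dtype(x.dtype).itemsize` is the number of lanes that fit in a 128-byte HVX vector register, so `inner` vectorizes to exactly one full vector regardless of dtype. A quick check of the lane counts:

```python
import numpy as np

VLEN_BYTES = 128  # HVX vector register width on Hexagon, in bytes
for dtype in ("uint8", "int8", "int32", "float32"):
    lanes = VLEN_BYTES // np.dtype(dtype).itemsize
    print(f"{dtype}: {lanes} lanes per vector")
```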
@masahi (Member, Author):

cc @kparzysz-quic @nverke, we are enabling multithreading of elemwise ops here. Multithreading on e2e models has been stable since #12807.

@masahi masahi force-pushed the hex-conv2d-dense-vrmpy branch from 14e83d5 to 4ad3e63 Compare September 27, 2022 06:05
@ibsidorenko (Contributor):

LGTM!

@tmoreau89 tmoreau89 merged commit f3d3ece into apache:main Oct 3, 2022
@tmoreau89 (Contributor):

Thanks @masahi, @ibsidorenko, @kparzysz-quic. The PR has been merged.

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…che#12911)

* [Hexagon] Support vrmpy tensorization for conv2d and dense schedules

* update

* clean up

* migrate tests to test_launcher.py

* remove vrmpy test files

* use generic int8 conv2d schedule

* clean up

* doc update

* pylint fix

* parametrize dtype in test

* doc update

* add missing parallelization for dense

* more pylint

* fixed for fp32 dense