
[QNN] Legalization for Intel x86 QNN Conv2D #3896

Merged
2 commits merged into apache:master on Sep 16, 2019

Conversation

anijain2305
Contributor

Intel x86 has fast int8 instructions for u8 x i8 conv2d. The frameworks might use different dtypes. This PR writes QNN legalizations for QNN Conv2D on Intel x86 to convert to u8 x i8.
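For reference, a minimal sketch (not the PR's actual code) of the kernel shift that this legalization performs, mirroring the legalized IR shown later in this thread; the helper name is made up for illustration:

from tvm import relay

def shift_uint8_kernel_to_int8(kernel):
    """Hypothetical helper: return an int8 expression equivalent to kernel - 128."""
    shifted = relay.cast(kernel, 'int32')                     # widen so the add cannot wrap
    shifted = relay.add(shifted, relay.const(-128, 'int32'))  # map [0, 255] onto [-128, 127]
    shifted = relay.clip(shifted, a_min=-128, a_max=127)      # stay within the int8 range
    return relay.cast(shifted, 'int8')

# The legalization then rebuilds qnn.conv2d with this shifted kernel and with
# kernel_zero_point reduced by 128, so the represented FP values are unchanged.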

@anijain2305
Contributor Author

anijain2305 commented Sep 5, 2019

@shoubhik @zhiics @jackwish @vinx13 Please review.

@anijain2305 force-pushed the qnn_lower branch 2 times, most recently from 63d021a to 12c6e3d on September 5, 2019 01:00
Contributor

@zhenhuaw-me left a comment

Personally, I am not sure about this.

VNNI is supported only by the latest Intel processors, so is it fine to let x86 handle uint8 * int8 by default via legalize? Or should it go into TOPI as a dedicated routine that is enabled conditionally - say, via templates? Also, AFAIK, VNNI intends a symmetric approach.

scale * ( (QA + 128) - (zp_a + 128))

Replacing QA + 128 with QA' and (zp_a + 128) with zp_a'
We get our new uint8 tensor - scale * (QA' - zp_a')
Contributor

new quantized tensor?

Contributor Author

Thanks.

python/tvm/relay/qnn/op/legalizations.py
"""Shifts (add/subtracts) the qnn tensor with +/-128)"""
data_modified = relay.cast(data, 'int32')
data_modified = relay.add(data_modified, relay.const(shift, 'int32'))
data_modified = relay.clip(data_modified,
Contributor

I guess clip is not needed.

Contributor Author

If we just cast it back, any value that is higher than the max of out_dtype will be wrapped around. So, it is safe to clip it first and then cast.

Contributor

It appears to me that subtracting 128 from a uint8 gives the value range [-128, 127], which lies within int8 - so it won't overflow here, I guess.

Contributor Author

You are right about that.
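For reference, a self-contained sketch of the shift helper discussed in this thread (illustrative only; the PR's final implementation may differ). Per the discussion above, the clip is arguably redundant for a +/-128 shift between uint8 and int8, but it guards against wrap-around on the final cast:

from tvm import relay

def shift_qnn_tensor(data, shift, out_dtype):
    """Shift (add/subtract) a quantized tensor by `shift` and cast it to out_dtype."""
    # Assumes out_dtype is either 'int8' or 'uint8'.
    a_min, a_max = (-128, 127) if out_dtype == 'int8' else (0, 255)
    shifted = relay.cast(data, 'int32')
    shifted = relay.add(shifted, relay.const(shift, 'int32'))
    shifted = relay.clip(shifted, a_min=a_min, a_max=a_max)
    return relay.cast(shifted, out_dtype)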

@anijain2305
Contributor Author

anijain2305 commented Sep 5, 2019

Personally, I am not sure about this.

VNNI is supported only by the latest Intel processors, so is it fine to let x86 handle uint8 * int8 by default via legalize? Or should it go into TOPI as a dedicated routine that is enabled conditionally - say, via templates? Also, AFAIK, VNNI intends a symmetric approach.

Thanks for the comments @jackwish.
Let me first give a little background on why we are doing this:

  • Intel Skylake and newer processors have fast int8 instructions, but they only support u8 x i8. One could legalize nn.conv2d when it sees u8 x u8 and convert it to u8 x i8. However, doing that at nn.conv2d involves many more instructions (it requires extra instructions before and after the conv). Therefore, I decided to do it at qnn.conv2d. There, we can reason about the quantized tensors in general, since we have the scales and zero points. We just have to requantize the weight to go from uint8 to int8 and everything is set.

  • VNNI supports a symmetric design. Yes, that is true, and this PR does not change that. The whole flow would be: we have a u8 x u8 qnn.conv2d; QnnLegalize converts it to a u8 x i8 qnn.conv2d; QnnCanonicalize then lowers the qnn.conv2d into Relay ops. An example:

# Original
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  qnn.conv2d(%data, %kernel, kernel_size=[3, 3], out_dtype="int32", input_zero_point=1, kernel_zero_point=1) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

# After QnnLegalize
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  %0 = cast(%kernel, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %1 = add(%0, -128 /* ty=int32 */) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %2 = clip(%1, a_min=-128f, a_max=127f) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %3 = cast(%2, dtype="int8") /* ty=Tensor[(128, 64, 3, 3), int8] */;
  qnn.conv2d(%data, %3, kernel_size=[3, 3], out_dtype="int32", input_zero_point=1, kernel_zero_point=-127) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

# After QnnCanonicalize
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  %0 = cast(%kernel, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %1 = add(%0, -128 /* ty=int32 */) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %2 = clip(%1, a_min=-128f, a_max=127f) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %3 = cast(%2, dtype="int8") /* ty=Tensor[(128, 64, 3, 3), int8] */;
  %4 = nn.conv2d(%data, %3, kernel_size=[3, 3], out_dtype="int32") /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %5 = cast(%data, dtype="int32") /* ty=Tensor[(1, 64, 256, 256), int32] */;
  %6 = multiply(%5, 9 /* ty=int32 */) /* ty=Tensor[(1, 64, 256, 256), int32] */;
  %7 = nn.avg_pool2d(%6, pool_size=[3, 3]) /* ty=Tensor[(1, 64, 254, 254), int32] */;
  %8 = sum(%7, axis=[1], keepdims=True) /* ty=Tensor[(1, 1, 254, 254), int32] */;
  %9 = multiply(-127 /* ty=int32 */, %8) /* ty=Tensor[(1, 1, 254, 254), int32] */;
  %10 = tile(%9, meta[relay.attrs.TileAttrs][0]) /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %11 = subtract(%4, %10) /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %12 = cast(%3, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %13 = sum(%12, axis=[1, 2, 3]) /* ty=Tensor[(128), int32] */;
  %14 = reshape(%13, newshape=[1, 128, 1, 1]) /* ty=Tensor[(1, 128, 1, 1), int32] */;
  %15 = subtract(-73152 /* ty=int32 */, %14) /* ty=Tensor[(1, 128, 1, 1), int32] */;
  add(%11, %15) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

In this manner, the nn.conv2d gets the u8 x i8 inputs. Now, we can use Legalize on nn.conv2d to use the TOPI templates as you were suggesting.

  • Finally, I can add a guard so that this legalization is triggered only if the target is Skylake or newer. That should resolve the concern about what happens on older processors. A rough sketch of such a guard follows below.
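A rough sketch of what such a guard could look like (the attribute and option names here are assumptions for illustration, not the PR's code); the idea is simply to check whether the current target advertises an mcpu with fast int8 support:

import tvm

def is_fast_int8_on_intel():
    """Return True if the current target looks like a Skylake+ CPU with fast int8."""
    target = tvm.target.current_target(allow_none=True)
    if target is None:
        return False
    fast_int8_mcpus = {'-mcpu=skylake-avx512', '-mcpu=cascadelake'}
    # 'options' holding the raw target flags is an assumption here.
    return bool(fast_int8_mcpus & set(getattr(target, 'options', [])))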

What do you think?

@zhenhuaw-me
Contributor

Thank you for the very detailed background explanation @anijain2305.
One thing that I do not fully understand is: with this patch, the uint8 data and int8 kernel fed to nn.conv2d are asymmetric, so how should they be computed by the VNNI extension with regard to the accumulated multiplication part?

@anijain2305
Contributor Author

Thank you for the very detailed background explanation @anijain2305.
One thing that I do not fully understand is: with this patch, the uint8 data and int8 kernel fed to nn.conv2d are asymmetric, so how should they be computed by the VNNI extension with regard to the accumulated multiplication part?

By the time the data and kernel come to the conv, they are just uint8 and int8 numbers. The HW has no notion of asymmetric and symmetric. It will just take two tensors, one uint8 and the other int8, and do a dot product. So, I don't think asymmetry plays a role at the level of nn.conv2d.

Side note - suppose both inputs were symmetric. The qnn.conv2d would Canonicalize to just nn.conv2d. In that case as well, nn.conv2d would only have two tensors, one uint8 and the other int8. Whether they were symmetric or not, nn.conv2d (and also VNNI) is completely unaware of that.

Please let me know if I am missing anything.

@anijain2305
Contributor Author

@jackwish Please let me know if you have more questions. Or if something did not make sense. I am happy to provide more description.

@zhenhuaw-me
Contributor

Hi @anijain2305, I have been busy with some other stuff, sorry for the delayed reply.

I think the int8 weight should be requantized to a symmetric representation rather than just adding to the zero point, because the value density of the uint8 and int8 representations is different. That way the uint8 x int8 VNNI arithmetic makes sense. Also, the bias may need a shift (a value shift, not a bit shift) to cooperate with this VNNI design. I can propose some equations if this doesn't seem very straightforward.

The HW has no notion of asymmetric and symmetric.

Absolutely yes, that's what the system developer should be careful about :)

And, sorry if I have any misunderstanding :)

@anijain2305
Contributor Author

anijain2305 commented Sep 10, 2019

@jackwish I think I understand what you are referring to, but I am talking about a different abstraction. Allow me to explain. Let's leave asymmetric, symmetric, and VNNI out of the discussion for now.

Suppose we have a tensor of floating point numbers. We can represent it in one quantized representation Q_a with scale_a and zp_a. So,

FP_value = scale_a * (Q_a - zp_a)

Now, I can have another quantized representation that can map to the exact same floating point numbers as that Q_a tensor.

For example

FP_value = scale_a * ( Q_a - zp_a + 128 - 128)
FP_value = scale_a * ( (Q_a - 128) - (zp_a - 128))

So, here, we can have another quantized representation Q_b with (scale_a, zp_b), where

Q_b = Q_a - 128
zp_b = zp_a - 128

Any operator that was consuming this tensor as input remains unchanged. It still sees the same represented floating point values (only the quantized values and the zero point have changed, while the floating point values stay the same).

In the context of this PR, Q_a is u8 and I want to bring all the values into the i8 range. This can be done using the above equations.

Essentially, the following happens

# Original

        u8       u8
        |        |
        qnn.conv2d

# After QnnLegalize


       u8       u8
        |        |
        |        requantize to i8 using above equations
        |        |
        qnn.conv2d

If we agree on this, the following comment should also help - https://github.com/dmlc/tvm/blob/42195a48e01c4850a7a89d2ae586740821d76555/python/tvm/relay/qnn/transform.py#L71-L107
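As a quick illustrative check (not part of the PR), numpy confirms that shifting both the quantized values and the zero point by 128 leaves the represented floating point values untouched:

import numpy as np

scale_a, zp_a = 0.05, 130
q_a = np.array([0, 64, 130, 255], dtype=np.int32)  # uint8 values, widened for safe arithmetic

fp_original = scale_a * (q_a - zp_a)

q_b = q_a - 128                                    # now in [-128, 127], i.e. the int8 range
zp_b = zp_a - 128
fp_shifted = scale_a * (q_b - zp_b)

assert np.allclose(fp_original, fp_shifted)        # same real values, new representation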

@zhenhuaw-me
Contributor

Yes, you are right @anijain2305, I was focusing too much on factorizing uint8 * int8, which led to my misunderstanding of the legalization.

Sorry for that, and thank you for the kind, patient, and detailed explanation :)

@anijain2305
Contributor Author

No worries @jackwish
It is my pleasure. These discussions are very useful :)

@anijain2305
Contributor Author

@jackwish Can you please review again?

Contributor

@zhenhuaw-me left a comment

Can we add the target check (to enable this legalization) directly in this PR, or is another one needed?

@anijain2305
Contributor Author

@jackwish @zhiics Can you please review?

@anijain2305 force-pushed the qnn_lower branch 2 times, most recently from 4644735 to 26d1749 on September 15, 2019 06:11
@anijain2305
Contributor Author

@jackwish @zhiics Can you please review? Lets try to get this in.

Contributor

@zhenhuaw-me left a comment

LGTM :)

I added some code-style comments, which are very much personal preference and won't block the merge :)

p.s. I was travelling, sorry for the delayed response.

@anijain2305
Contributor Author

Thanks @jackwish, I have updated accordingly.
@zhiics Please review.

Member

@zhiics left a comment

LGTM

@zhiics merged commit 26eaea4 into apache:master on Sep 16, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 30, 2019
* QNNLegalize for conv2d

* [QNN] Legalization for Intel x86 QNN Conv2D
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 1, 2019
* QNNLegalize for conv2d

* [QNN] Legalization for Intel x86 QNN Conv2D