
[QNN] Legalization for Intel x86 QNN Conv2D #3896

Merged
2 commits merged into apache:master on Sep 16, 2019

Conversation

anijain2305
Contributor

Intel x86 has fast int8 instructions for u8 x i8 conv2d. The frameworks might use different dtypes. This PR writes QNN legalizations for QNN Conv2D on Intel x86 to convert to u8 x i8.
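For reference, a minimal sketch (not the PR's actual code) of the kernel shift that this legalization performs, mirroring the legalized IR shown later in this thread; the helper name is made up for illustration:

from tvm import relay

def shift_uint8_kernel_to_int8(kernel):
    """Hypothetical helper: return an int8 expression equivalent to kernel - 128."""
    shifted = relay.cast(kernel, 'int32')                     # widen so the add cannot wrap
    shifted = relay.add(shifted, relay.const(-128, 'int32'))  # map [0, 255] onto [-128, 127]
    shifted = relay.clip(shifted, a_min=-128, a_max=127)      # stay within the int8 range
    return relay.cast(shifted, 'int8')

# The legalization then rebuilds qnn.conv2d with this shifted kernel and with
# kernel_zero_point reduced by 128, so the represented FP values are unchanged.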

@anijain2305
Contributor Author

anijain2305 commented Sep 5, 2019

@shoubhik @zhiics @jackwish @vinx13 Please review.

@anijain2305 force-pushed the qnn_lower branch 2 times, most recently from 63d021a to 12c6e3d on September 5, 2019 01:00
Contributor

@zhenhuaw-me left a comment

Personally, I am not sure about this.

VNNI is supported only by the latest Intel processors, so is it fine to let x86 handle uint8 * int8 by default via legalize? Or should it go into TOPI as a dedicated routine that is enabled conditionally - say, via templates? Also, AFAIK, VNNI intends a symmetric approach.

scale * ( (QA + 128) - (zp_a + 128))

Replacing QA + 128 with QA' and (zp_a + 128) with zp_a'
We get our new uint8 tensor - scale * (QA' - zp_a')
Contributor

new quantized tensor?

Contributor Author

Thanks.

python/tvm/relay/qnn/op/legalizations.py
"""Shifts (add/subtracts) the qnn tensor with +/-128)"""
data_modified = relay.cast(data, 'int32')
data_modified = relay.add(data_modified, relay.const(shift, 'int32'))
data_modified = relay.clip(data_modified,
Contributor

I guess clip is not needed.

Contributor Author

If we just cast it back, any value that is higher than the max of out_dtype will be wrapped around. So, it is safe to clip it first and then cast.

Contributor

It appears to me that subtracting 128 from a uint8 gives the value range [-128, 127], which lies within int8 - so it won't overflow here, I guess.

Contributor Author

You are right about that.
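For reference, a self-contained sketch of the shift helper discussed in this thread (illustrative only; the PR's final implementation may differ). Per the discussion above, the clip is arguably redundant for a +/-128 shift between uint8 and int8, but it guards against wrap-around on the final cast:

from tvm import relay

def shift_qnn_tensor(data, shift, out_dtype):
    """Shift (add/subtract) a quantized tensor by `shift` and cast it to out_dtype."""
    # Assumes out_dtype is either 'int8' or 'uint8'.
    a_min, a_max = (-128, 127) if out_dtype == 'int8' else (0, 255)
    shifted = relay.cast(data, 'int32')
    shifted = relay.add(shifted, relay.const(shift, 'int32'))
    shifted = relay.clip(shifted, a_min=a_min, a_max=a_max)
    return relay.cast(shifted, out_dtype)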

@anijain2305
Contributor Author

anijain2305 commented Sep 5, 2019

Personally, I am not sure about this.

VNNI is supported only by the latest Intel processors, so is it fine to let x86 handle uint8 * int8 by default via legalize? Or should it go into TOPI as a dedicated routine that is enabled conditionally - say, via templates? Also, AFAIK, VNNI intends a symmetric approach.

Thanks for the comments @jackwish.
Let me first give a little background on why we are doing this:

  • Intel Skylake and newer processors have fast int8 instructions, but they only support u8 x i8. One could legalize nn.conv2d when it sees u8 x u8 and convert it to u8 x i8. However, doing that at nn.conv2d involves many more instructions (it requires extra instructions before and after the conv). Therefore, I decided to do it at qnn.conv2d. There, we can reason about the quantized tensors in general, since we have the scales and zero points. We just have to requantize the weight to go from uint8 to int8 and everything is set.

  • VNNI supports a symmetric design. Yes, that is true, and this PR does not change that. The whole flow would be: we have a u8 x u8 qnn.conv2d; QnnLegalize converts it to a u8 x i8 qnn.conv2d; QnnCanonicalize then lowers the qnn.conv2d into Relay ops. An example:

# Original
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  qnn.conv2d(%data, %kernel, kernel_size=[3, 3], out_dtype="int32", input_zero_point=1, kernel_zero_point=1) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

# After QnnLegalize
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  %0 = cast(%kernel, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %1 = add(%0, -128 /* ty=int32 */) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %2 = clip(%1, a_min=-128f, a_max=127f) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %3 = cast(%2, dtype="int8") /* ty=Tensor[(128, 64, 3, 3), int8] */;
  qnn.conv2d(%data, %3, kernel_size=[3, 3], out_dtype="int32", input_zero_point=1, kernel_zero_point=-127) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

# After QnnCanonicalize
v0.0.3
def @main(%data: Tensor[(1, 64, 256, 256), uint8], %kernel: Tensor[(128, 64, 3, 3), uint8]) -> Tensor[(1, 128, 254, 254), int32] {
  %0 = cast(%kernel, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %1 = add(%0, -128 /* ty=int32 */) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %2 = clip(%1, a_min=-128f, a_max=127f) /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %3 = cast(%2, dtype="int8") /* ty=Tensor[(128, 64, 3, 3), int8] */;
  %4 = nn.conv2d(%data, %3, kernel_size=[3, 3], out_dtype="int32") /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %5 = cast(%data, dtype="int32") /* ty=Tensor[(1, 64, 256, 256), int32] */;
  %6 = multiply(%5, 9 /* ty=int32 */) /* ty=Tensor[(1, 64, 256, 256), int32] */;
  %7 = nn.avg_pool2d(%6, pool_size=[3, 3]) /* ty=Tensor[(1, 64, 254, 254), int32] */;
  %8 = sum(%7, axis=[1], keepdims=True) /* ty=Tensor[(1, 1, 254, 254), int32] */;
  %9 = multiply(-127 /* ty=int32 */, %8) /* ty=Tensor[(1, 1, 254, 254), int32] */;
  %10 = tile(%9, meta[relay.attrs.TileAttrs][0]) /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %11 = subtract(%4, %10) /* ty=Tensor[(1, 128, 254, 254), int32] */;
  %12 = cast(%3, dtype="int32") /* ty=Tensor[(128, 64, 3, 3), int32] */;
  %13 = sum(%12, axis=[1, 2, 3]) /* ty=Tensor[(128), int32] */;
  %14 = reshape(%13, newshape=[1, 128, 1, 1]) /* ty=Tensor[(1, 128, 1, 1), int32] */;
  %15 = subtract(-73152 /* ty=int32 */, %14) /* ty=Tensor[(1, 128, 1, 1), int32] */;
  add(%11, %15) /* ty=Tensor[(1, 128, 254, 254), int32] */
}

In this manner, the nn.conv2d gets the u8 x i8 inputs. Now, we can use Legalize on nn.conv2d to use the TOPI templates as you were suggesting.

  • Finally, I can add a guard so that this legalization is triggered only if the target is Skylake or newer. That should resolve the concern about what happens on older processors. A rough sketch of such a guard follows below.
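A rough sketch of what such a guard could look like (the attribute and option names here are assumptions for illustration, not the PR's code); the idea is simply to check whether the current target advertises an mcpu with fast int8 support:

import tvm

def is_fast_int8_on_intel():
    """Return True if the current target looks like a Skylake+ CPU with fast int8."""
    target = tvm.target.current_target(allow_none=True)
    if target is None:
        return False
    fast_int8_mcpus = {'-mcpu=skylake-avx512', '-mcpu=cascadelake'}
    # 'options' holding the raw target flags is an assumption here.
    return bool(fast_int8_mcpus & set(getattr(target, 'options', [])))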

What do you think?

@zhenhuaw-me
Contributor

Thank you for the very detailed background explanation @anijain2305.
One thing that I do not fully understand is: with this patch, the uint8 data and int8 kernel fed to nn.conv2d are asymmetric, so how should they be computed by the VNNI extension with regard to the accumulated multiplication part?

@anijain2305
Contributor Author

Thank you for the very detailed background explanation @anijain2305.
One thing that I do not fully understand is: with this patch, the uint8 data and int8 kernel fed to nn.conv2d are asymmetric, so how should they be computed by the VNNI extension with regard to the accumulated multiplication part?

By the time the data and kernel come to the conv, they are just uint8 and int8 numbers. The HW has no notion of asymmetric and symmetric. It will just take two tensors, one uint8 and the other int8, and do a dot product. So, I don't think asymmetry plays a role at the level of nn.conv2d.

Side note - suppose both inputs were symmetric. The qnn.conv2d would Canonicalize to just nn.conv2d. In that case as well, nn.conv2d would only have two tensors, one uint8 and the other int8. Whether they were symmetric or not, nn.conv2d (and also VNNI) is completely unaware of that.

Please let me know if I am missing anything.

@anijain2305
Contributor Author

@jackwish Please let me know if you have more questions. Or if something did not make sense. I am happy to provide more description.

@zhenhuaw-me
Contributor

Hi @anijain2305, I have been busy with some other stuff, sorry for the delayed reply.

I think the int8 weight should be requantized to a symmetric representation rather than just adding to the zero point, because the value density of the uint8 and int8 representations is different. That way the uint8 x int8 VNNI arithmetic makes sense. Also, the bias may need a shift (a value shift, not a bit shift) to cooperate with this VNNI design. I can propose some equations if this doesn't seem very straightforward.

The HW has no notion of asymmetric and symmetric.

Absolutely yes, that's what the system developer should be careful about :)

And, sorry if I have any misunderstanding :)

@anijain2305
Contributor Author

anijain2305 commented Sep 10, 2019

@jackwish I think I understand what you are referring to, but I am talking about a different abstraction. Allow me to explain. Let's leave asymmetric, symmetric, and VNNI out of the discussion for now.

Suppose we have a tensor of floating point numbers. We can represent it in one quantized representation Q_a with scale_a and zp_a. So,

FP_value = scale_a * (Q_a - zp_a)

Now, I can have another quantized representation that can map to the exact same floating point numbers as that Q_a tensor.

For example

FP_value = scale_a * ( Q_a - zp_a + 128 - 128)
FP_value = scale_a * ( (Q_a - 128) - (zp_a - 128))

So, here, we can have another quantized representation Q_b with (scale_a, zp_b), where

Q_b = Q_a - 128
zp_b = zp_a - 128

Any operator that was consuming this tensor as input remains unchanged. It still sees the same represented floating point values (only the quantized values and the zero point have changed, while the floating point values stay the same).

In the context of this PR, Q_a is u8 and I want to bring all the values into the i8 range. This can be done using the above equations.

Essentially, the following happens

# Original

        u8       u8
        |        |
        qnn.conv2d

# After QnnLegalize


       u8       u8
        |        |
        |        requantize to i8 using above equations
        |        |
        qnn.conv2d

If we agree on this, the following comment should also help - https://github.com/dmlc/tvm/blob/42195a48e01c4850a7a89d2ae586740821d76555/python/tvm/relay/qnn/transform.py#L71-L107
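As a quick illustrative check (not part of the PR), numpy confirms that shifting both the quantized values and the zero point by 128 leaves the represented floating point values untouched:

import numpy as np

scale_a, zp_a = 0.05, 130
q_a = np.array([0, 64, 130, 255], dtype=np.int32)  # uint8 values, widened for safe arithmetic

fp_original = scale_a * (q_a - zp_a)

q_b = q_a - 128                                    # now in [-128, 127], i.e. the int8 range
zp_b = zp_a - 128
fp_shifted = scale_a * (q_b - zp_b)

assert np.allclose(fp_original, fp_shifted)        # same real values, new representation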

@zhenhuaw-me
Contributor

Yes, you are right @anijain2305, I was focusing too much on factorizing uint8 * int8, which led to my misunderstanding of the legalization.

Sorry for that, and thank you for the kind, patient, and detailed explanation :)

@anijain2305
Contributor Author

No worries @jackwish
It is my pleasure. These discussions are very useful :)

@anijain2305
Contributor Author

@jackwish Can you please review again?

Contributor

@zhenhuaw-me left a comment

Can we add the target check (to enable this legalization) directly in this PR, or is another one needed?

@anijain2305
Contributor Author

@jackwish @zhiics Can you please review?

@anijain2305 force-pushed the qnn_lower branch 2 times, most recently from 4644735 to 26d1749 on September 15, 2019 06:11
@anijain2305
Contributor Author

@jackwish @zhiics Can you please review? Lets try to get this in.

Contributor

@zhenhuaw-me left a comment

LGTM :)

I added some code-style comments, which are very much personal preference and won't block the merge :)

p.s. I was travelling, sorry for the delayed response.

@anijain2305
Contributor Author

Thanks @jackwish, I have updated accordingly.
@zhiics Please review.

Member

@zhiics left a comment

LGTM

@zhiics merged commit 26eaea4 into apache:master on Sep 16, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 30, 2019
* QNNLegalize for conv2d

* [QNN] Legalization for Intel x86 QNN Conv2D
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 1, 2019
* QNNLegalize for conv2d

* [QNN] Legalization for Intel x86 QNN Conv2D