[QNN] Legalization for Intel x86 QNN Conv2D #3896
Conversation
Force-pushed from 63d021a to 12c6e3d
Personally, I am not sure about this.
VNNI is supported only by the latest Intel processors. Is it fine to let x86 handle uint8 * int8 by default via legalization, or should this be a dedicated TOPI routine that is enabled conditionally - say, via templates? Also, AFAIK, VNNI intends a symmetric approach.
scale * ( (QA + 128) - (zp_a + 128))
Replacing QA + 128 with QA' and (zp_a + 128) with zp_a'
We get our new uint8 tensor - scale * (QA' - zp_a')
new quantized tensor?
Thanks.
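A quick numpy sketch of the shift discussed above (the scale, zero point, and tensor values here are made up for illustration; this is not code from the PR):

import numpy as np

scale, zp_a = 0.05, 3
QA = np.array([-100, -5, 0, 7, 120], dtype=np.int8)

real_before = scale * (QA.astype(np.int32) - zp_a)

# Shift both the quantized values and the zero point by +128.
QA_shifted = (QA.astype(np.int32) + 128).astype(np.uint8)   # QA'
zp_shifted = zp_a + 128                                      # zp_a'
real_after = scale * (QA_shifted.astype(np.int32) - zp_shifted)

# The represented floating point values are unchanged.
assert np.allclose(real_before, real_after)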
"""Shifts (add/subtracts) the qnn tensor with +/-128)""" | ||
data_modified = relay.cast(data, 'int32') | ||
data_modified = relay.add(data_modified, relay.const(shift, 'int32')) | ||
data_modified = relay.clip(data_modified, |
I guess clip is not needed.
If we just cast it back, any value that is higher than the max of out_dtype will be wrapped around. So, it is safe to clip it first and then cast.
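A tiny numpy illustration of that wrap-around concern (values are arbitrary):

import numpy as np

x = np.array([250, 260, 300], dtype=np.int32)
wrapped = x.astype(np.uint8)                     # [250, 4, 44]  - values above 255 wrap around
clipped = np.clip(x, 0, 255).astype(np.uint8)    # [250, 255, 255] - clipping saturates instead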
It appears to me that subtracting 128 from a uint8 gives the value range [-128, 127], which lies within int8 - it won't overflow here, I guess.
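For this particular shift the point can be checked directly (numpy sketch, arbitrary values):

import numpy as np

qa = np.array([0, 1, 127, 128, 255], dtype=np.uint8)
shifted = qa.astype(np.int32) - 128      # [-128, -127, -1, 0, 127], always within the int8 range
assert shifted.min() >= -128 and shifted.max() <= 127
as_int8 = shifted.astype(np.int8)        # so clipping before this cast is a no-op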
You are right about that.
Thanks for the comments @jackwish
In this manner, the tensors reaching conv2d end up as the uint8 x int8 dtype pair that the Intel instructions expect. What do you think?
Thank you for the very detailed background explanation @anijain2305.
By the time the data and kernel come to conv, they are just uint8 and int8 numbers. The HW has no notion of asymmetric and symmetric. It will just take two tensors, one uint8 and the other int8, and do a dot product. So, I don't think asymmetry plays a role at the level of conv2d itself. Side note - suppose both the inputs were symmetric; the computation at this level would look exactly the same. Please let me know if I am missing anything.
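A rough numpy sketch of what such a u8 x i8 dot product looks like at the hardware level (sizes and values here are arbitrary; the point is only that zero points never enter the instruction):

import numpy as np

data = np.random.randint(0, 256, size=64).astype(np.uint8)      # activations, uint8
kernel = np.random.randint(-128, 128, size=64).astype(np.int8)  # weights, int8

# The instruction only sees raw integers: widen to int32 and accumulate.
acc = np.dot(data.astype(np.int32), kernel.astype(np.int32))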
@jackwish Please let me know if you have more questions, or if something did not make sense. I am happy to provide more description.
Hi @anijain2305, I have been busy with some other stuff, sorry for the delayed reply. I think the ...
Absolutely yes, that's what the system developer should be careful about :) And, sorry if I have any misunderstanding :)
@jackwish I think I understand what you are referring to, but I am talking about a different abstraction. Allow me to explain. Let's leave asymmetric, symmetric and VNNI out of the discussion for now. Suppose we have a tensor of floating point numbers. We can represent them in one quantized representation - quantized values QA with a scale and a zero point zp_a, where each float is scale * (QA - zp_a).
Now, I can have another quantized representation that maps to the exact same floating point numbers as that one. For example, scale * (QA - zp_a) = scale * ((QA + 128) - (zp_a + 128)).
So, here, we have another quantized representation - QA' = QA + 128 with zero point zp_a' = zp_a + 128 - that represents the same floating point values.
Any operator that was consuming this tensor as input remains unchanged. It will still see the same represented floating point values (only the quantized values and the zero point have changed, while the floating point values are still the same). In the context of this PR, QA is the original int8 tensor and QA' is the shifted uint8 tensor. Essentially, the following thing happens: scale * (QA - zp_a) = scale * (QA' - zp_a').
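A minimal Relay sketch of this shift (my own illustration of the idea; the helper name and shapes are hypothetical, not the exact code in this PR):

from tvm import relay

def shift_to_uint8(data):
    # QA' = QA + 128: widen to int32, add the shift, clip to the uint8 range, cast back down.
    shifted = relay.cast(data, 'int32')
    shifted = relay.add(shifted, relay.const(128, 'int32'))
    shifted = relay.clip(shifted, a_min=0, a_max=255)
    return relay.cast(shifted, 'uint8')

qa = relay.var('qa', shape=(1, 64, 56, 56), dtype='int8')
qa_prime = shift_to_uint8(qa)   # the zero point must also move by +128 (zp_a' = zp_a + 128)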
If we agree on this, the following comment should also help - https://github.com/dmlc/tvm/blob/42195a48e01c4850a7a89d2ae586740821d76555/python/tvm/relay/qnn/transform.py#L71-L107
Yes, you are right @anijain2305, I was focused on factorizing the computation. Sorry for that, and thank you for the kind patience and detailed explanation :)
No worries @jackwish
Force-pushed from 12c6e3d to 2cc8914
@jackwish Can you please review again?
Can we add a target check (to enable this legalization) directly in this PR, or is another one needed?
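For reference, a hedged sketch of what such a check might look like (the helper name is made up, and it assumes the old-style tvm.target.current_target() API that TVM exposed around this time):

import tvm

def _has_fast_int8_support():
    # Hypothetical helper: only enable the legalization when the current target
    # advertises the Intel fast-int8 instructions.
    target = tvm.target.current_target(allow_none=True)
    if target is None:
        return False
    return any(arch in str(target) for arch in ('skylake-avx512', 'cascadelake'))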
Force-pushed from 2cc8914 to 54b9f1b
Force-pushed from 4644735 to 26d1749
LGTM :)
I added some code-style comments, which are personal preference and won't block the merge :)
P.S. I was travelling, sorry for the delayed response.
Force-pushed from 26d1749 to bcff1dc
LGTM
* QNNLegalize for conv2d
* [QNN] Legalization for Intel x86 QNN Conv2D
Intel x86 has fast int8 instructions for u8 x i8 conv2d. The frameworks might produce different dtypes. This PR writes QNN legalizations for QNN Conv2D on Intel x86 so that it goes to u8 x i8.
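As a plain numpy illustration of the dtype conversion this legalization performs (the concrete kernel values, zero point, and scale below are invented for the example): when a framework hands over a uint8 kernel, shifting it and its zero point by -128 yields the int8 kernel the fast instructions want while representing the same floats.

import numpy as np

kernel_u8 = np.random.randint(0, 256, size=(8, 8)).astype(np.uint8)  # framework-provided uint8 kernel
zp_kernel = 120
scale = 0.02

kernel_i8 = (kernel_u8.astype(np.int32) - 128).astype(np.int8)       # legalized int8 kernel
zp_kernel_i8 = zp_kernel - 128                                        # adjusted zero point

# Both representations dequantize to the same floating point weights.
assert np.allclose(scale * (kernel_u8.astype(np.int32) - zp_kernel),
                   scale * (kernel_i8.astype(np.int32) - zp_kernel_i8))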