[TIR][Hexagon] Enhancement of NarrowDataType pass for binary ops #14298
+99
−0
This is an enhancement of PR #13327.
Motivation:
While experimenting with MetaScheduler for the Hexagon target, we found that avg_pool2d performs rather poorly because the generated code is not vectorized. The IndexDataTypeNormalizer pass converts all indices to "int64", and NarrowDataTypeRewriter should do the opposite (convert them back to "int32"). When that narrowing fails, average pooling is left with a lot of int64 arithmetic that cannot be vectorized.
What was done:
Added support for binary ops ("div", "max", "min", "+", etc.) in NarrowDataTypeRewriter. When the operands of a binary operation have different bit widths, the pass now downcasts instead of upcasting (as it did before).
Performance impact:
avg_pool2d from quantized InceptionV3 with the shape [1, 8, 35, 35, 32] (NCHW32c layout), tuned with MetaScheduler on Snapdragon 8gen1: