-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[microNPU] Tweak a layout transform matrix #10763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
23685d9 to
33525e6
Compare
lhutton1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @ekalda! Just some small suggestions
| [0, 1, 0, 0, 0, 0], | ||
| [0, 0, 0, 1, 0, 0], | ||
| [0, 0, 16, 0, 1, -16], | ||
| [0, 0, 0, 0, 0, ofm_channels], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: Any reason for not making this change in binary_elementwise and unary_elementwise? Additionally it looks like the nhcwb16_to_nhwc and nhwc_to_nhcwb16 matrices are the same across the operators, should we move it to a common place like util to reduce duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I didn't make that change since the old transformation matrix wasn't "harmful" in the case of these operators (it was only a problem for operators that had weights or some kind of a kernel), but I think it makes sense to unify the transform for all the NPU ops since then we can indeed easily define the layout transform matrices in one place and easily reuse them across all the TEs and also in the tests.
| [0, 1, 0, 0, 0, 0], | ||
| [0, 0, 0, 1, 0, 0], | ||
| [0, 0, 16, 0, 1, -16], | ||
| [0, 0, 0, 0, 0, int(ofm_channels)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is int(...) necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since the type of ofm_channels is IntImm, so without the cast it would end up causing an error in propagator.py
33525e6 to
b0d1ae6
Compare
|
Thanks for the reviews @lhutton1 and @NicolaLancellotti! I've addresed the comments... |
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ekalda ! (and other reviewers) .
This looks much cleaner and I've added some suggestions to improve the docs around the matrices
(so it is clear why those numbers looks like that)
| from typing import Tuple, List | ||
|
|
||
|
|
||
| def get_layout_transform_matrices(ofm_channels: int) -> Tuple[List[List[float]], List[List[float]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs : Lets link to this page : https://developer.arm.com/documentation/102420/0200/Functional-description/Control-and-data-flow/Supported-memory-formats-for-feature-maps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| [0, 1, 0, 0, 0], | ||
| [0, 0, 0, 1 / 16, 0], | ||
| [0, 0, 1, 0, 0], | ||
| [0, 0, 0, 0, 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs : might worth putting a comment to indicate that b16 axis is always going to be of fixed size of 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| [1, 0, 0, 0, 0, 0], | ||
| [0, 1, 0, 0, 0, 0], | ||
| [0, 0, 0, 1, 0, 0], | ||
| [0, 0, 0, 0, 0, ofm_channels], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs : lets state that the conversion of nhwc_to_nhcwb16 is lossy (because b16 axis is fixed to 16). Therefore, to recover the original "c" of "nhwc", we need to use the original channels in the transform matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
b0d1ae6 to
23ecc50
Compare
ekalda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions @manupa-arm! I added documentation for the transform ops, does it make sense?
| from typing import Tuple, List | ||
|
|
||
|
|
||
| def get_layout_transform_matrices(ofm_channels: int) -> Tuple[List[List[float]], List[List[float]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| [0, 1, 0, 0, 0], | ||
| [0, 0, 0, 1 / 16, 0], | ||
| [0, 0, 1, 0, 0], | ||
| [0, 0, 0, 0, 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| [1, 0, 0, 0, 0, 0], | ||
| [0, 1, 0, 0, 0, 0], | ||
| [0, 0, 0, 1, 0, 0], | ||
| [0, 0, 0, 0, 0, ofm_channels], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
23ecc50 to
d069c27
Compare
One of the layout transforms currently causes the cascader to stripe across B16 axis (which is not allowed), so change that and deal with the implications to the get_valid_block_configs. Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2
* Change the nhcwb16_to_nhwc matrix for binary and unary elementwise such that it matches the other NPU ops * Reduce the number of places where the same layout transform matrices are defined
d069c27 to
df32c58
Compare
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Thanks @ekalda @NicolaLancellotti @lhutton1 |
* [microNPU] Fix layout transform matrix One of the layout transforms currently causes the cascader to stripe across B16 axis (which is not allowed), so change that and deal with the implications to the get_valid_block_configs. Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2 * Reduce the duplication of layout transfrom matrices * Change the nhcwb16_to_nhwc matrix for binary and unary elementwise such that it matches the other NPU ops * Reduce the number of places where the same layout transform matrices are defined * Add documentation to the layout transform matrices
One of the layout transform matrices currently causes the cascader to stripe
across B16 axis (which is not allowed), so change that and deal with
the implications to the get_valid_block_configs.