[microNPU][ETHOSU] MatMul legalization support #15780
Conversation
The NPU has a restriction that weights must be constant, so the matrix multiplication operation was expressed using split, elementwise multiplication, reduce sum and concatenation operations.
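To illustrate the idea, here is a minimal NumPy sketch of the decomposition (this is not the actual Relay legalization from the PR, just the equivalent arithmetic: each row of the activation is split off, multiplied elementwise against the weight matrix, reduce-summed, and the partial results are concatenated):

```python
import numpy as np

def matmul_via_split_mul_sum_concat(a, b):
    """a: (M, K), b: (K, N) -> (M, N), without using a matmul primitive."""
    rows = np.split(a, a.shape[0], axis=0)                      # split: M tensors of shape (1, K)
    partial = [
        np.sum(row.reshape(-1, 1) * b, axis=0, keepdims=True)   # elementwise mul + reduce sum
        for row in rows
    ]
    return np.concatenate(partial, axis=0)                      # concatenate back to (M, N)

# Quick check against a regular matmul for the 1x16@16x8 case discussed below.
a = np.random.rand(1, 16).astype("float32")
b = np.random.rand(16, 8).astype("float32")
assert np.allclose(matmul_via_split_mul_sum_concat(a, b), a @ b, atol=1e-5)
```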
@tvm-bot rerun
lhutton1 left a comment
Thanks for the PR @Aleksei-grovety, it's an interesting idea. I'm curious how the performance compares to the CPU: is there any point at which falling back to the CPU becomes the more attractive option as the size of the matmul increases?
For a single matrix multiplication operation the performance decreases: for the 1x16@16x8 case on the high-performance (HP) subsystem we get 39.793 μs on the NPU versus 31.163 μs on the CPU, and the dependence on size appears to be roughly linear, e.g. for the 1x256@256x32 case we get 406.993 μs on the NPU versus 323.863 μs on the CPU. But in this situation, when the model contains only a matmul operation, the problem is that the reshape and concatenate operations are offloaded to the CPU. If we change the model input to a 1d array and use split and reshape operations (so that all operations are offloaded to the NPU), then for the 1x16@16x8 case we get 34.128 μs on the NPU (still worse than the CPU), but for the 1x256@256x32 case we get 111.208 μs on the NPU (almost three times better than the CPU). I think it's worth adding an option to enable offloading of the matmul operation to the NPU.
ekalda left a comment
Thanks @Aleksei-grovety, I think this is a really good improvement to the microNPU support! I suppose in the case where the model contains only a matmul, the reshape and concatenate are not offloaded because there is no consumer to inline them into? I'd expect this to be a special-case problem, since most real networks have other operators after the matmul.
This is due to the two inputs and the preprocess_ext_io pass. Before it, all operations are offloaded to the NPU; after it, the reshape and concatenate operations are added. These operations are not added to the composites because MergeComposite is called before that.
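For context, a rough sketch of the ordering involved (assuming a TVM build with the Ethos-U contrib partitioning; the pass sequence is simplified and `pattern_table` stands in for whatever patterns are registered for the target):

```python
import tvm
from tvm import relay

def partition_sketch(mod, pattern_table):
    # MergeComposite fixes the composite functions up front...
    mod = relay.transform.MergeComposite(pattern_table)(mod)
    mod = relay.transform.AnnotateTarget("ethos-u")(mod)
    mod = relay.transform.MergeCompilerRegions()(mod)
    mod = relay.transform.PartitionGraph()(mod)
    # ...while the external-I/O preprocessing described above runs afterwards,
    # so the reshape/concatenate it inserts around the two inputs are not part
    # of any composite and therefore fall back to the CPU.
    return mod
```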
ekalda left a comment
Thanks @Aleksei-grovety, LGTM!
cc @lhutton1, @ekalda, @leandron