[RFC] Use CMSIS-NN with TVM #15
Merged
Commits (9):

    6f39abb  Markdown for CMSIS-NN integration
    ca14511  Title changed to use of CMSIS-NN with TVM
    1948c0a  Added acronyms and fixed few spellings
    a10108e  Changed name of the markdown to match PR number
    03ce0e5  Cody's comments about python APIs and config.cmake
    a53d451  Andrew's comments: more details about CMSIS-NN ops and fixed mistakes…
    6dcdcb1  Andrew's comments II: restructuring testing, guide level explanations
    6a3517b  Upstreaming plan misses line separator
    203cf32  Upstreaming plan misses line separator
- Feature Name: [RFC] Use CMSIS-NN with TVM
- Start Date: July 2021
- RFC PR: https://github.com/apache/tvm-rfcs/pull/15
- GitHub Issue: https://github.com/apache/tvm/issues/8646

# Acronyms

CMSIS: Common Microcontroller Software Interface Standard
ACL: The Compute Library for the Arm® Architecture
MLF: Model Library Format

# Summary

This RFC introduces a plan for integrating the CMSIS-NN library into TVM. CMSIS-NN consists of efficient kernels targeted at Arm's Cortex-M architecture.

Please refer to the following pages for more details on CMSIS-NN:
https://arm-software.github.io/CMSIS_5/NN/html/index.html
https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/NN

The first PR in the series of PRs to fulfill this integration will be the graph partitioner for int8 softmax. The detailed plan can be found below in this RFC.

# Motivation

The CMSIS-NN library consists of hand-tuned kernels that are suitable for Cortex-M and are compliant with the quantization scheme used in TensorFlow Lite. They have been optimized for the performance and small memory footprint required on these embedded devices, so it makes sense for TVM to reuse them while generating code for Cortex-M. They have already been integrated into the TensorFlow Lite Micro project.

# Guide-level explanation

TVM's external code generation infrastructure allows for automatic partitioning and code generation using an external compiler. Partitioned subgraphs containing operator(s) targeted at Cortex-M can then be translated into CMSIS-NN C API calls, which eventually become part of the MLF. For this integration, we rely heavily on TVM's infrastructure for external code generation.

If a user runs tvmc, they will get an MLF archive that calls out to the CMSIS-NN operators.

```
tvmc --target=cmsisnn,c --output-format=mlf --executor=aot
```
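The same flow can also be driven from the Python API. The sketch below is illustrative only: it assumes the integration exposes a partitioning helper, here called `partition_for_cmsisnn` (analogous to ACL's `partition_for_arm_compute_lib`), whose name and location are not fixed by this RFC; the model file name is a placeholder, and executor selection (e.g. AOT) is omitted for brevity.

```python
import tflite
import tvm
from tvm import relay
from tvm.micro import export_model_library_format

# Assumed partitioning helper registered by the CMSIS-NN integration; it would
# run the usual BYOC passes (MergeComposite, AnnotateTarget, PartitionGraph)
# with the CMSIS-NN pattern table.
from tvm.relay.op.contrib.cmsisnn import partition_for_cmsisnn

# Import a quantized (int8) TFLite model (placeholder file name).
with open("softmax_int8.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)
mod, params = relay.frontend.from_tflite(tflite_model)

# Carve out the subgraphs that CMSIS-NN can handle.
mod = partition_for_cmsisnn(mod, params)

# Build the graph with the C backend; subgraphs annotated with
# Compiler="cmsisnn" are lowered to CMSIS-NN calls by the external codegen.
with tvm.transform.PassContext(opt_level=3):
    module = relay.build(mod, target="c", params=params)

# Export a Model Library Format archive, mirroring `--output-format=mlf`.
export_model_library_format(module, "./module.tar")
```

The `tvmc` command above remains the user-facing interface; this sketch only indicates where the partitioning step sits relative to `relay.build` and the MLF export.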
# Reference-level explanation

We will enable this integration by first considering TFLite networks, but it is equally applicable to all other networks that can be translated into Relay IR. A TFLite test that contains just a quantized (int8) softmax is converted by the TFLite frontend into the following sequence of Relay operations: *dequantize -> softmax -> quantize*. Please refer to the code snippet below.

```python
def @main(%a: Tensor[(1, 16, 16, 3), int8]) -> Tensor[(1, 16, 16, 3), int8] {
  %0 = qnn.dequantize(%a, 0.02f /* ty=float32 */, 64 /* ty=int32 */) /* ty=Tensor[(1, 16, 16, 3), float32] */;
  %1 = nn.softmax(%0) /* ty=Tensor[(1, 16, 16, 3), float32] */;
  qnn.quantize(%1, 0.02f /* ty=float32 */, 64 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 16, 16, 3), int8] */
}
```
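For unit testing, the same graph can also be written directly with the Relay Python API rather than importing a TFLite file. A minimal sketch using the quantization parameters from the snippet above (scale 0.02, zero point 64); the function name is illustrative:

```python
import tvm
from tvm import relay


def make_int8_softmax_module():
    """Relay module equivalent to the TFLite int8 softmax shown above."""
    a = relay.var("a", shape=(1, 16, 16, 3), dtype="int8")
    scale = relay.const(0.02, "float32")
    zero_point = relay.const(64, "int32")
    deq = relay.qnn.op.dequantize(a, scale, zero_point)
    sm = relay.nn.softmax(deq)
    q = relay.qnn.op.quantize(sm, scale, zero_point, out_dtype="int8")
    return tvm.IRModule.from_expr(relay.Function([a], q))


# Printing the module shows the same dequantize -> softmax -> quantize sequence.
print(make_int8_softmax_module())
```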
The following code block shows the result of graph partitioning for the cmsisnn target.

```python
def @main(%a: Tensor[(1, 16, 16, 3), int8]) -> Tensor[(1, 16, 16, 3), int8] {
  @tvmgen_default_cmsisnn_0(%a) /* ty=Tensor[(1, 16, 16, 3), int8] */
}

def @tvmgen_default_cmsisnn_0(%cmsisnn_0_i0: Tensor[(1, 16, 16, 3), int8], Inline=1, Compiler="cmsisnn", global_symbol="tvmgen_default_cmsisnn_0", Primitive=1) -> Tensor[(1, 16, 16, 3), int8] {
  %2 = fn (%FunctionVar_0_0: Tensor[(1, 16, 16, 3), int8], PartitionedFromPattern="qnn.dequantize_nn.softmax_qnn.quantize_", Composite="cmsisnn.qnn_softmax") -> Tensor[(1, 16, 16, 3), int8] {
    %0 = qnn.dequantize(%FunctionVar_0_0, 0.02f /* ty=float32 */, 64 /* ty=int32 */) /* ty=Tensor[(1, 16, 16, 3), float32] */;
    %1 = nn.softmax(%0) /* ty=Tensor[(1, 16, 16, 3), float32] */;
    qnn.quantize(%1, 0.02f /* ty=float32 */, 64 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 16, 16, 3), int8] */
  };
  %2(%cmsisnn_0_i0) /* ty=Tensor[(1, 16, 16, 3), int8] */
}
```
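The `Composite="cmsisnn.qnn_softmax"` function above comes from BYOC pattern matching. The sketch below shows how such a pattern table could be registered; the table contents and the absence of a predicate are illustrative, not the final implementation:

```python
from tvm.relay.dataflow_pattern import is_op, wildcard
from tvm.relay.op.contrib.register import register_pattern_table


def qnn_softmax_pattern():
    # dequantize -> softmax -> quantize, matching the sequence produced
    # by the TFLite frontend for an int8 softmax.
    dequant = is_op("qnn.dequantize")(wildcard(), wildcard(), wildcard())
    softmax = is_op("nn.softmax")(dequant)
    return is_op("qnn.quantize")(softmax, wildcard(), wildcard())


@register_pattern_table("cmsisnn")
def cmsisnn_pattern_table():
    # (composite name, pattern) pairs consumed by the BYOC passes.
    return [("cmsisnn.qnn_softmax", qnn_softmax_pattern())]
```

`MergeComposite` wraps each match in a function tagged with the composite name, and `AnnotateTarget`/`PartitionGraph` then lift it into the `Compiler="cmsisnn"` function shown above.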
The `relay_to_tir` target hook implemented as part of https://github.com/apache/tvm-rfcs/pull/10 is used to obtain the following TIR for the graph with softmax. These hooks give us the flexibility to reuse TVM's memory planning and much of its code generation capabilities.

```python
primfn(placeholder_1: handle, out_write_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {placeholder: Buffer(placeholder_1: Pointer(int8), int8, [1, 300, 300, 3], []),
             out_write: Buffer(out_write_1: Pointer(int8), int8, [1, 300, 300, 3], [])}
  buffer_map = {placeholder_1: placeholder_1, out_write_1: out_write_1} {
  ...
  allocate(placeholder.d.global, uint8, [1,300,300,3]) {
    @tir.call_extern("cmsisnn_softmax_s8", ..., dtype=handle)
  }
}
```

Finally, the code generator identifies the extern call and emits the corresponding call to the CMSIS-NN int8 softmax API.

For more complex operations, CMSIS-NN structures will need to be used. For this purpose, `tir_to_runtime` will be used to extend the existing C codegen and produce C code with the appropriate headers and calling patterns. Please refer to the [Additional Target Hooks RFC](https://github.com/apache/tvm-rfcs/pull/10).

# Testing

As we introduce operators, we will keep adding individual unit tests. Once the operator support is partially complete, we will start adding network tests. We are planning to use the [Arm® Corstone™-300 Fixed Virtual Platform](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300) to run these tests in the CI. Reference: [Arm Ethos-U Integration](https://github.com/apache/tvm-rfcs/pull/11/files)
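As a sketch of what the first structural unit tests could look like, assuming the hypothetical `partition_for_cmsisnn` helper from the earlier sketch (numerical and on-device checks would come on top of this):

```python
import tvm
from tvm import relay

# Assumed partitioning helper from the earlier sketch; its name and location
# are not fixed by this RFC.
from tvm.relay.op.contrib.cmsisnn import partition_for_cmsisnn


def test_int8_softmax_is_offloaded_to_cmsisnn():
    # Same int8 softmax graph as in the reference-level snippet.
    a = relay.var("a", shape=(1, 16, 16, 3), dtype="int8")
    deq = relay.qnn.op.dequantize(a, relay.const(0.02, "float32"), relay.const(64, "int32"))
    q = relay.qnn.op.quantize(
        relay.nn.softmax(deq), relay.const(0.02, "float32"), relay.const(64, "int32"), out_dtype="int8"
    )
    mod = tvm.IRModule.from_expr(relay.Function([a], q))

    partitioned = partition_for_cmsisnn(mod)

    # Exactly one subgraph should be handed to the cmsisnn external compiler.
    cmsisnn_funcs = [
        func
        for _, func in partitioned.functions.items()
        if isinstance(func, relay.Function)
        and func.attrs is not None
        and "Compiler" in func.attrs
        and func.attrs["Compiler"] == "cmsisnn"
    ]
    assert len(cmsisnn_funcs) == 1
```

Numerical tests would then compare the CMSIS-NN path against the default TVM lowering, and network-level tests would run on the Corstone-300 FVP referenced above.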
# Drawbacks

The CMSIS-NN APIs provide hand-coded kernels; therefore, code generation skips TVM's auto-tuning capabilities. In the future, we wish to make use of the full power of TVM's auto-scheduling.

# Upstreaming Plan

Before adding other CMSIS-NN operators, the integration will be enabled only for softmax.

P1: Graph partitioner for the CMSIS-NN target
P2: Code generation using the existing BYOC infrastructure
P3: tvmc support to generate code for CMSIS-NN
P4: Move this implementation to use `tir_to_runtime` from the target hooks
P5: Use of CMSIS-NN data structures while supporting depthwise convolution
P6: Support for Convolution
P7: Support for Fully connected
P8: Support for Max Pooling
P9: Support for Avg Pooling
P10: Support for MatMul

# Prior art

The CMSIS-NN integration into TVM builds on top of ACL's integration into TVM. The existing BYOC infrastructure allows graph partitioning to detach an operator or a chain of operators as a separate subgraph that can then be compiled for Cortex-M.

Reference: [ACL](https://tvm.apache.org/docs/deploy/arm_compute_lib.html)

Code generation for CMSIS-NN will use the newly introduced target hooks.

Reference: [Additional Target Hooks](https://github.com/apache/tvm-rfcs/pull/10/files)