Commit 4c607b7

kotatsuyakimengshyu
authored and committed
[RFC] NNAPI Integration via BYOC
This RFC introduces a new backend Android Neural Network API (NNAPI) for BYOC

Co-authored-by: Ming-Long Huang [email protected]
Co-authored-by: HMZ [email protected]
1 parent f0f982f commit 4c607b7

1 file changed

+81
-0
lines changed


rfcs/0109-byoc-nnapi.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
1+
- Feature Name: byoc_nnapi
2+
- Start Date: 2024-08-01
3+
- RFC PR: [apache/tvm-rfcs#0109](https://github.com/apache/tvm-rfcs/pull/0109)
4+
- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
This RFC introduces a new BYOC backend for the Android Neural Networks API (NNAPI).
10+
11+
# Motivation
12+
[motivation]: #motivation
13+
14+
The Android Neural Networks API (NNAPI) is a graph-level neural network inference API provided by the Android runtime. Prior to this RFC, TVM on Android mobile devices relied mainly on OpenCL for GPU acceleration. This RFC adds a new codegen and runtime via the BYOC framework, enabling execution on the custom accelerators that SoC vendors ship in mobile devices.
15+
16+
# Guide-level explanation
17+
[guide-level-explanation]: #guide-level-explanation
18+
19+
**How to use the NNAPI BYOC backend?**
20+
21+
Use the `partition_for_nnapi()` function to partition the operations supported by NNAPI out of an `IRModule`. The optional `feature_level` keyword argument specifies the highest NNAPI feature level to target; operations introduced in feature levels above it are not partitioned.
22+
23+
```python
from tvm.relax.op.contrib.nnapi import partition_for_nnapi

# `mod` is an existing IRModule; ops above feature level 7 stay unpartitioned.
mod = partition_for_nnapi(mod, feature_level=7)
```
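
The effect of `feature_level` can be pictured with a small, TVM-independent sketch. The op table and its level numbers are hypothetical placeholders, not the backend's actual support matrix, and `select_offloadable` is an illustrative helper that does not exist in TVM.

```python
# Hypothetical table: op name -> NNAPI feature level assumed to introduce it.
# The numbers are placeholders for illustration only.
MIN_FEATURE_LEVEL = {
    "add": 1,
    "conv2d": 1,
    "exp": 3,
    "hypothetical_new_op": 8,
}


def select_offloadable(ops, feature_level):
    """Return the ops that would be handed to NNAPI at the given level.

    Ops missing from the table, or introduced after `feature_level`,
    are left for TVM's regular code generation.
    """
    return [
        op
        for op in ops
        if op in MIN_FEATURE_LEVEL and MIN_FEATURE_LEVEL[op] <= feature_level
    ]


ops = ["add", "exp", "hypothetical_new_op", "softmax"]
print(select_offloadable(ops, feature_level=7))
```

Lowering `feature_level` shrinks the offloaded set, which is how a single model can target older devices at the cost of running more ops outside NNAPI.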
28+
29+
Build the module after partitioning. The build result can then be exported and deployed to an Android device running a TVM runtime built with NNAPI support enabled.
30+
31+
```python
from tvm import relax

android_target = "llvm -mtriple=aarch64-linux-android"
lib = relax.build(mod, target=android_target)
```
35+
36+
# Reference-level explanation
37+
[reference-level-explanation]: #reference-level-explanation
38+
39+
This RFC adds optional support for NNAPI via BYOC without affecting other features in TVM.
40+
41+
**Added code**:
42+
43+
Our implementation adds the following components to the TVM codebase.
44+
45+
- NNAPI partition function implemented with pattern matching.
46+
- NNAPI codegen that serializes Relax IR subgraphs to JSON runtime modules.
47+
- NNAPI runtime that loads JSON runtime modules and calls NNAPI functions to build, compile, and run the model.
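
To make the codegen/runtime split concrete, here is a minimal sketch that serializes a two-op subgraph to JSON and walks it back in a toy runtime. The schema (`nodes`, `op`, `inputs`, `output`) is a simplified assumption, not TVM's actual JSON runtime format, and the Python evaluation loop merely stands in for the NNAPI calls that build, compile, and execute a model.

```python
import json
import math

# Hypothetical, simplified schema: each node names an op and its input node ids.
subgraph = {
    "nodes": [
        {"op": "input", "name": "x", "inputs": []},
        {"op": "relu", "name": "relu0", "inputs": [0]},
        {"op": "exp", "name": "exp0", "inputs": [1]},
    ],
    "output": 2,
}

# "Codegen" side: serialize the partitioned subgraph to a JSON string.
serialized = json.dumps(subgraph)

# Stand-ins for NNAPI operations; a real runtime would add operands and
# operations to an NNAPI model instead of evaluating them in Python.
KERNELS = {
    "relu": lambda v: [max(e, 0.0) for e in v],
    "exp": lambda v: [math.exp(e) for e in v],
}


def run(serialized_graph, inputs):
    """'Runtime' side: load the JSON and evaluate the nodes in order."""
    graph = json.loads(serialized_graph)
    values = []
    for node in graph["nodes"]:
        if node["op"] == "input":
            values.append(inputs[node["name"]])
        else:
            (src,) = node["inputs"]  # every op in this sketch is unary
            values.append(KERNELS[node["op"]](values[src]))
    return values[graph["output"]]


print(run(serialized, {"x": [-1.0, 2.0]}))
```

Keeping the serialized form declarative means the runtime, rather than generated source code, decides how each node maps onto the device API.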
48+
49+
**Supported ops**:
50+
51+
The implementation supports the following ops in both `float32` and `float16` data types.
52+
53+
- Element-wise unary operations (relu, exp, …)
54+
- Element-wise binary operations (add, multiply, …)
55+
- nn.dense
56+
- nn.conv2d
57+
- nn.max_pool2d
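
As a rough illustration of how an op and dtype support list drives partitioning, the sketch below groups a linear sequence of calls into maximal runs that share a target. The op names and dtypes come from the list above; the linear "graph", the `"nnapi"`/`"tvm"` labels, and both helper functions are assumptions made for the example. The real partitioner matches patterns on a dataflow graph rather than a flat list.

```python
from itertools import groupby

SUPPORTED_OPS = {
    "relu", "exp", "add", "multiply",
    "nn.dense", "nn.conv2d", "nn.max_pool2d",
}
SUPPORTED_DTYPES = {"float32", "float16"}


def is_offloadable(op, dtype):
    """An op is offloadable only if both the op and its dtype are supported."""
    return op in SUPPORTED_OPS and dtype in SUPPORTED_DTYPES


def group_regions(calls):
    """Split (op, dtype) calls into maximal consecutive runs per target."""
    keyed = groupby(
        calls, key=lambda c: "nnapi" if is_offloadable(*c) else "tvm"
    )
    return [(target, [op for op, _ in run]) for target, run in keyed]


calls = [
    ("nn.conv2d", "float32"),
    ("relu", "float32"),
    ("nn.softmax", "float32"),  # op not in the supported list
    ("nn.dense", "int8"),       # supported op, unsupported dtype
    ("add", "float16"),
]
print(group_regions(calls))
```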
58+
59+
# Drawbacks
60+
[drawbacks]: #drawbacks
61+
62+
In the current implementation, the performance gain from NNAPI is inconsistent across mobile devices because SoC drivers cannot accelerate all of the supported operations. This may be mitigated by integrating a smarter partitioning algorithm that selectively offloads operations based on profiling, as seen in the [Prior art](#prior-art) section.
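
A profiling-based policy of that kind could look roughly like the sketch below. It is an assumption-laden illustration, not the algorithm from the earlier RFC: `choose_targets` is a hypothetical helper, the latency numbers are placeholders rather than measurements, and the per-op comparison ignores the cost of moving tensors between NNAPI and non-NNAPI regions.

```python
def choose_targets(profile):
    """Pick the faster target per op from measured latencies.

    `profile` maps op name -> {"tvm": ms, "nnapi": ms}. A missing "nnapi"
    entry means the driver could not run the op at all.
    """
    plan = {}
    for op, latency in profile.items():
        nnapi_ms = latency.get("nnapi")
        if nnapi_ms is not None and nnapi_ms < latency["tvm"]:
            plan[op] = "nnapi"
        else:
            plan[op] = "tvm"
    return plan


# Placeholder numbers for illustration only, not measurements.
profile = {
    "nn.conv2d": {"tvm": 4.0, "nnapi": 1.5},
    "exp": {"tvm": 0.2, "nnapi": 0.9},
    "nn.max_pool2d": {"tvm": 0.5},
}
print(choose_targets(profile))
```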
63+
64+
# Rationale and alternatives
65+
[rationale-and-alternatives]: #rationale-and-alternatives
66+
67+
Instead of using JSON codegen, the integration can also be implemented using C source codegen. See the [Prior art](#prior-art) section.
68+
69+
# Prior art
70+
[prior-art]: #prior-art
71+
72+
This RFC is a successor to [an RFC by us](https://discuss.tvm.apache.org/t/rfc-byoc-android-nnapi-integration/9072) from 2021. The codegen and the runtime have since been rewritten from scratch to generate and load standardized `JSONRuntimeBase`-based modules instead of C source code.
73+
74+
# Unresolved questions
75+
[unresolved-questions]: #unresolved-questions
76+
77+
# Future possibilities
78+
[future-possibilities]: #future-possibilities
79+
80+
- Add support for quantized data types to cover the Relay QNN dialect or the Relax quantize/dequantize operators.
81+
- Add support for dynamic shape operands.
