Commit 9dd2292

[Samsung] Docs template
Summary: Title says it all! Add docs for the Samsung backend based on the template introduced in #14873.
1 parent 592f698 commit 9dd2292

File tree

5 files changed: +262 -0 lines changed
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
Operator,Quantization,Constraints
add,static int8,
avg_pool2d,static int8,"ceil_mode=False, divisor_override=pooling_region"
batch_norm,static int8,
bmm,static int8,
cat,static int8,at most 1 constant tensor
clamp,static int8,
constant_pad_nd,static int8,padding_value=0.0 only
conv2d,static int8,constant weights
dequantize_per_channel,,
dequantize_per_tensor,,
div,static int8,
embedding,static int8,
expand_copy,,"expanding at most one axis, new dimensions must be size 1"
gelu,static int8,
getitem,,
hardsigmoid,static int8,
hardswish,static int8,
hardtanh,static int8,
layer_norm,static int8,norm at last axis only
leaky_relu,static int8,
linear,static int8,constant weights
log_softmax,static int8,
max_pool2d,static int8,"ceil_mode=False, indices not supported"
maximum,,
mean_dim,static int8,
minimum,,
mul,static int8,
permute,static int8,
pixel_shuffle,,
quantize_per_channel,,
quantize_per_tensor,,
relu,static int8,
reshape,static int8,
rsqrt,static int8,
select,static int8,
slice_copy,static int8,
softmax,static int8,
sqrt,static int8,
squeeze,static int8,
sub,static int8,
to_copy,,memory_format=contiguous only
unsqueeze,static int8,
upsample_bilinear2d,static int8,
upsample_nearest2d,static int8,
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
================
Operator Support
================

This page lists the PyTorch operators currently supported by the Samsung Exynos backend.

.. csv-table:: Operator Support
   :file: samsung-op-support-table.csv
   :header-rows: 1
   :widths: 25 15 55
   :align: center
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# Samsung Exynos Backend

ExecuTorch's Samsung Exynos backend enables the execution of ExecuTorch models on
Samsung SoCs via the NPU/DSP. The delegate is built on top of the
[Samsung Exynos AI Litecore SDK](https://soc-developer.semiconductor.samsung.com/global/development/ai-litecore).

## Features

- Wide range of operator support
- Supported inference precisions:
  - FP16
  - 8-bit statically quantized (int8/uint8)
  - 16-bit statically quantized (int16/uint16)

## Target Requirements

Currently, the Samsung Exynos backend is supported only for devices with the
following chipsets:

- Exynos 2500 (E9955)

## Development Requirements

The [Samsung Exynos AI Litecore SDK](https://soc-developer.semiconductor.samsung.com/global/development/ai-litecore)
is required to build the Exynos backend from source, and is also required to
export models to the Exynos delegate.

----

## Using the Samsung Exynos Backend

To target the Exynos backend during the export and lowering process, pass an instance of
the `EnnPartitioner` to `to_edge_transform_and_lower`. The example below
demonstrates this process using the MobileNet V2 model from torchvision.

```python
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.samsung.partition.enn_partitioner import EnnPartitioner
from executorch.backends.samsung.serialization.compile_options import (
    gen_samsung_backend_compile_spec,
)
from executorch.exir import to_edge_transform_and_lower

mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

chipset = "E9955"
compile_specs = [gen_samsung_backend_compile_spec(chipset)]

et_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[EnnPartitioner(compile_specs)],
).to_executorch()

with open("mv2_exynos.pte", "wb") as file:
    et_program.write_to_file(file)
```
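
Before serializing, it can be useful to check how much of the graph was actually
delegated to the backend. The sketch below is one way to do this; it assumes the
generic `get_delegation_info` helper from ExecuTorch's devtools, which is not
specific to the Exynos backend.

```python
# A minimal sketch: inspect delegation before serializing to a .pte file.
from executorch.devtools.backend_debug import get_delegation_info

edge_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[EnnPartitioner(compile_specs)],
)

# Summarize which operators were lowered to the delegate and which remain in the graph.
delegation_info = get_delegation_info(edge_program.exported_program().graph_module)
print(delegation_info.get_summary())

et_program = edge_program.to_executorch()
```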

See [Partitioner API](/backends/samsung/samsung-partitioner) for a reference on available partitioner options.

----

## Quantization

The Samsung Exynos backend supports statically quantized models with 8-bit and 16-bit
integral types.

See [Samsung Exynos Quantization](/backends/samsung/samsung-quantization) for more
information on available quantization schemes and APIs.
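
As a quick orientation, quantization is driven by the `EnnQuantizer` described on that
page; a minimal sketch of configuring it (the full PT2E flow is shown there) might look
like this:

```python
# Minimal sketch: configure the Exynos quantizer before the PT2E prepare/convert steps.
from executorch.backends.samsung.quantizer.quantizer import EnnQuantizer

quantizer = EnnQuantizer()
# Arguments: precision, per-channel weights, QAT. "A8W8" is currently the only
# supported precision mode.
quantizer.set_quant_params("A8W8", True, False)
```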

----

## Runtime Integration

To run the model on-device, use the standard ExecuTorch runtime APIs.

The Exynos backend is currently not available in any of ExecuTorch's published packages.
To access it, build ExecuTorch from source. When building from source, pass
`-DEXECUTORCH_BUILD_EXYNOS=ON` when configuring the CMake build. See [Running on Device](/getting-started.md#running-on-device)
for more information.

Then, to link against the backend, add the `executorch_backends` CMake target as a build
dependency.

```cmake
# CMakeLists.txt
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE executorch
            executorch_backends
            ...
)
```

No additional steps are necessary to use the backend beyond linking the target. Any
Exynos-delegated .pte file will automatically run on the registered backend.

## Reference

**→{doc}`exynos-partitioner` — Partitioner options.**

**→{doc}`exynos-quantization` — Supported quantization schemes.**

**→{doc}`exynos-op-support` — Supported operators.**

```{toctree}
:maxdepth: 2
:hidden:
:caption: Exynos Backend

exynos-partitioner
exynos-quantization
exynos-op-support
```
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Partitioner API

The `EnnPartitioner` API is the primary entrypoint when exporting a model to the Samsung
Exynos backend. The partitioner is responsible for determining which parts of the model
should be lowered to the backend and also provides an interface for configuring the
behaviour of the backend.

Currently, the configuration options for `EnnPartitioner` can be generated automatically
using the `gen_samsung_backend_compile_spec` API. For instance,

```python
from executorch.backends.samsung.partition.enn_partitioner import EnnPartitioner
from executorch.backends.samsung.serialization.compile_options import (
    gen_samsung_backend_compile_spec,
)

from executorch.exir import to_edge_transform_and_lower

chipset = "E9955"
compile_specs = [gen_samsung_backend_compile_spec(chipset)]

et_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[EnnPartitioner(compile_specs)],
).to_executorch()
```

At the moment, only `"E9955"` is supported as a valid chipset name, which corresponds to
the Exynos 2500 SoC. Support for additional chipsets will be added in the future.
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# Quantization

The Exynos backend currently supports executing statically quantized 8-bit models.

## 8-bit quantization with the PT2E quantization flow

To perform 8-bit quantization with the PT2E flow, perform the following steps prior to exporting the model:

1) Create an instance of the `EnnQuantizer` class and set the desired quantization behaviour.
2) Use `torch.export.export` to obtain a graph module representation of the source model.
3) Use `prepare_pt2e` to prepare the model for quantization.
4) Execute the prepared model with representative samples to calibrate the quantized tensor activation ranges.
5) Use `convert_pt2e` to quantize the model.
6) Export and lower the model using the standard export flow.

The output of `convert_pt2e` is a PyTorch model which can be exported and lowered using
the same export flow as non-quantized models. As it is a regular PyTorch model, it can
also be used to evaluate the accuracy of the quantized model using standard PyTorch
techniques.

The example below shows how to quantize a MobileNetV2 model using the PT2E quantization flow.

```python
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights

from executorch.backends.samsung.partition.enn_partitioner import EnnPartitioner
from executorch.backends.samsung.quantizer.quantizer import EnnQuantizer, Precision
from executorch.backends.samsung.serialization.compile_options import (
    gen_samsung_backend_compile_spec,
)

from executorch.exir import to_edge_transform_and_lower
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Currently, "A8W8" is the only supported precision mode
precision = "A8W8"
is_per_channel = True
is_qat = False

quantizer = EnnQuantizer()
quantizer.set_quant_params(precision, is_per_channel, is_qat)  # (1)

training_ep = torch.export.export(model, sample_inputs).module()  # (2)
prepared_model = prepare_pt2e(training_ep, quantizer)  # (3)

for cal_sample in [torch.randn(1, 3, 224, 224)]:  # Replace with representative model inputs
    prepared_model(cal_sample)  # (4) Calibrate

quantized_model = convert_pt2e(prepared_model)  # (5)

# Use the same compile specs as in the non-quantized export flow.
compile_specs = [gen_samsung_backend_compile_spec("E9955")]
et_program = to_edge_transform_and_lower(  # (6)
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[EnnPartitioner(compile_specs)],
).to_executorch()
```
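
Because `convert_pt2e` returns a regular PyTorch module, its accuracy can be checked
before lowering. A minimal sketch is shown below; `eval_batches` is a hypothetical
placeholder, and a real labelled evaluation set is needed for meaningful numbers.

```python
# Hypothetical evaluation data: replace with real (input, label) pairs.
eval_batches = [(torch.randn(1, 3, 224, 224), torch.tensor([0]))]

correct, total = 0, 0
with torch.no_grad():
    for image, label in eval_batches:
        logits = quantized_model(image)  # quantized model returned by convert_pt2e
        correct += (logits.argmax(dim=-1) == label).sum().item()
        total += label.numel()

print(f"Top-1 accuracy: {correct / total:.3f}")
```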

See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
for more information.
