
Commit 66e459f
Prototype updated XNNPACK doc structure
1 parent 019c8da

13 files changed: +330 −207 lines

docs/source/backend-delegate-advanced.md (0 additions, 5 deletions)

```diff
@@ -6,10 +6,6 @@
 
 - {doc}`backend-delegates-integration` — Learn how to integrate a backend delegate into ExecuTorch
 
-## XNNPACK Reference
-
-- {doc}`backend-delegates-xnnpack-reference` — Deep dive into XNNPACK delegate internals and implementation details
-
 ## Dependency Management
 
 - {doc}`backend-delegates-dependencies` — Manage third-party dependencies for backend delegates
@@ -27,7 +23,6 @@
 :maxdepth: 1
 
 backend-delegates-integration
-backend-delegates-xnnpack-reference
 backend-delegates-dependencies
 compiler-delegate-and-partitioner
 debug-backend-delegate
```

docs/source/backend-development.md (0 additions, 1 deletion)

```diff
@@ -4,7 +4,6 @@
 :maxdepth: 1
 
 backend-delegates-integration
-backend-delegates-xnnpack-reference
 backend-delegates-dependencies
 compiler-delegate-and-partitioner
 debug-backend-delegate
```

docs/source/backends-overview.md (16 additions, 16 deletions)

````diff
@@ -18,20 +18,20 @@ Backends are the bridge between your exported model and the hardware it runs on.
 
 ## Choosing a Backend
 
-| Backend | Platform(s) | Hardware Type | Typical Use Case |
-|------------------------------------------|---------------------|---------------|---------------------------------|
-| [XNNPACK](backends-xnnpack) | All | CPU | General-purpose, fallback |
-| [Core ML](backends-coreml) | iOS, macOS | NPU/GPU | Apple devices, high performance |
-| [Metal Performance Shaders](backends-mps)| iOS, macOS | GPU | Apple GPU acceleration |
-| [Vulkan](backends-vulkan) | Android | GPU | Android GPU acceleration |
-| [Qualcomm](backends-qualcomm) | Android | NPU | Qualcomm SoCs |
-| [MediaTek](backends-mediatek) | Android | NPU | MediaTek SoCs |
-| [ARM EthosU](backends-arm-ethos-u) | Embedded | NPU | ARM MCUs |
-| [ARM VGF](backends-arm-vgf) | Android | NPU | ARM platforms |
-| [OpenVINO](build-run-openvino) | Embedded | CPU/GPU/NPU | Intel SoCs |
-| [NXP](backends-nxp) | Embedded | NPU | NXP SoCs |
-| [Cadence](backends-cadence) | Embedded | DSP | DSP-optimized workloads |
-| [Samsung Exynos](backends-samsung-exynos)| Android | NPU | Samsung SoCs |
+| Backend | Platform(s) | Hardware Type | Typical Use Case |
+|-----------------------------------------------|---------------------|---------------|---------------------------------|
+| [XNNPACK](backends/xnnpack/xnnpack-overview) | All | CPU | General-purpose, fallback |
+| [Core ML](backends-coreml) | iOS, macOS | NPU/GPU | Apple devices, high performance |
+| [Metal Performance Shaders](backends-mps) | iOS, macOS | GPU | Apple GPU acceleration |
+| [Vulkan](backends-vulkan) | Android | GPU | Android GPU acceleration |
+| [Qualcomm](backends-qualcomm) | Android | NPU | Qualcomm SoCs |
+| [MediaTek](backends-mediatek) | Android | NPU | MediaTek SoCs |
+| [ARM EthosU](backends-arm-ethos-u) | Embedded | NPU | ARM MCUs |
+| [ARM VGF](backends-arm-vgf) | Android | NPU | ARM platforms |
+| [OpenVINO](build-run-openvino) | Embedded | CPU/GPU/NPU | Intel SoCs |
+| [NXP](backends-nxp) | Embedded | NPU | NXP SoCs |
+| [Cadence](backends-cadence) | Embedded | DSP | DSP-optimized workloads |
+| [Samsung Exynos](backends-samsung-exynos) | Android | NPU | Samsung SoCs |
 
 **Tip:** For best performance, export a `.pte` file for each backend you plan to support.
 
@@ -46,11 +46,11 @@ Backends are the bridge between your exported model and the hardware it runs on.
 ---
 
 ```{toctree}
-:maxdepth: 1
+:maxdepth: 3
 :hidden:
 :caption: Backend Overview
 
-backends-xnnpack
+backends/xnnpack/xnnpack-overview
 backends-coreml
 backends-mps
 backends-vulkan
````

docs/source/backends-xnnpack.md (0 additions, 182 deletions)

This file was deleted.

docs/source/backend-delegates-xnnpack-reference.md renamed to docs/source/backends/xnnpack/reference/xnnpack-reference-arch-internals.md (3 additions, 3 deletions)

```diff
@@ -1,4 +1,4 @@
-# XNNPACK Delegate Internals
+# Architecture and Internals
 
 This is a high-level overview of the ExecuTorch XNNPACK backend delegate. This high-performance delegate aims to reduce CPU inference latency for ExecuTorch models. We provide a brief introduction to the XNNPACK library and explore the delegate's overall architecture and intended use cases.
 
@@ -9,12 +9,12 @@ XNNPACK is a library of highly-optimized neural network operators for ARM, x86,
 A delegate is an entry point for backends to process and execute parts of the ExecuTorch program. Delegated portions of ExecuTorch models hand off execution to backends. The XNNPACK backend delegate is one of many available in ExecuTorch. It leverages the XNNPACK third-party library to accelerate ExecuTorch programs efficiently across a variety of CPUs. More detailed information on delegates, and on developing your own, is available [here](compiler-delegate-and-partitioner.md). It is recommended that you get familiar with that content before continuing on to the Architecture section.
 
 ## Architecture
-![High Level XNNPACK delegate Architecture](xnnpack-delegate-architecture.png)
+![High Level XNNPACK delegate Architecture](/backends/xnnpack/xnnpack-delegate-architecture.png)
 
 ### Ahead-of-time
 In the ExecuTorch export flow, lowering to the XNNPACK delegate happens at the `to_backend()` stage. In this stage, the model is partitioned by the `XnnpackPartitioner`. Partitioned sections of the graph are converted to an XNNPACK-specific graph representation and then serialized via flatbuffer. The serialized flatbuffer is then ready to be deserialized and executed by the XNNPACK backend at runtime.
 
-![ExecuTorch XNNPACK delegate Export Flow](xnnpack-et-flow-diagram.png)
+![ExecuTorch XNNPACK delegate Export Flow](/backends/xnnpack/xnnpack-et-flow-diagram.png)
 
 #### Partitioner
 The partitioner is implemented by backend delegates to mark nodes suitable for lowering. The `XnnpackPartitioner` lowers using node targets and module metadata. More references for partitioners can be found [here](compiler-delegate-and-partitioner.md).
```
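The node-marking step described in that Partitioner paragraph can be illustrated with a toy sketch (hypothetical and heavily simplified, not the actual `XnnpackPartitioner` logic): walk the graph in order, tag nodes whose targets are supported, and group runs of tagged nodes into delegated subgraphs.

```python
# Toy illustration of delegate partitioning by node target.
# SUPPORTED_TARGETS and the flat-list "graph" are hypothetical stand-ins;
# the real XnnpackPartitioner operates on an exported FX graph with metadata.
SUPPORTED_TARGETS = {"aten.add", "aten.mul", "aten.conv2d"}

def partition(node_targets):
    """Group consecutive supported ops into delegated subgraphs."""
    partitions, current = [], []
    for target in node_targets:
        if target in SUPPORTED_TARGETS:
            current.append(target)       # extend the current delegated run
        elif current:
            partitions.append(current)   # an unsupported op breaks the run
            current = []
    if current:
        partitions.append(current)
    return partitions

graph = ["aten.add", "aten.mul", "aten.softmax", "aten.conv2d"]
print(partition(graph))  # [['aten.add', 'aten.mul'], ['aten.conv2d']]
```

Each resulting group would become one delegated payload handed to the backend, while unsupported ops (here `aten.softmax`) stay with the default ExecuTorch runtime.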
Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+# Partitioner API
+
+The XNNPACK partitioner API allows for configuration of the model delegation to XNNPACK. Passing an `XnnpackPartitioner` instance with no additional parameters will run as much of the model as possible on the XNNPACK backend. This is the most common use case. For advanced use cases, the partitioner exposes the following options via the [constructor](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/xnnpack_partitioner.py#L31):
+
+- `configs`: Control which operators are delegated to XNNPACK. By default, all available operators are delegated. See [../config/\_\_init\_\_.py](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/config/__init__.py#L66) for an exhaustive list of available operator configs.
+- `config_precisions`: Filter operators by data type. By default, all precisions are delegated. One or more of `ConfigPrecisionType.FP32`, `ConfigPrecisionType.STATIC_QUANT`, or `ConfigPrecisionType.DYNAMIC_QUANT`. See [ConfigPrecisionType](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/config/xnnpack_config.py#L24).
+- `per_op_mode`: If true, emit individual delegate calls for every operator. This is an advanced option intended to reduce memory overhead in some contexts, at the cost of a small amount of runtime overhead. Defaults to false.
+- `verbose`: If true, print additional information during lowering.
```
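The `config_precisions` option described in that list can be sketched in isolation. The snippet below is a toy model of the filtering behavior, with a stand-in enum and hypothetical op/precision pairs; it is not the real ExecuTorch implementation.

```python
from enum import Enum, auto

class ConfigPrecisionType(Enum):
    # Names mirror the options documented above; this toy enum
    # stands in for executorch's real ConfigPrecisionType.
    FP32 = auto()
    STATIC_QUANT = auto()
    DYNAMIC_QUANT = auto()

def filter_by_precision(ops, allowed):
    """Toy sketch: keep only ops whose precision is in the allowed set."""
    return [name for name, precision in ops if precision in allowed]

ops = [
    ("conv2d", ConfigPrecisionType.FP32),
    ("linear", ConfigPrecisionType.DYNAMIC_QUANT),
    ("add", ConfigPrecisionType.STATIC_QUANT),
]
print(filter_by_precision(ops, {ConfigPrecisionType.FP32}))  # ['conv2d']
```

Omitting the filter corresponds to the documented default of delegating all precisions.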
