├── <a href="devtools">devtools</a> - Model profiling, debugging, and inspection. Please refer to the <a href="docs/source/devtools-overview.md">tools documentation</a> for more information.
docs/source/backends/xnnpack/xnnpack-arch-internals.md (6 additions, 6 deletions)
@@ -6,7 +6,7 @@ This is a high-level overview of the ExecuTorch XNNPACK backend delegate. This h
 XNNPACK is a library of highly optimized neural network operators for ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, and macOS environments. It is an open-source project; you can find more information about it on [GitHub](https://github.com/google/XNNPACK).
 
 ## What are ExecuTorch delegates?
-A delegate is an entry point for backends to process and execute parts of the ExecuTorch program. Delegated portions of ExecuTorch models hand off execution to backends. The XNNPACK backend delegate is one of many available in ExecuTorch. It leverages the XNNPACK third-party library to accelerate ExecuTorch programs efficiently across a variety of CPUs. More detailed information on delegates and on developing your own delegate is available [here](compiler-delegate-and-partitioner.md). It is recommended that you become familiar with that content before continuing to the Architecture section.
+A delegate is an entry point for backends to process and execute parts of the ExecuTorch program. Delegated portions of ExecuTorch models hand off execution to backends. The XNNPACK backend delegate is one of many available in ExecuTorch. It leverages the XNNPACK third-party library to accelerate ExecuTorch programs efficiently across a variety of CPUs. More detailed information on delegates and on developing your own delegate is available [here](/compiler-delegate-and-partitioner.md). It is recommended that you become familiar with that content before continuing to the Architecture section.
 
 ## Architecture
 <!-- @lint-ignore linter doesn't like this link for some reason -->
@@ -17,7 +17,7 @@ In the ExecuTorch export flow, lowering to the XNNPACK delegate happens at the `
 <!-- @lint-ignore linter doesn't like this link for some reason -->
 
 #### Partitioner
-The partitioner is implemented by backend delegates to mark nodes suitable for lowering. The `XnnpackPartitioner` lowers using node targets and module metadata. More references for partitioners can be found [here](compiler-delegate-and-partitioner.md).
+The partitioner is implemented by backend delegates to mark nodes suitable for lowering. The `XnnpackPartitioner` lowers using node targets and module metadata. More references for partitioners can be found [here](/compiler-delegate-and-partitioner.md).
 
 ##### Module-based partitioning
 
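For context, here is a minimal lowering sketch showing where the `XnnpackPartitioner` fits in the export flow. The toy model, input shape, and output file name are placeholders, and the import paths follow the current ExecuTorch Python API (they may differ slightly between releases).

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Placeholder model and inputs; any torch.nn.Module built from XNNPACK-supported ops works.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

exported = torch.export.export(model, example_inputs)

# The partitioner tags nodes (and module patterns) that XNNPACK can lower;
# the tagged subgraphs are then delegated to the XNNPACK backend during lowering.
executorch_program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("model_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```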
@@ -54,7 +54,7 @@ After partitioning the lowerable subgraphs from the model, The XNNPACK delegate
 The XNNPACK delegate uses flatbuffers for serialization. To improve runtime performance, the XNNPACK delegate’s flatbuffer [schema](https://github.com/pytorch/executorch/blob/main/backends/xnnpack/serialization/schema.fbs) mirrors the XNNPACK library’s graph-level API calls. The serialized data are arguments to XNNPACK’s APIs, so that at runtime the XNNPACK execution graph can be created efficiently with successive calls to XNNPACK’s APIs.
 
 ### Runtime
-The XNNPACK backend’s runtime interfaces with the ExecuTorch runtime through the custom `init` and `execute` functions. Each delegated subgraph is contained in an individually serialized XNNPACK blob. When the model is initialized, ExecuTorch calls `init` on all XNNPACK blobs to load the subgraphs from the serialized flatbuffers. When the model is executed, each subgraph is executed by the backend through the custom `execute` function. To read more about how delegate runtimes interface with ExecuTorch, refer to this [resource](compiler-delegate-and-partitioner.md).
+The XNNPACK backend’s runtime interfaces with the ExecuTorch runtime through the custom `init` and `execute` functions. Each delegated subgraph is contained in an individually serialized XNNPACK blob. When the model is initialized, ExecuTorch calls `init` on all XNNPACK blobs to load the subgraphs from the serialized flatbuffers. When the model is executed, each subgraph is executed by the backend through the custom `execute` function. To read more about how delegate runtimes interface with ExecuTorch, refer to this [resource](/compiler-delegate-and-partitioner.md).
 
 
 #### **XNNPACK Library**
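As a rough illustration of the runtime behavior described above from the Python side (loading the program triggers `init` on each delegate blob, and each forward call goes through `execute`), a hedged sketch follows. The pybindings module path and the `.pte` file name are assumptions; the Python bindings have moved between ExecuTorch releases.

```python
import torch
# Assumed location of the ExecuTorch Python bindings; may differ by release.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Loading the program calls `init` for every serialized XNNPACK blob,
# rebuilding the XNNPACK execution graph(s) and packing weights.
module = _load_for_executorch("model_xnnpack.pte")  # placeholder path

# Each forward call routes the delegated subgraphs through the backend's `execute`.
outputs = module.forward((torch.randn(1, 32),))
print(outputs[0])
```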
@@ -70,7 +70,7 @@ Since weight packing creates an extra copy of the weights inside XNNPACK, We fre
 When executing the XNNPACK subgraphs, we prepare the tensor inputs and outputs and feed them to the XNNPACK runtime graph. After executing the runtime graph, the output pointers are filled with the computed tensors.
 
 #### **Profiling**
-We have enabled basic profiling for the XNNPACK delegate, which can be turned on with the compiler flag `-DEXECUTORCH_ENABLE_EVENT_TRACER` (add `-DENABLE_XNNPACK_PROFILING` for additional detail). With ExecuTorch's Developer Tools integration, you can also use the Developer Tools to profile the model. Follow the steps in [Using the ExecuTorch Developer Tools to Profile a Model](tutorials/devtools-integration-tutorial)<!-- @lint-ignore --> to profile ExecuTorch models and use the Developer Tools' Inspector API to view XNNPACK's internal profiling information. An example implementation is available in the `executor_runner` (see the [tutorial here](tutorial-xnnpack-delegate-lowering.md#profiling)).
+We have enabled basic profiling for the XNNPACK delegate, which can be turned on with the compiler flag `-DEXECUTORCH_ENABLE_EVENT_TRACER` (add `-DENABLE_XNNPACK_PROFILING` for additional detail). With ExecuTorch's Developer Tools integration, you can also use the Developer Tools to profile the model. Follow the steps in [Using the ExecuTorch Developer Tools to Profile a Model](/tutorials/devtools-integration-tutorial)<!-- @lint-ignore --> to profile ExecuTorch models and use the Developer Tools' Inspector API to view XNNPACK's internal profiling information. An example implementation is available in the `executor_runner` (see the [tutorial here](/tutorial-xnnpack-delegate-lowering.md#profiling)).
 
 
 [comment]: <>(TODO: Refactor quantizer to a more official quantization doc)
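To make the Inspector step in the profiling paragraph above concrete, here is a minimal sketch. The ETDump and ETRecord file paths are placeholders, and it assumes the runner was built with `-DEXECUTORCH_ENABLE_EVENT_TRACER` and emitted an ETDump during execution.

```python
from executorch.devtools import Inspector

# ETDump is produced by an event-tracer-enabled runtime (e.g. executor_runner);
# ETRecord is optionally generated at export time to map events back to source ops.
inspector = Inspector(
    etdump_path="etdump.etdp",   # placeholder path
    etrecord="etrecord.bin",     # placeholder path, optional
)

# Prints a table of runtime events; with XNNPACK profiling enabled this
# includes the delegate's internal per-operator timings.
inspector.print_data_tabular()
```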
@@ -142,5 +142,5 @@ def _qdq_quantized_linear(
 You can read a more in-depth explanation of PyTorch 2 quantization [here](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html).
 
 ## See Also
-- [Integrating XNNPACK Delegate in Android AAR](using-executorch-android.md)
-- [Complete the Lowering to XNNPACK Tutorial](tutorial-xnnpack-delegate-lowering.md)
+- [Integrating XNNPACK Delegate in Android AAR](/using-executorch-android.md)
+- [Complete the Lowering to XNNPACK Tutorial](/tutorial-xnnpack-delegate-lowering.md)
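For reference, the PT2E quantization flow that the hunk above links to (the same flow that produces the `_qdq_quantized_linear` Q/DQ pattern) looks roughly like the following sketch when targeting XNNPACK. The toy model is a placeholder, and the quantizer and `prepare_pt2e`/`convert_pt2e` import paths have moved between PyTorch/ExecuTorch releases, so treat the exact modules as assumptions.

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

# Capture the model as a graph, annotate it with the XNNPACK quantizer,
# calibrate, then convert to the quantize/dequantize (Q/DQ) representation
# that the XnnpackPartitioner recognizes and lowers.
captured = torch.export.export_for_training(model, example_inputs).module()
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)            # calibration with representative inputs
quantized = convert_pt2e(prepared)   # ready for lowering to XNNPACK
```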
docs/source/backends/xnnpack/xnnpack-troubleshooting.md (4 additions, 4 deletions)
@@ -10,16 +10,16 @@ The XNNPACK backend is built by default for Python, Android, iOS, and in most CM
 
 * Set the `EXECUTORCH_BUILD_XNNPACK=ON` CMake option when building from source.
 * Either pass the option during CMake configuration or set it inside the user CMake logic before including ExecuTorch.
-* See [Building from Source](using-executorch-building-from-source).
+* See [Building from Source](/using-executorch-building-from-source).
 * On iOS, link the `backend_xnnpack` [framework](/using-executorch-ios).
 * If the backend is still not found, link with `WHOLE_ARCHIVE`.
 * Pass `"$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>"` to `target_link_libraries` in CMake.
 
 ## Slow Performance
 
-* Try reducing the thread count using [_unsafe_reset_threadpool](/using-executorch-faqs#inference-is-slow-performance-troubleshooting).
+* Try reducing the thread count using [_unsafe_reset_threadpool](/using-executorch-faqs.md#inference-is-slow-performance-troubleshooting).
 * Small models may benefit from using fewer threads than the default.
 * Try values between 1 and 4 threads and measure performance on your model.
-* Use [op-level profiling](tutorials/devtools-integration-tutorial) to understand which operators are taking the most time. <!-- @lint-ignore linter doesn't like this link for some reason -->
+* Use [op-level profiling](/tutorials/devtools-integration-tutorial) to understand which operators are taking the most time. <!-- @lint-ignore linter doesn't like this link for some reason -->
 * The XNNPACK backend provides operator-level timing for delegated operators.
-* See general performance troubleshooting tips in [Performance Troubleshooting](/using-executorch-faqs#inference-is-slow-performance-troubleshooting).
+* See general performance troubleshooting tips in [Performance Troubleshooting](/using-executorch-faqs.md#inference-is-slow-performance-troubleshooting).
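As a hedged illustration of the thread-count suggestion in the hunk above, the sketch below sweeps a few thread counts and times inference. The module path for `_unsafe_reset_threadpool` and `_load_for_executorch`, the `.pte` file name, and the input shape are all assumptions; only the `_unsafe_reset_threadpool` name comes from the linked FAQ entry.

```python
import time
import torch
# Assumed location of the ExecuTorch pybindings; may differ by release.
from executorch.extension.pybindings.portable_lib import (
    _load_for_executorch,
    _unsafe_reset_threadpool,
)

inputs = (torch.randn(1, 32),)  # placeholder input shape
for num_threads in (1, 2, 4):
    # Resize the shared threadpool before loading/running the model.
    _unsafe_reset_threadpool(num_threads)
    module = _load_for_executorch("model_xnnpack.pte")  # placeholder path
    start = time.perf_counter()
    for _ in range(100):
        module.forward(inputs)
    avg = (time.perf_counter() - start) / 100
    print(f"{num_threads} thread(s): {avg:.6f} s/iter")
```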