├── <a href="devtools">devtools</a> - Model profiling, debugging, and inspection. Please refer to the <a href="docs/source/devtools-overview.md">tools documentation</a> for more information.
docs/source/backends/xnnpack/xnnpack-arch-internals.md (6 additions, 6 deletions)
@@ -6,7 +6,7 @@ This is a high-level overview of the ExecuTorch XNNPACK backend delegate. This h
 XNNPACK is a library of highly optimized neural network operators for ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, and macOS environments. It is an open-source project; you can find more information about it on [GitHub](https://github.com/google/XNNPACK).
 
 ## What are ExecuTorch delegates?
-A delegate is an entry point for backends to process and execute parts of the ExecuTorch program. Delegated portions of ExecuTorch models hand off execution to backends. The XNNPACK backend delegate is one of many available in ExecuTorch. It leverages the XNNPACK third-party library to accelerate ExecuTorch programs efficiently across a variety of CPUs. More detailed information on delegates and on developing your own delegate is available [here](compiler-delegate-and-partitioner.md). It is recommended that you become familiar with that content before continuing to the Architecture section.
+A delegate is an entry point for backends to process and execute parts of the ExecuTorch program. Delegated portions of ExecuTorch models hand off execution to backends. The XNNPACK backend delegate is one of many available in ExecuTorch. It leverages the XNNPACK third-party library to accelerate ExecuTorch programs efficiently across a variety of CPUs. More detailed information on delegates and on developing your own delegate is available [here](/compiler-delegate-and-partitioner.md). It is recommended that you become familiar with that content before continuing to the Architecture section.
 
 ## Architecture
 <!-- @lint-ignore linter doesn't like this link for some reason -->
@@ -17,7 +17,7 @@ In the ExecuTorch export flow, lowering to the XNNPACK delegate happens at the `
 <!-- @lint-ignore linter doesn't like this link for some reason -->
 
 #### Partitioner
-The partitioner is implemented by backend delegates to mark nodes suitable for lowering. The `XnnpackPartitioner` lowers using node targets and module metadata. More references for partitioners can be found [here](compiler-delegate-and-partitioner.md).
+The partitioner is implemented by backend delegates to mark nodes suitable for lowering. The `XnnpackPartitioner` lowers using node targets and module metadata. More references for partitioners can be found [here](/compiler-delegate-and-partitioner.md).
 
 ##### Module-based partitioning
 
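For context, here is a minimal lowering sketch showing where the `XnnpackPartitioner` fits in the export flow. The toy model, input shape, and output file name are placeholders, and the import paths follow the current ExecuTorch Python API (they may differ slightly between releases).

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Placeholder model and inputs; any torch.nn.Module built from XNNPACK-supported ops works.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

exported = torch.export.export(model, example_inputs)

# The partitioner tags nodes (and module patterns) that XNNPACK can lower;
# the tagged subgraphs are then delegated to the XNNPACK backend during lowering.
executorch_program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("model_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```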
@@ -54,7 +54,7 @@ After partitioning the lowerable subgraphs from the model, The XNNPACK delegate
 The XNNPACK delegate uses flatbuffers for serialization. To improve runtime performance, the XNNPACK delegate’s flatbuffer [schema](https://github.com/pytorch/executorch/blob/main/backends/xnnpack/serialization/schema.fbs) mirrors the XNNPACK library’s graph-level API calls. The serialized data are arguments to XNNPACK’s APIs, so that at runtime the XNNPACK execution graph can be created efficiently with successive calls to XNNPACK’s APIs.
 
 ### Runtime
-The XNNPACK backend’s runtime interfaces with the ExecuTorch runtime through the custom `init` and `execute` functions. Each delegated subgraph is contained in an individually serialized XNNPACK blob. When the model is initialized, ExecuTorch calls `init` on all XNNPACK blobs to load the subgraphs from the serialized flatbuffers. When the model is executed, each subgraph is executed by the backend through the custom `execute` function. To read more about how delegate runtimes interface with ExecuTorch, refer to this [resource](compiler-delegate-and-partitioner.md).
+The XNNPACK backend’s runtime interfaces with the ExecuTorch runtime through the custom `init` and `execute` functions. Each delegated subgraph is contained in an individually serialized XNNPACK blob. When the model is initialized, ExecuTorch calls `init` on all XNNPACK blobs to load the subgraphs from the serialized flatbuffers. When the model is executed, each subgraph is executed by the backend through the custom `execute` function. To read more about how delegate runtimes interface with ExecuTorch, refer to this [resource](/compiler-delegate-and-partitioner.md).
 
 
 #### **XNNPACK Library**
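As a rough illustration of the runtime behavior described above from the Python side (loading the program triggers `init` on each delegate blob, and each forward call goes through `execute`), a hedged sketch follows. The pybindings module path and the `.pte` file name are assumptions; the Python bindings have moved between ExecuTorch releases.

```python
import torch
# Assumed location of the ExecuTorch Python bindings; may differ by release.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Loading the program calls `init` for every serialized XNNPACK blob,
# rebuilding the XNNPACK execution graph(s) and packing weights.
module = _load_for_executorch("model_xnnpack.pte")  # placeholder path

# Each forward call routes the delegated subgraphs through the backend's `execute`.
outputs = module.forward((torch.randn(1, 32),))
print(outputs[0])
```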
@@ -70,7 +70,7 @@ Since weight packing creates an extra copy of the weights inside XNNPACK, We fre
 When executing the XNNPACK subgraphs, we prepare the tensor inputs and outputs and feed them to the XNNPACK runtime graph. After executing the runtime graph, the output pointers are filled with the computed tensors.
 
 #### **Profiling**
-We have enabled basic profiling for the XNNPACK delegate, which can be turned on with the compiler flag `-DEXECUTORCH_ENABLE_EVENT_TRACER` (add `-DENABLE_XNNPACK_PROFILING` for additional detail). With ExecuTorch's Developer Tools integration, you can also use the Developer Tools to profile the model. Follow the steps in [Using the ExecuTorch Developer Tools to Profile a Model](tutorials/devtools-integration-tutorial)<!-- @lint-ignore --> to profile ExecuTorch models and use the Developer Tools' Inspector API to view XNNPACK's internal profiling information. An example implementation is available in the `executor_runner` (see the [tutorial here](tutorial-xnnpack-delegate-lowering.md#profiling)).
+We have enabled basic profiling for the XNNPACK delegate, which can be turned on with the compiler flag `-DEXECUTORCH_ENABLE_EVENT_TRACER` (add `-DENABLE_XNNPACK_PROFILING` for additional detail). With ExecuTorch's Developer Tools integration, you can also use the Developer Tools to profile the model. Follow the steps in [Using the ExecuTorch Developer Tools to Profile a Model](/tutorials/devtools-integration-tutorial)<!-- @lint-ignore --> to profile ExecuTorch models and use the Developer Tools' Inspector API to view XNNPACK's internal profiling information. An example implementation is available in the `executor_runner` (see the [tutorial here](/tutorial-xnnpack-delegate-lowering.md#profiling)).
 
 
 [comment]: <>(TODO: Refactor quantizer to a more official quantization doc)
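To make the Inspector step in the profiling paragraph above concrete, here is a minimal sketch. The ETDump and ETRecord file paths are placeholders, and it assumes the runner was built with `-DEXECUTORCH_ENABLE_EVENT_TRACER` and emitted an ETDump during execution.

```python
from executorch.devtools import Inspector

# ETDump is produced by an event-tracer-enabled runtime (e.g. executor_runner);
# ETRecord is optionally generated at export time to map events back to source ops.
inspector = Inspector(
    etdump_path="etdump.etdp",   # placeholder path
    etrecord="etrecord.bin",     # placeholder path, optional
)

# Prints a table of runtime events; with XNNPACK profiling enabled this
# includes the delegate's internal per-operator timings.
inspector.print_data_tabular()
```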
@@ -142,5 +142,5 @@ def _qdq_quantized_linear(
 You can read a more in-depth explanation of PyTorch 2 quantization [here](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html).
 
 ## See Also
-- [Integrating XNNPACK Delegate in Android AAR](using-executorch-android.md)
-- [Complete the Lowering to XNNPACK Tutorial](tutorial-xnnpack-delegate-lowering.md)
+- [Integrating XNNPACK Delegate in Android AAR](/using-executorch-android.md)
+- [Complete the Lowering to XNNPACK Tutorial](/tutorial-xnnpack-delegate-lowering.md)
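For reference, the PT2E quantization flow that the hunk above links to (the same flow that produces the `_qdq_quantized_linear` Q/DQ pattern) looks roughly like the following sketch when targeting XNNPACK. The toy model is a placeholder, and the quantizer and `prepare_pt2e`/`convert_pt2e` import paths have moved between PyTorch/ExecuTorch releases, so treat the exact modules as assumptions.

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

# Capture the model as a graph, annotate it with the XNNPACK quantizer,
# calibrate, then convert to the quantize/dequantize (Q/DQ) representation
# that the XnnpackPartitioner recognizes and lowers.
captured = torch.export.export_for_training(model, example_inputs).module()
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)            # calibration with representative inputs
quantized = convert_pt2e(prepared)   # ready for lowering to XNNPACK
```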
docs/source/backends/xnnpack/xnnpack-troubleshooting.md (4 additions, 4 deletions)
@@ -10,16 +10,16 @@ The XNNPACK backend is built by default for Python, Android, iOS, and in most CM
 
 * Set the `EXECUTORCH_BUILD_XNNPACK=ON` CMake option when building from source.
 * Either pass the option during CMake configuration or set it inside the user CMake logic before including ExecuTorch.
-* See [Building from Source](using-executorch-building-from-source).
+* See [Building from Source](/using-executorch-building-from-source).
 * On iOS, link the `backend_xnnpack` [framework](/using-executorch-ios).
 * If the backend is still not found, link with `WHOLE_ARCHIVE`.
 * Pass `"$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>"` to `target_link_libraries` in CMake.
 
 ## Slow Performance
 
-* Try reducing the thread count using [_unsafe_reset_threadpool](/using-executorch-faqs#inference-is-slow-performance-troubleshooting).
+* Try reducing the thread count using [_unsafe_reset_threadpool](/using-executorch-faqs.md#inference-is-slow-performance-troubleshooting).
 * Small models may benefit from using fewer threads than the default.
 * Try values between 1 and 4 threads and measure performance on your model.
-* Use [op-level profiling](tutorials/devtools-integration-tutorial) to understand which operators are taking the most time. <!-- @lint-ignore linter doesn't like this link for some reason -->
+* Use [op-level profiling](/tutorials/devtools-integration-tutorial) to understand which operators are taking the most time. <!-- @lint-ignore linter doesn't like this link for some reason -->
 * The XNNPACK backend provides operator-level timing for delegated operators.
-* See general performance troubleshooting tips in [Performance Troubleshooting](/using-executorch-faqs#inference-is-slow-performance-troubleshooting).
+* See general performance troubleshooting tips in [Performance Troubleshooting](/using-executorch-faqs.md#inference-is-slow-performance-troubleshooting).
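As a hedged illustration of the thread-count suggestion in the hunk above, the sketch below sweeps a few thread counts and times inference. The module path for `_unsafe_reset_threadpool` and `_load_for_executorch`, the `.pte` file name, and the input shape are all assumptions; only the `_unsafe_reset_threadpool` name comes from the linked FAQ entry.

```python
import time
import torch
# Assumed location of the ExecuTorch pybindings; may differ by release.
from executorch.extension.pybindings.portable_lib import (
    _load_for_executorch,
    _unsafe_reset_threadpool,
)

inputs = (torch.randn(1, 32),)  # placeholder input shape
for num_threads in (1, 2, 4):
    # Resize the shared threadpool before loading/running the model.
    _unsafe_reset_threadpool(num_threads)
    module = _load_for_executorch("model_xnnpack.pte")  # placeholder path
    start = time.perf_counter()
    for _ in range(100):
        module.forward(inputs)
    avg = (time.perf_counter() - start) / 100
    print(f"{num_threads} thread(s): {avg:.6f} s/iter")
```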