# The ExecuTorch Vulkan Backend

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) on the market support Vulkan. Vulkan has also been included
in Android since Android 7.0.

**Note that Vulkan is a GPU API, not a GPU math library**. That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does not
come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an ahead-of-time (AOT), graph-mode style of model inference (as
opposed to PyTorch's eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability-Based Partitioning**
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which will identify nodes (i.e. operators) that are supported by the Vulkan delegate and lower only the supported subgraphs.
* **Support for Upper-Bound Dynamic Shapes**
  * Tensors can change shape between inferences as long as the current shape is smaller than the bounds specified during lowering (see the sketch below).

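As a concrete illustration, the following sketch shows one way a model might be
exported with a bounded dynamic batch dimension using `torch.export.Dim`. The
`max=8` bound and the tensor sizes are illustrative choices, not values required
by the Vulkan delegate.

```python
import torch
from torch.export import Dim, export

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# Both inputs share the same symbolic batch dimension, bounded above by 8.
batch = Dim("batch", min=1, max=8)

aten_dialect = export(
    Add(),
    (torch.ones(4, 32), torch.ones(4, 32)),
    dynamic_shapes={"x": {0: batch}, "y": {0: batch}},
)
# After lowering, any batch size from 1 to 8 is valid at inference time.
```
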
In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to add graph passes that insert memory layout transitions throughout a graph to optimize memory-layout-sensitive operators such as convolution and matrix multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders to include in the build.

## End to End Example

To further understand the features of the Vulkan delegate and how to use it,
consider the following end-to-end example with a simple single-operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate, producing
`vk_add.pte`.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super(Add, self).__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

Like other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate
using the `to_backend()` API. The Vulkan Delegate implements the
`VulkanPartitioner` class, which identifies nodes (i.e. operators) in the graph
that are supported by the Vulkan delegate and separates compatible sections of
the model to be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it
contains some unsupported operators; in that case, only the supported parts of
the graph will be executed on the GPU.

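To see how much of a graph was actually delegated, one option (assuming the
ExecuTorch devtools utilities are available in your install) is to print a
delegation summary after calling `to_backend()`, continuing from the lowering
script above:

```python
from executorch.devtools.backend_debug import get_delegation_info

graph_module = edge_program.exported_program().graph_module
# Summarizes, per operator, how many nodes were delegated to the backend
# and how many remain in the default runtime.
print(get_delegation_info(graph_module).get_summary())
```
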
::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/op_registry.py#L194)
in the Vulkan partitioner code can be inspected to examine which ops are
currently implemented in the Vulkan delegate.
::::

### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r28c. The Android SDK should also be installed so that you have access to `adb`.

The instructions on this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from the ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
Delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow, as the majority of operators
are executed via Portable operators.
::::

Now the partially delegated model can be executed on your device's GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```
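
If you are working on a machine with a Vulkan-capable GPU, the lowered program
can also be sanity-checked from Python. The sketch below assumes the ExecuTorch
Python bindings were built from source with the Vulkan delegate linked in
(e.g. with `-DEXECUTORCH_BUILD_VULKAN=ON`); with a build that does not include
the Vulkan backend, loading the program will fail with a missing-backend error.

```python
import torch
# Assumes a pybindings build that registers the Vulkan backend.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("vk_add.pte")
outputs = module.forward([torch.ones(1), torch.ones(1)])
print(outputs[0])  # Expected: tensor([2.])
```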

Please see the [Vulkan Backend Overview](../../docs/source/backends/vulkan/vulkan-overview.md)
to learn more about the ExecuTorch Vulkan Backend.