Vitis-AI is Xilinx's development stack for hardware-accelerated AI inference on Xilinx platforms, including both edge devices and Alveo cards. It consists of optimized IP, tools, libraries, models, and example designs. It is designed with high efficiency and ease of use in mind, unleashing the full potential of AI acceleration on Xilinx FPGA and ACAP.
The current Vitis-AI execution provider inside ONNXRuntime enables acceleration of Neural Network model inference using DPUv1. DPUv1 is a hardware accelerator for Convolutional Neural Networks (CNN) on top of the Xilinx Alveo platform and targets U200 and U250 accelerator cards.
On this page you will find information on how to build ONNXRuntime with Vitis-AI and on how to get started with an example.
To build ONNXRuntime with the Vitis-AI execution provider, you first have to set up the hardware environment and build the Docker image; see the build steps.
The following table lists the system requirements for running Docker containers and Alveo cards.
Component | Requirement |
---|---|
Motherboard | PCI Express 3.0-compliant with one dual-width x16 slot |
System Power Supply | 225W |
Operating System | Ubuntu 16.04, 18.04; CentOS 7.4, 7.5; RHEL 7.4, 7.5 |
CPU | Intel i3/i5/i7/i9/Xeon 64-bit CPU |
GPU (Optional to accelerate quantization) | NVIDIA GPU with a compute capability > 3.0 |
CUDA Driver (Optional to accelerate quantization) | nvidia-410 |
FPGA | Xilinx Alveo U200 or U250 |
Docker Version | 19.03.1 |
- Clone the Vitis AI repository:
  git clone https://github.com/Xilinx/Vitis-AI
- Install Docker and add your user to the docker group. Follow the installation instructions on Docker's website.
- Any GPU instructions will have to be separated from Vitis AI.
- Set up Vitis AI to target Alveo cards. To target Alveo cards with Vitis AI for machine learning workloads, you must install the following software components:
  - Xilinx Runtime (XRT)
  - Alveo Deployment Shells (DSAs)
  - Xilinx Resource Manager (XRM) (xbutler)
  - Xilinx Overlaybins (Accelerators to Dynamically Load - binary programming files)

  While it is possible to install all of these software components individually, a script has been provided to install them all at once. To do so:
  - Run the following commands:
    cd Vitis-AI/alveo/packages
    sudo su
    ./install.sh
  - Power cycle the system.
- Build and start the ONNXRuntime Vitis-AI Docker container:
  cd {onnxruntime-root}/dockerfiles
  docker build -t onnxruntime-vitisai -f Dockerfile.vitisai .
  ./scripts/docker_run_vitisai.sh

  Setup inside the container:
  source /opt/xilinx/xrt/setup.sh
  conda activate vitis-ai-tensorflow
Usually, to accelerate inference of neural network models with Vitis-AI DPU accelerators, those models need to be quantized upfront. The ONNXRuntime Vitis-AI execution provider uses on-the-fly quantization to remove this additional preprocessing step. In this flow, you do not need to quantize the model upfront; instead, the usual inference calls (InferenceSession.run) quantize the model on the fly using the first N inputs that are provided (see more information below). This sets up and calibrates the Vitis-AI DPU, and from that point onwards inference is accelerated for all subsequent inputs.
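The control flow described above can be illustrated with a small, self-contained toy model. This is only a sketch of the idea (run the first N inputs on the CPU while collecting them, then calibrate once and switch to the quantized path); it is not the execution provider's implementation. The class `OnTheFlyQuantizer`, the functions `double_on_cpu` and `build_int8_double`, and the symmetric int8 scheme are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Optional


class OnTheFlyQuantizer:
    """Toy model: serve the first `quant_size` inputs on the CPU while
    collecting them, then calibrate once and use the quantized path."""

    def __init__(
        self,
        cpu_fn: Callable[[float], float],
        build_accelerated_fn: Callable[[List[float]], Callable[[float], float]],
        quant_size: int = 128,
    ) -> None:
        self.cpu_fn = cpu_fn
        self.build_accelerated_fn = build_accelerated_fn
        self.quant_size = quant_size
        self.calibration_inputs: List[float] = []
        self.accelerated_fn: Optional[Callable[[float], float]] = None

    def run(self, x: float) -> float:
        # After calibration, every call takes the quantized path.
        if self.accelerated_fn is not None:
            return self.accelerated_fn(x)
        # Still calibrating: remember the input and compute on the CPU.
        self.calibration_inputs.append(x)
        result = self.cpu_fn(x)
        if len(self.calibration_inputs) >= self.quant_size:
            self.accelerated_fn = self.build_accelerated_fn(self.calibration_inputs)
            self.calibration_inputs = []
        return result


def double_on_cpu(x: float) -> float:
    """Hypothetical stand-in for full-precision inference."""
    return 2.0 * x


def build_int8_double(calibration: List[float]) -> Callable[[float], float]:
    """Hypothetical calibration: derive a symmetric int8 scale from the data."""
    scale = max(abs(v) for v in calibration) / 127.0 or 1.0

    def quantized_double(x: float) -> float:
        q = max(-127, min(127, round(x / scale)))  # quantize and clamp
        return 2.0 * (q * scale)                   # compute on dequantized value

    return quantized_double
```

With `quant_size=4`, the first four calls to `run` return full-precision results, and the fifth call onward uses the function produced by `build_int8_double`.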
A couple of environment variables can be used to customize the Vitis-AI execution provider.
Environment Variable | Default if unset | Explanation |
---|---|---|
PX_QUANT_SIZE | 128 | The number of inputs that will be used for quantization (necessary for Vitis-AI acceleration) |
PX_BUILD_DIR | Use the on-the-fly quantization flow | Loads the quantization and compilation information from the provided build directory and immediately starts Vitis-AI hardware acceleration. This configuration can be used if the model has been executed before using on-the-fly quantization, during which the quantization and compilation information was cached in a build directory. |
When using Python, you can base your code on the following example:
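If a script needs to know which settings are in effect, for example to decide how many calibration inputs to prepare, the documented defaults can be mirrored in a small helper. The sketch below is an assumption-laden illustration: `VitisAIEnvConfig` and `read_vitis_ai_env` are hypothetical names that are not part of pyxir or onnxruntime, and how the execution provider itself parses these variables is not shown here.

```python
import os
from typing import Mapping, NamedTuple, Optional


class VitisAIEnvConfig(NamedTuple):
    """Hypothetical container for the two documented settings."""
    quant_size: int            # number of inputs used for quantization
    build_dir: Optional[str]   # None means: use the on-the-fly flow


def read_vitis_ai_env(env: Mapping[str, str] = os.environ) -> VitisAIEnvConfig:
    """Read PX_QUANT_SIZE and PX_BUILD_DIR, applying the documented defaults."""
    quant_size = int(env.get("PX_QUANT_SIZE", "128"))
    if quant_size <= 0:
        raise ValueError("PX_QUANT_SIZE must be a positive integer")
    # Treat an unset or empty PX_BUILD_DIR as "no cached build directory".
    build_dir = env.get("PX_BUILD_DIR") or None
    return VitisAIEnvConfig(quant_size=quant_size, build_dir=build_dir)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment, for example `read_vitis_ai_env({"PX_QUANT_SIZE": "64"})`.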
# Import pyxir before onnxruntime
import pyxir
import pyxir.frontend.onnx
import pyxir.contrib.dpuv1.dpuv1
import onnxruntime
# Add other imports
# ...
# Load inputs and do preprocessing
# ...
# Create an inference session using the Vitis-AI execution provider
session = onnxruntime.InferenceSession('[model_file].onnx', None,["VitisAIExecutionProvider"])
# First N (default = 128) inputs are used for quantization calibration and will
# be executed on the CPU
# This config can be changed by setting the 'PX_QUANT_SIZE' (e.g. export PX_QUANT_SIZE=64)
input_name = [...]
outputs = [session.run([], {input_name: calib_inputs[i]})[0] for i in range(128)]
# Afterwards, computations will be accelerated on the FPGA
input_data = [...]
result = session.run([], {input_name: input_data})