MLC-LLM for Orange Pi 5 Pro

Sources:

Please open an issue if you find something that can be improved in this guide.

Prepare the hardware

While this guide is tailored to the Orange Pi 5 Pro, it can also serve as a generic guide for other Rockchip-based boards. Make sure your hardware is properly cooled with a fan and heatsinks, as it may otherwise overheat or thermal throttle.

Install Ubuntu for Rockchip

  • Download the latest version of Ubuntu for your device here
  • Burn the OS image to a micro SD card using balenaEtcher or any other image-writing software
  • Boot the OS on your device. Optionally connect via SSH for convenience. The default username and password are ubuntu/ubuntu; you will be forced to change the password the first time you log in

Install OS to SSD or eMMC (Optional but recommended)

To avoid disk read/write speed bottlenecks when using mlc-llm, I highly recommend installing the OS to an M.2 SSD or eMMC storage module.

  • Once booted into the OS from the micro SD card, run
sudo fdisk -l

to find the storage device you will be installing the OS on (highlighted in red in the screenshot below is the eMMC module I will be installing my OS on). (screenshot)

  • Once you have found the storage device to install the OS on, run the following command
  sudo ubuntu-rockchip-install /dev/mmcblk0

where "/dev/mmcblk0" is the storage device you wish to install the OS on, as shown by fdisk. After the process is done, power off the device, remove the micro SD card and turn the system back on.

Install OpenCL GPU Drivers

  • Download and install libmali-g610.so
  cd /usr/lib && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-x11-wayland-gbm.so
  • Check if the file mali_csffw.bin exists under /lib/firmware; if not, download it with:
  cd /lib/firmware && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/firmware/g610/mali_csffw.bin
  • Download OpenCL ICD loader and manually add libmali to ICD
  sudo apt update
sudo apt install mesa-opencl-icd
sudo mkdir -p /etc/OpenCL/vendors
echo "/usr/lib/libmali-valhall-g610-g6p0-x11-wayland-gbm.so" | sudo tee /etc/OpenCL/vendors/mali.icd
  • Download and install libOpenCL
  sudo apt install ocl-icd-opencl-dev
  • Download and install dependencies for Mali OpenCL
  sudo apt install libxcb-dri2-0 libxcb-dri3-0 libwayland-client0 libwayland-server0 libx11-xcb1
  • Download and install clinfo to check if OpenCL successfully installed
  sudo apt install clinfo
  • Run clinfo and check the results. We are looking for a Mali device, as shown below (you can also filter the output with the snippet after this list). (screenshot)
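If the full clinfo output is long, a quick filter can confirm the Mali GPU was picked up. This is a minimal sketch; the exact device name string reported by the driver may differ:

# look for the Mali GPU in the clinfo output
clinfo | grep -i mali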

Install Conda - miniconda (optional but recommended)

While not strictly necessary, using conda will make managing dependencies for your build environment much easier.

  • Grab the latest installer script for your hardware here. For the Orange Pi 5 Pro I will be using "Miniconda3 Linux-aarch64 64-bit", as the device uses an arm64 CPU.
  • Run the following, changing the installer script URL accordingly
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
  • Add/initialize conda in your shell. For bash:
~/miniconda3/bin/conda init bash

for zsh:

~/miniconda3/bin/conda init zsh
  • Close and restart your shell to activate and start using conda. You should see (base) next to your username. (screenshot)

  • Update conda

conda update --yes -n base -c defaults conda
  • Set libmamba as the dependency solver. The default dependency solver in conda can be slow in certain scenarios, and it is recommended to switch to libmamba, a faster solver. Install:
conda install --yes -n base conda-libmamba-solver

Then set it as the default dependency solver:

conda config --set solver libmamba
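To confirm the setting took effect, you can print the configured value back out (a quick check; conda config --show prints the current value of a given key):

# verify that libmamba is now the configured solver
conda config --show solver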

Compile TVM-Unity (Optional but recommended)

See here for up-to-date instructions: https://llm.mlc.ai/docs/install/tvm.html

mlc-llm provides pre-compiled TVM packages which should be fine for our use, but in my personal testing I was unable to get them to work (specifically on the Orange Pi 5 Pro). If you skip this section and encounter errors related to the "tvm" module, I recommend attempting to compile it yourself.

Some prerequisites

Since these packages are required when building mlc-llm I install them with apt install, but you could optionally install them into your build environment with conda. Either way, make sure these packages are installed prior to attempting to compile TVM-Unity (see the one-line apt example after this list).

  • Doxygen
  • tqdm
  • Graphviz
  • build-essential (ubuntu specific package. Install the equivalent for your OS)
  • zlib1g-dev
  • libfl-dev
  • clang
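For reference, here is one way to install the above on Ubuntu with apt. The package names below are my assumption of the Ubuntu package names; note that tqdm is a Python module, so the apt equivalent is python3-tqdm (or install it with pip/conda instead):

sudo apt update
sudo apt install doxygen graphviz build-essential zlib1g-dev libfl-dev clang python3-tqdm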

PIP prerequisites

The following Python modules are required in order for us to run mlc-llm and test our tvm-unity build. If you are not using conda you will need to install and use python3-pip (see the example after this list).

  • numpy
  • decorator
  • psutil
  • typing_extensions
  • packaging
  • attrs
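For example, assuming pip inside your conda environment (or pip3 otherwise), all of the above can be installed in one go:

pip install numpy decorator psutil typing_extensions packaging attrs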

Using conda

The following commands set up a build environment for compiling TVM-Unity. If you skipped the conda installation you can attempt to install these dependencies with apt install. If installing these packages manually with apt, make sure you are installing the correct versions, as apt will install the latest version of each package by default.

Run the following to create and setup the conda environment

# make sure to start with a fresh environment
conda env remove -n tvm-build-venv
# create the conda environment with build dependency
conda create -n tvm-build-venv -c conda-forge \
    "llvmdev>=15" \
    "cmake>=3.24" \
    git \
    python=3.11
# enter the build environment
conda activate tvm-build-venv
  • Clone the GitHub repo for tvm-unity (actually mlc-ai/relax for the time being; check the link above to see whether tvm-unity is recommended now). I recommend cloning into the home directory (~)
# clone from GitHub
git clone --recursive https://github.com/mlc-ai/relax.git tvm-unity && cd tvm-unity
# create the build directory
rm -rf build && mkdir build && cd build
# specify build requirements in `config.cmake`
cp ../cmake/config.cmake .
  • Configure the config.cmake file, which instructs cmake how we want to compile the project. Ensure you are in the build directory created earlier when running these commands.
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
  • Set how mlc-llm will run the LLM. The options are CUDA (NVIDIA), Metal (Apple), Vulkan (AMD) and OpenCL, which is what we will be using. If you are using something other than OpenCL, make sure you set that option to "ON" instead
echo "set(USE_OPENCL ON)" >> config.cmake
  • once the config.cmake file is created and configured to our liking we can run the following to build the project. Again ensure you are in the build folder from earlier:
cmake .. && cmake --build . --parallel $(nproc)

Verify TVM compiled correctly

You may see several warnings or errors during compilation, but this should be OK as long as the build doesn't end with a string of error messages like in the screenshot below, which was caused by missing libfl-dev. (screenshot)

If you encounter any errors such as the one shown above, remediate them by installing the required packages and restart the process from this command onward in the guide above. Make sure you are in the tvm-unity directory when running the command below.

rm -rf build && mkdir build && cd build

Example of output when built successfully: (screenshot)

Add the tvm-unity path to your PYTHONPATH variable. Edit the path if your tvm-unity folder is located somewhere else:

export PYTHONPATH=/home/ubuntu/tvm-unity/python:$PYTHONPATH

After adding the path to your PYTHONPATH variable, run the following. Make sure you have the Python modules installed from the PIP prerequisites section!

python -c "import tvm; print(tvm.__file__)"

which should print the location of the tvm module inside your tvm-unity clone: (screenshot)

Now run the following to check the build options used:

python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"

We should then be able to see OpenCL set to "ON" in the output. (screenshot)
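The libinfo output is fairly long, so filtering it down is a quick way to check the relevant flag. This is a minimal sketch; the key to look for should be the OpenCL build option (e.g. USE_OPENCL):

# show only the OpenCL-related build options
python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))" | grep -i opencl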

Compile mlc-llm from source

Package prerequisites

There is only one additional package required here compared to the tvm-unity section. If you installed the packages listed in the tvm-unity section, then you only need to install git-lfs (see the example after this list).

  • git-lfs
  • Doxygen
  • tqdm
  • Graphviz
  • build-essential (ubuntu specific package. Install the equivalent for your OS)
  • zlib1g-dev
  • libfl-dev
  • clang
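On Ubuntu, git-lfs can be installed with apt (assuming the remaining packages are already present from the previous section):

# install the Git Large File Storage extension
sudo apt install git-lfs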

PIP prerequisites

The following Python modules are required in order for us to run mlc-llm. If you are not using conda you will need to install and use python3-pip (see the example after this list).

  • numpy
  • decorator
  • psutil
  • typing_extensions
  • packaging
  • attrs
  • Pydantic
  • shortuuid
  • fastapi
  • requests
  • tqdm
  • prompt_toolkit
  • uvicorn
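Again, assuming pip inside your environment (or pip3 otherwise), the full set can be installed in one command:

pip install numpy decorator psutil typing_extensions packaging attrs pydantic shortuuid fastapi requests tqdm prompt_toolkit uvicorn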

Conda environment (optional but recommended)

If you have conda installed from earlier, perform the following to set up a build environment for mlc-llm. Note that if you aren't using conda, you need to have the packages installed on your OS with the recommended versions.

  • Create the environment
conda create -n mlc-chat-venv -c conda-forge \
    "cmake>=3.24" \
    rust \
    git \
    python=3.11
  • activate the environment
conda activate mlc-chat-venv

Clone and compile

  • Clone the mlc-llm GitHub repo. I recommend cloning it to your home directory to make things easy:
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm/
  • create the build directory
mkdir -p build && cd build
  • Generate the config.cmake with the following script. Note that if you are not using conda you will likely have to replace "python" with "python3"
python ../cmake/gen_cmake_config.py
  • For the first option of the script, the TVM location: if you followed the compile tvm-unity instructions from earlier, use that directory here. Otherwise you can attempt to use the default by leaving this option blank. In the example below I am using my custom-compiled TVM. (screenshot)

  • Use CUDA should be no

  • Use CUTLASS should be no

  • Use CUBLAS should be no

  • Use ROCm should be no

  • Use Vulkan should be no

  • Use Metal should be no

  • Use OpenCL should be Yes

  • Use OpenCLHostPtr can be yes or no. From what I understand this has something to do with using storage for an OpenCL cache of some kind, but I don't have a great understanding of what exactly it means/does.

  • Use FlashInfer should be no

Here is my output after running the script: (screenshot)

Now you are ready to compile. Run the following to build the project. Again, keep an eye out for errors and warnings; warnings and errors may not necessarily prevent the project from compiling.

cmake .. && cmake --build . --parallel $(nproc) && cd ..

Here is my output after a successful build: (screenshot)

After successfully building, let's add the mlc-llm path to our PYTHONPATH variable. Change the following to match your paths if you used something different:

export MLC_LLM_SOURCE_DIR=/home/ubuntu/mlc-llm
export PYTHONPATH=$MLC_LLM_SOURCE_DIR/python:$PYTHONPATH

And since mlc-llm is run from within Python, we will make an alias for calling it:

alias mlc_llm="python -m mlc_llm"

Now, provided you have all the prerequisites installed, you can run the following to test mlc-llm:

python -c "import mlc_llm; print(mlc_llm)"

If successful you should see something along the lines of:

<module 'mlc_llm' from '/home/ubuntu/mlc-llm/python/mlc_llm/__init__.py'>

Running mlc-llm

To open the command help information run the following:

mlc_llm chat -h

Selecting a device

When attempting to run mlc-llm using OpenCL on the Orange Pi 5 Pro, I found that it would default to a generic ARM OpenCL device rather than the Mali GPU we set up drivers for earlier in this guide. Because of this I had to specify the device to run mlc-llm on, otherwise I would hit an error preventing mlc-llm from running. I believe the order of the devices reflects what clinfo reports, but this may just be a coincidence in my case. See examples of specifying the device in the command examples below, and the quick device-listing sketch that follows.
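One quick way to see which platforms and devices OpenCL exposes (and in what order) is clinfo's list mode. This is just a convenience check; the index you pass as opencl:N may or may not match this ordering on your system:

# compact list of OpenCL platforms and their devices
clinfo -l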

Getting pre-compiled LLMs

To download and utilize some pre-compiled LLM models for mlc-llm, we can visit the mlc-ai organization on Hugging Face: https://huggingface.co/mlc-ai Available quantization codes are: q3f16_0, q4f16_1, q4f16_2, q4f32_0, q0f32, and q0f16

For testing I will be using SmolLM-1.7B-Instruct-q4f16_1-MLC, as it is a pretty small download and I've found it runs decently. To run it as a chat, run the following (note that in the future you may need to select a different model or an updated version of this model):

mlc_llm chat HF://mlc-ai/SmolLM-1.7B-Instruct-q4f16_1-MLC --device opencl:1

If you wanted to run the model as a server you would do the following:

mlc_llm serve HF://mlc-ai/SmolLM-1.7B-Instruct-q4f16_1-MLC --device opencl:1 --mode server --host 0.0.0.0

This will run mlc-llm as a REST server on OpenCL device 1, with server mode specified and the host changed from 127.0.0.1 to allow access outside of localhost.

You can test if the server is accessible outside of localhost using curl, or Invoke-RestMethod on Windows (see the example below). (screenshot)
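For example, from another machine you could query the server's OpenAI-compatible endpoints with curl. This is a sketch; it assumes the server is listening on its default port 8000 and uses SERVER_IP as a stand-in for your device's address:

# list the models the server is exposing
curl http://SERVER_IP:8000/v1/models

# send a simple chat completion request
curl http://SERVER_IP:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "HF://mlc-ai/SmolLM-1.7B-Instruct-q4f16_1-MLC", "messages": [{"role": "user", "content": "Hello!"}]}'

The model field should match the model id reported by the /v1/models endpoint.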

Extras

Adding variables to your shell configuration

To make the variables and alias we added earlier permanent, we can add them to our bash shell config (note this is per-user, not system-wide). Modify your shell config with your text editor of choice:

nano ~/.bashrc

At the bottom of the file, add the following, changing the paths according to where you cloned things and whether you compiled tvm-unity:

export MLC_LLM_SOURCE_DIR=/home/ubuntu/mlc-llm
export PYTHONPATH=/home/ubuntu/mlc-llm/python:/home/ubuntu/tvm-unity/python:$PYTHONPATH
alias mlc_llm="python -m mlc_llm"

Once saved, you need to reload your shell for the variables to be loaded (see below).
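You can either open a new terminal or re-source your bash config in the current one:

# reload ~/.bashrc in the current shell
source ~/.bashrc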

Compiling your own models

See the bottom of the following guide for basic instructions on building your own model: https://github.com/Chrisz236/llm-rk3588

Using mlc-llm with HomeAssistant

At the time of writing, HomeAssistant has been adding lots of support for using LLMs as a "Conversation agent" within a custom voice assistant. This allows mlc-llm to act as the brains for whatever you are requesting via your HomeAssistant voice assistant. Currently there is no official integration for using mlc-llm with HomeAssistant, but there is a custom integration we can use thanks to mlc-llm's OpenAI-compatible REST API: https://github.com/jekalmin/extended_openai_conversation

The Extended OpenAI Conversation integration is a fork of the official OpenAI integration, but allows for specifying a custom server address etc. Here is an example of the config options I used when pointing it at mlc-llm: (screenshot)

Also make sure you have the correct chat model selected. I entered the model as shown by the mlc-llm server when querying its models. (screenshot)

Do not expect this to perform well. The config options of this extension are vague and undocumented, so I am not sure whether anything in HomeAssistant can be tweaked for better performance. I expect we will need an official implementation of OpenAI-compatible servers before we see better performance. This integration also attempts to use functions rather than tools for controlling entities in HomeAssistant; I haven't had much luck with either, and I expect more work is needed to fine-tune this.
