Cross-platform & energy-efficient kernels, runtime and AI inference engine for mobile devices.
Cactus Graph is a general-purpose numerical computing framework for implementing any model; think of it as PyTorch for mobile devices:
```cpp
#include "cactus.h"

CactusGraph graph;

// Declare graph inputs with their shapes and precisions
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

// Build the computation: {2,3} x {3,4} -> {2,4},
// transposed to {4,2}, then {3,4} x {4,2} -> {3,2}
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Bind concrete data to the declared inputs
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

// Run the graph, read the result, then release graph state
graph.execute();
void* output_data = graph.get_output(result);
graph.hard_reset();
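```

For illustration, a hedged sketch of consuming the `{3,2}` result before the `hard_reset()` call above; the storage precision of the output buffer is an assumption here (it is not specified by the example):

```cpp
#include <cstdio>

// Sketch only: assumes get_output() yields contiguous FP32 values for the
// {3,2} result; read before graph.hard_reset() invalidates graph state.
const float* out = static_cast<const float*>(output_data);
for (int i = 0; i < 3 * 2; ++i) {
    printf("result[%d] = %f\n", i, out[i]);
}
```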
Cactus Engine is an AI inference engine with OpenAI-compatible APIs, built on top of Cactus Graph:
```cpp
#include "cactus.h"

// Load weights and set a 2048-token context window
cactus_model_t model = cactus_init("path/to/weight/folder", 2048);

// Conversation in OpenAI-style message format
const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

// Generation options
const char* options = R"({
    "max_tokens": 50,
    "stop_sequences": ["<|im_end|>"]
})";

char response[1024];
int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr);
```

Example response from Gemma3-270m-INT8:

```json
{
"success": true,
"response": "Hi there! I'm just a friendly assistant.",
"time_to_first_token_ms": 45.23,
"total_time_ms": 163.67,
"tokens_per_second": 168.42,
"prefill_tokens": 28,
"decode_tokens": 50,
"total_tokens": 78
}
```
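`cactus_complete` writes the JSON document above into the caller-supplied `response` buffer as a C string. A hedged follow-up sketch, assuming a non-negative return value signals success (the return value's exact semantics are not specified above):

```cpp
#include <cstdio>

// Assumption: a non-negative return code means `response` was filled.
if (result >= 0) {
    printf("%s\n", response);  // the JSON object shown above
} else {
    fprintf(stderr, "cactus_complete failed with code %d\n", result);
}
```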
Benchmark notes:
- Models: LFM2-VL-450M (VLM) & Whisper-Small (STT)
- Decode = tokens/sec, P/D = prefill/decode, VLM = 256×256 image input, STT = 30s audio clip
- INT4 coming: 1.8x speed, 1.9x smaller files
- NPU coming: 5-11x prefill speed, better energy efficiency
| Device | Decode | 1k-P/D | 4k-P/D | 4k-RAM | VLM-TTFT | VLM-Dec | VLM-RAM | STT-TTFT | STT-Dec | STT-RAM |
|---|---|---|---|---|---|---|---|---|---|---|
| Mac M4 Pro | 173 | 1574/115 | 1089/100 | 122MB | 0.38s | 168 | 112MB | 1.7s | 83 | 142MB |
| Mac M3 Pro | 150 | 1540/109 | 890/93 | 121MB | 0.47s | 149 | 113MB | 2.9s | 78 | 140MB |
| iPad/Mac M4 | 129 | 793/82 | 507/64 | 80MB | 0.46s | 113 | 145MB | 2.4s | 60 | 131MB |
| iPad/Mac M3 | 112 | 786/78 | 446/60 | 81MB | 0.58s | 111 | 154MB | 4.2s | 58 | 142MB |
| iPhone 17 Pro | 136 | 810/105 | 628/84 | - | 1.1s | 120 | - | - | - | - |
| iPhone 16 Pro | 114 | 716/98 | 580/81 | - | 1.3s | 101 | - | 3.5s | 75 | - |
| iPhone 15 Pro | 99 | 549/86 | 530/75 | - | 1.5s | 92 | - | 3.8s | 70 | - |
| Galaxy S25 Ultra | 91 | 230/63 | 173/57 | 128MB | 1.4s | 58 | - | - | - | - |
| Nothing 3 | 56 | 167/49 | 160/46 | - | 1.7s | 54 | - | 8.5s | 55 | - |
| Nothing 3a | 31 | 114/26 | 108/24 | - | 2.4s | 29 | - | - | - | - |
| Raspberry Pi 5 | 24 | 192/28 | - | - | 2.3s | 23 | - | 21s | 16 | - |
Dependencies are set up automatically on first run.

```bash
cli/cactus --help                     # see all commands
cli/cactus run LiquidAI/LFM2-VL-450M  # interact with a model
cli/cactus test                       # run unit tests during dev + reproduce benchmarks
cli/cactus download Qwen/Qwen3-0.6B   # HF name, stored to weights/Qwen3-0.6B
```

Supported models:

| Model | Compressed Size | Completion | Tool Call | Vision | Embed | Speech |
|---|---|---|---|---|---|---|
| google/gemma-3-270m-it | 172MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| openai/whisper-small | 210MB | ✗ | ✗ | ✗ | ✓ | ✓ |
| LiquidAI/LFM2-350M | 233MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| HuggingFaceTB/SmolLM2-360m-Instruct | 227MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| LiquidAI/LFM2-VL-450M | 420MB | ✓ | ✗ | ✓ | ✓ | ✗ |
| Qwen/Qwen3-0.6B | 394MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| Qwen/Qwen3-Embedding-0.6B | 394MB | ✗ | ✗ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-700M | 467MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| nomic-ai/nomic-embed-text-v2-moe | 533MB | ✗ | ✗ | ✗ | ✓ | ✗ |
| google/gemma-3-1b-it | 642MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| openai/whisper-medium | 646MB | ✗ | ✗ | ✗ | ✓ | ✓ |
| LiquidAI/LFM2-1.2B | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-1.2B-RAG | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-1.2B-Tools | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-VL-1.6B | 1440MB | ✓ | ✗ | ✓ | ✓ | ✗ |
| Qwen/Qwen3-1.7B | 1161MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | 1161MB | ✓ | ✗ | ✗ | ✓ | ✗ |
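Tying the CLI and the engine together: a model fetched with `cli/cactus download` can be passed straight to `cactus_init`. A minimal sketch, assuming the `weights/<name>` storage convention from the download command above and that `cactus_init` returns a null handle on failure:

```cpp
#include "cactus.h"
#include <cstdio>

int main() {
    // Path follows the convention from `cli/cactus download Qwen/Qwen3-0.6B`.
    cactus_model_t model = cactus_init("weights/Qwen3-0.6B", 2048);
    if (!model) {  // assumption: a null handle signals a load failure
        fprintf(stderr, "failed to load model\n");
        return 1;
    }
    // ... call cactus_complete as shown earlier ...
    return 0;
}
```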
To build from source:

```bash
android/build.sh  # generates libcactus.so and libcactus.a for Android
apple/build.sh    # generates the .xcframeworks for Apple
```

Or simply use the provided SDKs.

For Windows ARM:

```bash
# Needs C++, Python, and MSYS2 with Pacman; then install CMake, the toolchain, and the Python weight-conversion dependencies
pacman -S mingw-w64-clang-aarch64-cmake mingw-w64-clang-aarch64-toolchain mingw-w64-clang-aarch64-mman-win32
pip3 install -r tools/requirements.txt
tests/run.bat     # run tests on Windows ARM
```