Cross-platform & energy-efficient kernels, runtime and AI inference engine for mobile devices.
Cactus Graph is a general-purpose numerical computing framework for implementing any model; think of it as PyTorch for mobile devices:
```cpp
#include "cactus.h"

CactusGraph graph;

// Declare graph inputs with their shapes and precisions
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

// Build the computation: {2,3} x {3,4} -> {2,4},
// transposed to {4,2}, then {3,4} x {4,2} -> {3,2}
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Bind concrete data to the declared inputs
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

// Run the graph, read the result, then release graph state
graph.execute();
void* output_data = graph.get_output(result);
graph.hard_reset();
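```

For illustration, a hedged sketch of consuming the `{3,2}` result before the `hard_reset()` call above; the storage precision of the output buffer is an assumption here (it is not specified by the example):

```cpp
#include <cstdio>

// Sketch only: assumes get_output() yields contiguous FP32 values for the
// {3,2} result; read before graph.hard_reset() invalidates graph state.
const float* out = static_cast<const float*>(output_data);
for (int i = 0; i < 3 * 2; ++i) {
    printf("result[%d] = %f\n", i, out[i]);
}
```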
Cactus Engine is an AI inference engine with OpenAI-compatible APIs, built on top of Cactus Graph:
```cpp
#include "cactus.h"

// Load weights and set a 2048-token context window
cactus_model_t model = cactus_init("path/to/weight/folder", 2048);

// Conversation in OpenAI-style message format
const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

// Generation options
const char* options = R"({
    "max_tokens": 50,
    "stop_sequences": ["<|im_end|>"]
})";

char response[1024];
int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr);
```

Example response from Gemma3-270m-INT8:

```json
{
"success": true,
"response": "Hi there! I'm just a friendly assistant.",
"time_to_first_token_ms": 45.23,
"total_time_ms": 163.67,
"tokens_per_second": 168.42,
"prefill_tokens": 28,
"decode_tokens": 50,
"total_tokens": 78
}
```
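`cactus_complete` writes the JSON document above into the caller-supplied `response` buffer as a C string. A hedged follow-up sketch, assuming a non-negative return value signals success (the return value's exact semantics are not specified above):

```cpp
#include <cstdio>

// Assumption: a non-negative return code means `response` was filled.
if (result >= 0) {
    printf("%s\n", response);  // the JSON object shown above
} else {
    fprintf(stderr, "cactus_complete failed with code %d\n", result);
}
```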
Benchmark notes:
- Models: LFM2-VL-450M (VLM) & Whisper-Small (STT)
- Decode = tokens/sec, P/D = prefill/decode, VLM = 256×256 image input, STT = 30s audio clip
- INT4 coming: 1.8x speed, 1.9x smaller files
- NPU coming: 5-11x prefill speed, better energy efficiency
| Device | Decode | 1k-P/D | 4k-P/D | 4k-RAM | VLM-TTFT | VLM-Dec | VLM-RAM | STT-TTFT | STT-Dec | STT-RAM |
|---|---|---|---|---|---|---|---|---|---|---|
| Mac M4 Pro | 173 | 1574/115 | 1089/100 | 122MB | 0.38s | 168 | 112MB | 1.7s | 83 | 142MB |
| Mac M3 Pro | 150 | 1540/109 | 890/93 | 121MB | 0.47s | 149 | 113MB | 2.9s | 78 | 140MB |
| iPad/Mac M4 | 129 | 793/82 | 507/64 | 80MB | 0.46s | 113 | 145MB | 2.4s | 60 | 131MB |
| iPad/Mac M3 | 112 | 786/78 | 446/60 | 81MB | 0.58s | 111 | 154MB | 4.2s | 58 | 142MB |
| iPhone 17 Pro | 136 | 810/105 | 628/84 | - | 1.1s | 120 | - | - | - | - |
| iPhone 16 Pro | 114 | 716/98 | 580/81 | - | 1.3s | 101 | - | 3.5s | 75 | - |
| iPhone 15 Pro | 99 | 549/86 | 530/75 | - | 1.5s | 92 | - | 3.8s | 70 | - |
| Galaxy S25 Ultra | 91 | 230/63 | 173/57 | 128MB | 1.4s | 58 | - | - | - | - |
| Nothing 3 | 56 | 167/49 | 160/46 | - | 1.7s | 54 | - | 8.5s | 55 | - |
| Nothing 3a | 31 | 114/26 | 108/24 | - | 2.4s | 29 | - | - | - | - |
| Raspberry Pi 5 | 24 | 192/28 | - | - | 2.3s | 23 | - | 21s | 16 | - |
Dependencies are set up automatically on first run.

```bash
cli/cactus --help                     # see all commands
cli/cactus run LiquidAI/LFM2-VL-450M  # interact with a model
cli/cactus test                       # run unit tests during dev + reproduce benchmarks
cli/cactus download Qwen/Qwen3-0.6B   # HF name, stored to weights/Qwen3-0.6B
```

Supported models:

| Model | Compressed Size | Completion | Tool Call | Vision | Embed | Speech |
|---|---|---|---|---|---|---|
| google/gemma-3-270m-it | 172MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| openai/whisper-small | 210MB | ✗ | ✗ | ✗ | ✓ | ✓ |
| LiquidAI/LFM2-350M | 233MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| HuggingFaceTB/SmolLM2-360m-Instruct | 227MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| LiquidAI/LFM2-VL-450M | 420MB | ✓ | ✗ | ✓ | ✓ | ✗ |
| Qwen/Qwen3-0.6B | 394MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| Qwen/Qwen3-Embedding-0.6B | 394MB | ✗ | ✗ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-700M | 467MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| nomic-ai/nomic-embed-text-v2-moe | 533MB | ✗ | ✗ | ✗ | ✓ | ✗ |
| google/gemma-3-1b-it | 642MB | ✓ | ✗ | ✗ | ✗ | ✗ |
| openai/whisper-medium | 646MB | ✗ | ✗ | ✗ | ✓ | ✓ |
| LiquidAI/LFM2-1.2B | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-1.2B-RAG | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-1.2B-Tools | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| LiquidAI/LFM2-VL-1.6B | 1440MB | ✓ | ✗ | ✓ | ✓ | ✗ |
| Qwen/Qwen3-1.7B | 1161MB | ✓ | ✓ | ✗ | ✓ | ✗ |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | 1161MB | ✓ | ✗ | ✗ | ✓ | ✗ |
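Tying the CLI and the engine together: a model fetched with `cli/cactus download` can be passed straight to `cactus_init`. A minimal sketch, assuming the `weights/<name>` storage convention from the download command above and that `cactus_init` returns a null handle on failure:

```cpp
#include "cactus.h"
#include <cstdio>

int main() {
    // Path follows the convention from `cli/cactus download Qwen/Qwen3-0.6B`.
    cactus_model_t model = cactus_init("weights/Qwen3-0.6B", 2048);
    if (!model) {  // assumption: a null handle signals a load failure
        fprintf(stderr, "failed to load model\n");
        return 1;
    }
    // ... call cactus_complete as shown earlier ...
    return 0;
}
```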
To build from source:

```bash
android/build.sh  # generates libcactus.so and libcactus.a for Android
apple/build.sh    # generates the .xcframeworks for Apple
```

Or simply use the provided SDKs.

For Windows ARM:

```bash
# Needs C++, Python, and MSYS2 with Pacman; then install CMake, the toolchain, and the Python weight-conversion dependencies
pacman -S mingw-w64-clang-aarch64-cmake mingw-w64-clang-aarch64-toolchain mingw-w64-clang-aarch64-mman-win32
pip3 install -r tools/requirements.txt
tests/run.bat     # run tests on Windows ARM
```