llama.cpp GPU Support on Android Devices #16606
Siddhesh2377
started this conversation in
Show and tell
GPU Acceleration for Android llama.cpp via OpenCL - Working Implementation
I've successfully implemented GPU acceleration for llama.cpp on Android using OpenCL, specifically optimized for Qualcomm Adreno GPUs. This implementation uses the existing llama.cpp repository without any modifications or custom patches.
Repository: https://github.com/Siddhesh2377/Ai-Core
Performance Results:
The attached screenshots show a significant speedup in token generation with GPU offloading enabled via OpenCL on Snapdragon hardware.
Implementation Details:
Architecture:
The implementation is packaged as a single .aar library.
Build Configuration:
Key CMake flags for OpenCL:
-DGGML_OPENCL=ON
-DGGML_VULKAN=OFF
OpenCL headers and runtime linking are configured for the Android NDK build system. The implementation uses the Qualcomm-optimized OpenCL backend that was recently merged into llama.cpp.
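As a rough sketch, a cross-compile along these lines should produce the OpenCL-enabled libraries (the NDK path, ABI, and API level below are assumptions to adapt to your toolchain; upstream llama.cpp also expects the OpenCL headers and ICD loader to be available for Android):

```shell
# Cross-compile llama.cpp with the OpenCL backend for arm64 Android.
# ANDROID_NDK is assumed to point at an installed NDK.
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DGGML_OPENCL=ON \
  -DGGML_VULKAN=OFF \
  -DBUILD_SHARED_LIBS=ON
cmake --build build-android --config Release -j
```

The resulting shared libraries can then be bundled into the .aar alongside the JNI glue.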
Testing:
Tested on a Qualcomm Adreno GPU with measurable performance improvements over CPU-only mode; token generation speed increases significantly with all layers offloaded.
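One way to reproduce the CPU-vs-GPU comparison on-device (a sketch only - the binary and model paths are assumptions, and these commands require a connected device) is to push the stock llama-bench tool from the same build and toggle layer offload with -ngl:

```shell
# Push a benchmark binary and model to the device (paths are examples).
adb push build-android/bin/llama-bench /data/local/tmp/
adb push model.gguf /data/local/tmp/
# CPU-only baseline: no layers offloaded.
adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./llama-bench -m model.gguf -ngl 0"
# Full offload to the GPU via the OpenCL backend.
adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./llama-bench -m model.gguf -ngl 99"
```

Comparing the reported tokens/s between the two runs gives the speedup figure directly.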
Note: This uses the standard llama.cpp repository. All OpenCL support is already present in the upstream codebase - this project is purely an Android integration demonstrating how to configure and build it correctly for mobile devices.