ggml-backend-reg: Adreno-aware OpenCL backend selection on Android (QVAC-18993) #18
Merged
gianni-cor merged 3 commits on Jun 2, 2026
Port the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml
fork into the speech ggml fork, so the speech stack (whisper / parakeet / tts)
selects GPU backends the same way the LLM stack does.
Problem: ggml_backend_load_all_from_path() loaded Vulkan AND OpenCL
unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan
loads first, so whisper.cpp's default gpu_device=0 lands on the Adreno Vulkan
driver, which SIGSEGVs in vkCmdBindPipeline during ggml compute (observed on
Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g.
Mali) Vulkan is the correct, stable path and OpenCL must not be used.
Fix (mirrors fabric-llm): on Android, after loading Vulkan, use it to detect
the GPU and decide whether to load OpenCL:
- not Adreno -> skip OpenCL (Vulkan/CPU), e.g. Mali keeps Vulkan
- Adreno > 700 -> load OpenCL (Adreno's stable ggml path)
- Adreno 601-700 -> unload Vulkan, CPU only (both backends buggy)
- Adreno <= 600 -> load OpenCL (unchanged fall-through)
Implemented via ggml_backend_reg_by_name("vulkan") +
ggml_backend_min_adreno_version() (using the public ggml_backend_reg_dev_*
/ ggml_backend_dev_description APIs) + ggml_backend_unload().
Off Android the behaviour is unchanged and byte-identical: `load_opencl`
stays true and OpenCL is loaded in exactly the same position as before (right
after virtgpu, before hexagon). The whole Adreno block is `#ifdef __ANDROID__`.
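To illustrate the regression guarantee above, here is a minimal sketch of a compile-time gate. `resolve_load_opencl`, its parameter, and `check_gate` are hypothetical names rather than the PR's code; only the `load_opencl` default of true and the `#ifdef __ANDROID__` guard come from the description.

```cpp
#include <cassert>

// Hypothetical helper (not from the PR): shows how an #ifdef __ANDROID__ gate
// leaves the default untouched on every other platform.
inline bool resolve_load_opencl(bool android_decision) {
    bool load_opencl = true;           // default on every platform: OpenCL is loaded
#ifdef __ANDROID__
    load_opencl = android_decision;    // Android only: defer to the GPU-derived decision
#else
    (void) android_decision;           // ignored off Android, so behaviour cannot change
#endif
    return load_opencl;
}

// Small self-check of the gate.
inline void check_gate() {
    assert(resolve_load_opencl(true));     // true on every platform
#ifndef __ANDROID__
    assert(resolve_load_opencl(false));    // off Android the decision is ignored
#endif
}
```

Because the override sits entirely inside the preprocessor guard, a non-Android build never compiles it, which is presumably what makes the "unchanged off Android" claim easy to verify.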
The pure GPU-string parser is extracted to a header (src/ggml-adreno.h,
inline, dependency-free) so it can be unit-tested without a GPU; it also
lowercases the description (a small hardening over fabric-llm's raw substring
check). New tests/test-adreno-version.cpp covers 18 cases (Adreno variants,
Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno, empty, and unparseable);
wired into ctest as test-adreno-version (always built, no ggml link).
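The parser described above could look roughly like the following sketch. The name `parse_adreno_version`, the return value of 0 for "not Adreno or unparseable", and the sample description strings are assumptions for illustration; the actual helper in src/ggml-adreno.h may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch: returns the first run of digits that follows "adreno"
// in a GPU description, or 0 (assumed sentinel) when the description does not
// mention Adreno or contains no digits after it.
inline int parse_adreno_version(const std::string & description) {
    // Lowercase a copy so "Adreno", "ADRENO" and "adreno" all match.
    std::string lower;
    lower.reserve(description.size());
    for (unsigned char c : description) {
        lower.push_back(static_cast<char>(std::tolower(c)));
    }

    const std::string needle = "adreno";
    const std::size_t pos = lower.find(needle);
    if (pos == std::string::npos) {
        return 0;  // not an Adreno description
    }

    // Skip everything up to the first digit after the match.
    std::size_t i = pos + needle.size();
    while (i < lower.size() && !std::isdigit(static_cast<unsigned char>(lower[i]))) {
        ++i;
    }

    // Read the first digit run only (capped at 9 digits to avoid overflow).
    int version = 0;
    int digits  = 0;
    while (i < lower.size() && std::isdigit(static_cast<unsigned char>(lower[i])) && digits < 9) {
        version = version * 10 + (lower[i] - '0');
        ++i;
        ++digits;
    }
    return version;
}

// Illustrative inputs; the strings are examples, not captured driver output.
inline void check_parser() {
    assert(parse_adreno_version("Adreno (TM) 830") == 830);
    assert(parse_adreno_version("Mali-G715") == 0);
    assert(parse_adreno_version("Adreno X1-85") == 1);  // only the first digit run is read
}
```

Keeping the function free of ggml types is what lets a test binary exercise it without linking ggml or having a GPU present.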
Validation (x64-linux):
- ggml lib builds clean (ggml-backend-reg.cpp.o)
- ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only: clean (Android block compiles)
- test-adreno-version: 18/18 pass via ctest
The Adreno runtime selection itself is validated end-to-end on the device
farm via transcription-whispercpp's GPU test (Samsung S25 -> OpenCL).
Co-authored-by: Cursor <cursoragent@cursor.com>
…review)
Code-review follow-up on the Android Adreno backend-selection change:
- Factor the OpenCL/Vulkan decision (the 700/600 generation thresholds) out of
ggml_backend_load_all_from_path() into ggml_adreno_resolve_backend_policy()
in ggml-adreno.h, returning {load_opencl, unload_vulkan}. This makes the
policy boundaries unit-testable without a GPU; the loader now just consumes
the policy (behaviour unchanged, still mirrors qvac-fabric-llm.cpp).
- Add 11 policy cases to test-adreno-version.cpp, pinning the subtle boundaries:
Adreno exactly 700 -> CPU only (not > 700), exactly 600 -> OpenCL (not > 600),
701/730/750/830 -> OpenCL, 601..700 -> CPU only, <=600 -> OpenCL, <=0 -> none.
- Document the parser's first-digit-run limitation (inherited from fabric-llm):
"Adreno X1-85" would parse as 1; that naming is Snapdragon-X (Windows-on-ARM)
only, not Android phones (5xx/6xx/7xx/8xx parse correctly).
29/29 test cases pass via ctest; ggml builds clean; ggml-backend-reg.cpp
-D__ANDROID__ -fsyntax-only clean.
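A rough sketch of a loader that only consumes the policy, using a mock registry that records load attempts. Everything except the `{load_opencl, unload_vulkan}` pair and the "after virtgpu, before hexagon" position is an assumption: the struct and function names are hypothetical, the backend list is shortened, and the point at which Vulkan is unloaded is a guess.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Mirrors the {load_opencl, unload_vulkan} pair named in the commit message.
struct adreno_backend_policy {
    bool load_opencl;
    bool unload_vulkan;
};

// Hypothetical stand-in for the backend registry: it only records names.
struct mock_registry {
    std::vector<std::string> loaded;

    void load(const std::string & name) { loaded.push_back(name); }
    void unload(const std::string & name) {
        loaded.erase(std::remove(loaded.begin(), loaded.end(), name), loaded.end());
    }
};

// Assumed, shortened load order: vulkan, virtgpu, opencl, hexagon, cpu.
// The loader holds no thresholds; it applies whatever policy it is given.
inline std::vector<std::string> load_with_policy(const adreno_backend_policy & policy) {
    mock_registry reg;
    reg.load("vulkan");              // loaded first so it can describe the GPU
    if (policy.unload_vulkan) {
        reg.unload("vulkan");        // CPU-only tier: drop the Vulkan backend again
    }
    reg.load("virtgpu");
    if (policy.load_opencl) {
        reg.load("opencl");          // same slot as before: after virtgpu, before hexagon
    }
    reg.load("hexagon");
    reg.load("cpu");
    return reg.loaded;
}

// Small self-check: OpenCL lands between virtgpu and hexagon when requested.
inline void check_loader() {
    const std::vector<std::string> got = load_with_policy({true, false});
    assert(got.size() == 5 && got[1] == "virtgpu" && got[2] == "opencl" && got[3] == "hexagon");
}
```

With the thresholds living elsewhere, the loader has two booleans to act on, so the policy boundaries can be tested on their own.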
Co-authored-by: Cursor <cursoragent@cursor.com>
Per review: only Adreno > 700 has a stable ggml GPU path, so collapse the
Adreno <=600 tier into the 601..700 tier — all Adreno 1..700 now resolves to
CPU only (unload Vulkan, no OpenCL). This is stricter than qvac-fabric-llm.cpp
(which loaded OpenCL on Adreno <=600); older Adreno GPUs are no more capable
than the 601..700 tier we already exclude, so there is no reason to expose a
GPU backend on them.
ggml_adreno_resolve_backend_policy() simplifies to two GPU branches (>700 ->
OpenCL, 1..700 -> CPU only) plus the non-Adreno case. Test updated: Adreno
600/500/1 now expect {load_opencl=false, unload_vulkan=true}.
29/29 cases pass; ggml builds clean; -D__ANDROID__ -fsyntax-only clean.
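The simplified policy can be sketched as below. The struct and function names are hypothetical stand-ins (the real function is ggml_adreno_resolve_backend_policy() in ggml-adreno.h), and treating a version <= 0 as "not Adreno" is an assumption based on the "<=0 -> none" test case.

```cpp
#include <cassert>

// Mirrors the {load_opencl, unload_vulkan} pair named in the commit message.
struct adreno_backend_policy {
    bool load_opencl;
    bool unload_vulkan;
};

// Hypothetical sketch of the final policy:
//   > 700   -> OpenCL (Vulkan stays loaded)
//   1..700  -> CPU only (unload Vulkan, no OpenCL)
//   <= 0    -> assumed to mean "not Adreno": skip OpenCL, keep Vulkan
inline adreno_backend_policy sketch_resolve_backend_policy(int adreno_version) {
    if (adreno_version > 700) {
        return {true, false};
    }
    if (adreno_version >= 1) {
        return {false, true};
    }
    return {false, false};
}

// Boundary checks matching the cases listed in the commit messages.
inline void check_policy() {
    assert(sketch_resolve_backend_policy(701).load_opencl);       // just above the threshold
    assert(!sketch_resolve_backend_policy(700).load_opencl);      // exactly 700 is CPU only
    assert(sketch_resolve_backend_policy(700).unload_vulkan);
    assert(sketch_resolve_backend_policy(600).unload_vulkan);     // merged <= 600 tier
    assert(!sketch_resolve_backend_policy(0).unload_vulkan);      // non-Adreno keeps Vulkan
}
```

The comparison is strict (`> 700`), so Adreno 700 itself falls in the CPU-only branch.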
Co-authored-by: Cursor <cursoragent@cursor.com>
ishanvohra2 approved these changes on Jun 2, 2026
freddy311082 approved these changes on Jun 2, 2026
gianni-cor approved these changes on Jun 2, 2026
Summary
Ports the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml fork (the LLM stack) into the speech ggml fork, so whisper / parakeet / tts select GPU backends the same way the LLM stack does.
Problem
`ggml_backend_load_all_from_path()` loaded Vulkan and OpenCL unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan loads first, so whisper.cpp's default `gpu_device=0` lands on the Adreno Vulkan driver, which SIGSEGVs in `vkCmdBindPipeline` during ggml compute (observed on Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g. Mali, proven working on Pixel 9), Vulkan is the correct, stable path and OpenCL must not be used.
Fix
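On Android, after loading Vulkan, use it to detect the GPU and decide whether to load OpenCL. The resulting policy, expressed as `{load_opencl, unload_vulkan}`:
- not Adreno -> `{false, false}`: skip OpenCL, keep Vulkan
- Adreno > 700 -> `{true, false}`: load OpenCL, keep Vulkan
- Adreno 1–700 -> `{false, true}`: unload Vulkan, CPU only
Only Adreno above 700 has a stable ggml GPU path (exactly 700 resolves to CPU only). This is stricter than fabric-llm, which loads OpenCL on Adreno ≤ 600; we treat ≤ 600 the same as 601–700 (CPU only), since older Adrenos are no more capable than the 601–700 tier we already exclude.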
Implemented with `ggml_backend_reg_by_name("vulkan")` + `ggml_backend_min_adreno_version()` (public `ggml_backend_reg_dev_*` / `ggml_backend_dev_description` APIs) + `ggml_backend_unload()`. The threshold decision is factored into a pure `ggml_adreno_resolve_backend_policy()` so it's unit-testable.
Regression safety
The entire Adreno block is `#ifdef __ANDROID__`. Off Android the behaviour is byte-identical: `load_opencl` stays `true` and OpenCL is loaded in exactly the same position as before (after `virtgpu`, before `hexagon`).
Tests
- `src/ggml-adreno.h` (inline, dependency-free); the parser also lowercases the description (small hardening over fabric-llm's raw substring check).
- `tests/test-adreno-version.cpp`, 29 cases: 18 parser (Adreno variants; Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno; empty; unparseable) + 11 policy, incl. the `> 700` boundary (Adreno exactly 700 → CPU only) and the merged ≤ 600 tier (600/500/1 → CPU only). Wired into ctest as `test-adreno-version` (always built, no ggml link).
Test plan
- `ggml` lib builds clean (x64-linux)
- `ggml-backend-reg.cpp` with `-D__ANDROID__ -fsyntax-only`: clean (Android block compiles)
- `test-adreno-version`: 29/29 pass via ctest
Relationship to transcription-whispercpp #2343 (complementary; both required)
This change controls which backends are available; the addon picks among them (same split as the LLM stack: engine availability + `BackendSelection`). On Adreno > 700 this keeps both Vulkan and OpenCL loaded (like fabric-llm keeps Vulkan), so whisper's default `gpu_device=0` would still pick Vulkan and crash unless #2343's `adrenoOpenclGpuDeviceIndex` → `gpu_device` guard redirects to OpenCL. So #2343's guard is not redundant; #18 additionally handles the Adreno 1–700 (CPU-only) tier and the non-Adreno OpenCL-skip that the addon guard can't.
Downstream (separate, follow-up)
Once merged: bump the `ggml-speech` vcpkg port → `whisper-cpp` (+ tts-cpp / parakeet-cpp) deps → validate via the `transcription-whispercpp` addon overlay → CI → registry → addon, per the standard merge protocol.