ggml-backend-reg: Adreno-aware OpenCL backend selection on Android (QVAC-18993) by Zbig9000 · Pull Request #18 · tetherto/qvac-ext-ggml

Zbig9000 · 2026-06-02T08:58:50Z

Summary

Ports the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml fork (the LLM stack) into the speech ggml fork, so whisper / parakeet / tts select GPU backends the same way the LLM stack does.

Problem

ggml_backend_load_all_from_path() loaded Vulkan and OpenCL unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan loads first, so whisper.cpp's default gpu_device=0 lands on the Adreno Vulkan driver, which SIGSEGVs in vkCmdBindPipeline during ggml compute (observed on Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g. Mali, proven working on Pixel 9), Vulkan is the correct/stable path and OpenCL must not be used.

Fix

On Android, after loading Vulkan, use it to detect the GPU and decide whether to load OpenCL:

| GPU | `{load_opencl, unload_vulkan}` | Result |
| --- | --- | --- |
| not Adreno (e.g. Mali) | `{false, false}` | skip OpenCL → Vulkan/CPU |
| Adreno > 700 | `{true, false}` | load OpenCL (Vulkan kept; consumer picks OpenCL) |
| Adreno 1–700 (incl. ≤ 600) | `{false, true}` | unload Vulkan, CPU only |

Only Adreno > 700 has a stable ggml GPU path; Adreno exactly 700 resolves to CPU only. This is stricter than fabric-llm, which loads OpenCL on Adreno ≤ 600: we treat ≤ 600 the same as 601–700 (CPU only), since older Adrenos are no more capable than the 601–700 tier we already exclude.

Implemented with ggml_backend_reg_by_name("vulkan") + ggml_backend_min_adreno_version() (public ggml_backend_reg_dev_* / ggml_backend_dev_description APIs) + ggml_backend_unload(). The threshold decision is factored into a pure ggml_adreno_resolve_backend_policy() so it's unit-testable.
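The three rows of the policy table can be sketched as a pure function. This is a minimal illustration, not the merged code: the struct name, the `_sketch` function name, the `int` parameter, and the convention that a version of 0 or less means "not Adreno" are assumptions chosen to match the table and the commit notes.

```cpp
#include <cassert>

// Hypothetical result type; the PR only states that the policy returns
// {load_opencl, unload_vulkan}.
struct adreno_backend_policy {
    bool load_opencl;
    bool unload_vulkan;
};

// Sketch of the threshold decision. adreno_version <= 0 is assumed to mean
// "no Adreno GPU detected".
constexpr adreno_backend_policy resolve_backend_policy_sketch(int adreno_version) {
    if (adreno_version <= 0) {
        return {false, false};  // non-Adreno (e.g. Mali): skip OpenCL, keep Vulkan
    }
    if (adreno_version > 700) {
        return {true, false};   // Adreno > 700: load OpenCL, Vulkan stays loaded
    }
    return {false, true};       // Adreno 1..700: unload Vulkan, CPU only
}

// The boundary is strict: exactly 700 is CPU only, 701 gets OpenCL.
static_assert(!resolve_backend_policy_sketch(700).load_opencl, "700 is CPU only");
static_assert(resolve_backend_policy_sketch(701).load_opencl, "701 loads OpenCL");

// Small runtime check of the merged <= 600 tier.
inline void check_merged_low_tier() {
    assert(resolve_backend_policy_sketch(600).unload_vulkan);
    assert(!resolve_backend_policy_sketch(600).load_opencl);
}
```

Keeping the decision free of any ggml call is what lets the boundary cases be pinned in a test binary that does not link ggml.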

Regression safety

The entire Adreno block is guarded by `#ifdef __ANDROID__`. Off Android the behaviour is byte-identical: load_opencl stays true and OpenCL is loaded in exactly the same position as before (after virtgpu, before hexagon).
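As a rough model of that guarantee, the sketch below replaces the preprocessor guard with a runtime `on_android` flag so both paths can be exercised on any host. The function and struct names are hypothetical, and the model assumes both the Vulkan and OpenCL backend libraries are present on the device.

```cpp
#include <cassert>

// Which of the two GPU backends discussed here remain registered.
// Hypothetical type, for illustration only.
struct gpu_backends {
    bool vulkan;
    bool opencl;
};

// on_android stands in for the #ifdef __ANDROID__ guard.
// adreno_version <= 0 is assumed to mean "not Adreno".
inline gpu_backends registered_gpu_backends_sketch(bool on_android, int adreno_version) {
    bool load_opencl   = true;   // defaults reproduce the pre-change behaviour
    bool unload_vulkan = false;
    if (on_android) {
        load_opencl   = adreno_version > 700;
        unload_vulkan = adreno_version >= 1 && adreno_version <= 700;
    }
    return {!unload_vulkan, load_opencl};
}

// Off Android the detected GPU never influences the outcome.
inline void check_off_android_is_unchanged() {
    const gpu_backends r = registered_gpu_backends_sketch(false, 650);
    assert(r.vulkan && r.opencl);
}
```

Because the defaults are set before the guarded block, the non-Android path never reads the GPU description at all.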

Tests

Pure GPU-string parser + policy decision extracted to src/ggml-adreno.h (inline, dependency-free); the parser also lowercases the description (small hardening over fabric-llm's raw substring check).
New tests/test-adreno-version.cpp — 29 cases: 18 parser (Adreno variants; Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno; empty; unparseable) + 11 policy, incl. the > 700 boundary (Adreno exactly 700 → CPU only) and the merged ≤ 600 tier (600/500/1 → CPU only). Wired into ctest as test-adreno-version (always built, no ggml link).
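A parser with the properties described (lowercasing, first digit run after "adreno") might look like the following. The function name, the return-0-when-absent convention, and the sample description strings are assumptions for illustration; the actual contents of src/ggml-adreno.h are not shown in this PR text.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Sketch: returns the first run of digits found after "adreno"
// (case-insensitive), or 0 if the description is not Adreno or has no digits.
inline int parse_adreno_version_sketch(const std::string & description) {
    std::string lower;
    lower.reserve(description.size());
    for (unsigned char c : description) {
        lower.push_back(static_cast<char>(std::tolower(c)));
    }

    const std::string::size_type pos = lower.find("adreno");
    if (pos == std::string::npos) {
        return 0;  // e.g. Mali, NVIDIA, llvmpipe, empty string
    }

    // Skip anything between "adreno" and the first digit, e.g. " (tm) ".
    std::string::size_type i = pos + 6;
    while (i < lower.size() && !std::isdigit(static_cast<unsigned char>(lower[i]))) {
        ++i;
    }

    int version = 0;
    while (i < lower.size() && std::isdigit(static_cast<unsigned char>(lower[i]))) {
        version = version * 10 + (lower[i] - '0');
        if (version > 99999) {
            break;  // guard against absurdly long digit runs
        }
        ++i;
    }
    return version;
}

// Illustrative inputs; real driver description strings may differ.
inline void check_parser_examples() {
    assert(parse_adreno_version_sketch("Adreno (TM) 830") == 830);
    assert(parse_adreno_version_sketch("Mali-G715") == 0);
}
```

The first-digit-run rule also reproduces the limitation noted in the review follow-up: a description such as "Adreno X1-85" yields 1 under this sketch.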

Test plan

ggml lib builds clean (x64-linux)
ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only: clean (Android block compiles)
test-adreno-version: 29/29 pass via ctest
Device farm (after the ggml-speech port bump → whisper rebuild): Samsung S25 (Adreno) → OpenCL, Pixel 9 (Mali) → Vulkan, via transcription-whispercpp's GPU test

Relationship to transcription-whispercpp #2343 (complementary — both required)

This change controls which backends are available; the addon picks among them (same split as the LLM stack: engine availability + BackendSelection). On Adreno > 700 this keeps both Vulkan and OpenCL loaded (as fabric-llm keeps Vulkan), so whisper's default gpu_device=0 would still pick Vulkan and crash unless #2343's adrenoOpenclGpuDeviceIndex → gpu_device guard redirects to OpenCL. So #2343's guard is not redundant; #18 additionally handles the Adreno 1–700 (CPU-only) tier and the non-Adreno OpenCL skip, which the addon guard cannot.
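To illustrate why the consumer-side guard is still needed, here is a hypothetical stand-in for the index redirect. It is not the code from #2343: the function name, its parameters, and the lowercase backend labels are assumptions, and the real guard may identify the OpenCL device differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// gpu_backend_names: backend label of each GPU device, in registration order
// (labels are illustrative). Returns the device index the consumer should use.
inline int pick_gpu_device_index_sketch(const std::vector<std::string> & gpu_backend_names,
                                        bool adreno_above_700) {
    if (!adreno_above_700) {
        return 0;  // keep the default gpu_device=0
    }
    for (std::size_t i = 0; i < gpu_backend_names.size(); ++i) {
        if (gpu_backend_names[i] == "opencl") {
            return static_cast<int>(i);  // redirect away from Vulkan
        }
    }
    return 0;  // no OpenCL device registered: fall back to the default
}

// With Vulkan registered first, index 0 is Vulkan and the guard must move to 1.
inline void check_redirect_example() {
    const std::vector<std::string> devices = {"vulkan", "opencl"};
    assert(pick_gpu_device_index_sketch(devices, true) == 1);
}
```

On a non-Adreno device, where this PR skips OpenCL entirely, the same call returns the default index 0 and Vulkan is used.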

Downstream (separate, follow-up)

Once merged: bump the ggml-speech vcpkg port → whisper-cpp (+ tts-cpp / parakeet-cpp) deps → validate via the transcription-whispercpp addon overlay → CI → registry → addon, per the standard merge protocol.

Port the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml fork into the speech ggml fork, so the speech stack (whisper / parakeet / tts) selects GPU backends the same way the LLM stack does.

Problem: ggml_backend_load_all_from_path() loaded Vulkan AND OpenCL unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan loads first, so whisper.cpp's default gpu_device=0 lands on the Adreno Vulkan driver, which SIGSEGVs in vkCmdBindPipeline during ggml compute (observed on Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g. Mali) Vulkan is the correct, stable path and OpenCL must not be used.

Fix (mirrors fabric-llm): on Android, after loading Vulkan, use it to detect the GPU and decide whether to load OpenCL:
- not Adreno -> skip OpenCL (Vulkan/CPU), e.g. Mali keeps Vulkan
- Adreno > 700 -> load OpenCL (Adreno's stable ggml path)
- Adreno 601-700 -> unload Vulkan, CPU only (both backends buggy)
- Adreno <= 600 -> load OpenCL (unchanged fall-through)

Implemented via ggml_backend_reg_by_name("vulkan") + ggml_backend_min_adreno_version() (using the public ggml_backend_reg_dev_* / ggml_backend_dev_description APIs) + ggml_backend_unload().

Off Android the behaviour is unchanged and byte-identical: `load_opencl` stays true and OpenCL is loaded in exactly the same position as before (right after virtgpu, before hexagon). The whole Adreno block is `#ifdef __ANDROID__`.

The pure GPU-string parser is extracted to a header (src/ggml-adreno.h, inline, dependency-free) so it can be unit-tested without a GPU; it also lowercases the description (a small hardening over fabric-llm's raw substring check). New tests/test-adreno-version.cpp covers 18 cases (Adreno variants, Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno, empty, and unparseable); wired into ctest as test-adreno-version (always built, no ggml link).

Validation (x64-linux):
- ggml lib builds clean (ggml-backend-reg.cpp.o)
- ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only: clean (Android block compiles)
- test-adreno-version: 18/18 pass via ctest

The Adreno runtime selection itself is validated end-to-end on the device farm via transcription-whispercpp's GPU test (Samsung S25 -> OpenCL).

Co-authored-by: Cursor <cursoragent@cursor.com>

…review)

Code-review follow-up on the Android Adreno backend-selection change:
- Factor the OpenCL/Vulkan decision (the 700/600 generation thresholds) out of ggml_backend_load_all_from_path() into ggml_adreno_resolve_backend_policy() in ggml-adreno.h, returning {load_opencl, unload_vulkan}. This makes the policy boundaries unit-testable without a GPU; the loader now just consumes the policy (behaviour unchanged, still mirrors qvac-fabric-llm.cpp).
- Add 11 policy cases to test-adreno-version.cpp, pinning the subtle boundaries: Adreno exactly 700 -> CPU only (not > 700), exactly 600 -> OpenCL (not > 600), 701/730/750/830 -> OpenCL, 601..700 -> CPU only, <=600 -> OpenCL, <=0 -> none.
- Document the parser's first-digit-run limitation (inherited from fabric-llm): "Adreno X1-85" would parse as 1; that naming is Snapdragon-X (Windows-on-ARM) only, not Android phones (5xx/6xx/7xx/8xx parse correctly).

29/29 test cases pass via ctest; ggml builds clean; ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

Per review: only Adreno > 700 has a stable ggml GPU path, so collapse the Adreno <=600 tier into the 601..700 tier: all Adreno 1..700 now resolves to CPU only (unload Vulkan, no OpenCL). This is stricter than qvac-fabric-llm.cpp (which loaded OpenCL on Adreno <=600); older Adreno GPUs are no more capable than the 601..700 tier we already exclude, so there is no reason to expose a GPU backend on them.

ggml_adreno_resolve_backend_policy() simplifies to two GPU branches (>700 -> OpenCL, 1..700 -> CPU only) plus the non-Adreno case. Test updated: Adreno 600/500/1 now expect {load_opencl=false, unload_vulkan=true}.

29/29 cases pass; ggml builds clean; -D__ANDROID__ -fsyntax-only clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

Zbig9000 and others added 2 commits June 2, 2026 10:58

Zbig9000 requested review from freddy311082, gianni-cor, ishanvohra2, jpgaribotti, ogad-tether and pratiknarola-t June 2, 2026 09:34

ishanvohra2 approved these changes Jun 2, 2026

freddy311082 approved these changes Jun 2, 2026

gianni-cor approved these changes Jun 2, 2026

gianni-cor merged commit 9bca9b3 into tetherto:speech Jun 2, 2026

gianni-cor merged 3 commits into tetherto:speech from Zbig9000:QVAC-18993-ggml-adreno-opencl-selection
