ggml-backend-reg: Adreno-aware OpenCL backend selection on Android (QVAC-18993) #18

Merged
gianni-cor merged 3 commits into tetherto:speech from Zbig9000:QVAC-18993-ggml-adreno-opencl-selection
Jun 2, 2026
Zbig9000 commented Jun 2, 2026

Summary

Ports the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml fork (the LLM stack) into the speech ggml fork, so whisper / parakeet / tts select GPU backends the same way the LLM stack does.

Problem

ggml_backend_load_all_from_path() loaded Vulkan and OpenCL unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan loads first, so whisper.cpp's default gpu_device=0 lands on the Adreno Vulkan driver, which SIGSEGVs in vkCmdBindPipeline during ggml compute (observed on Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g. Mali, proven working on Pixel 9), Vulkan is the correct/stable path and OpenCL must not be used.

Fix

On Android, after loading Vulkan, use it to detect the GPU and decide whether to load OpenCL:

| GPU | {load_opencl, unload_vulkan} | Result |
| --- | --- | --- |
| not Adreno (e.g. Mali) | {false, false} | skip OpenCL → Vulkan/CPU |
| Adreno > 700 | {true, false} | load OpenCL (Vulkan kept; consumer picks OpenCL) |
| Adreno 1–700 (incl. ≤ 600) | {false, true} | unload Vulkan, CPU only |

Only Adreno > 700 has a stable ggml GPU path; Adreno exactly 700 is excluded. This is stricter than fabric-llm, which loads OpenCL on Adreno ≤ 600. We treat ≤ 600 the same as 601–700 (CPU only), since older Adrenos are no more capable than the 601–700 tier we already exclude.

Implemented with ggml_backend_reg_by_name("vulkan") + ggml_backend_min_adreno_version() (public ggml_backend_reg_dev_* / ggml_backend_dev_description APIs) + ggml_backend_unload(). The threshold decision is factored into a pure ggml_adreno_resolve_backend_policy() so it's unit-testable.
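
The threshold decision described above can be sketched as a small pure function. This is a minimal illustration, not the code in src/ggml-adreno.h: the struct and function names are hypothetical, and treating a version of 0 or less as "no Adreno GPU detected" is an assumption.

```cpp
#include <cassert>

// Hypothetical names; the real declarations live in src/ggml-adreno.h.
struct adreno_policy_sketch {
    bool load_opencl;    // load the OpenCL backend
    bool unload_vulkan;  // unload the already-loaded Vulkan backend
};

// Assumption: min_adreno_version <= 0 means "no Adreno GPU detected".
inline adreno_policy_sketch resolve_policy_sketch(int min_adreno_version) {
    if (min_adreno_version <= 0) {
        return {false, false};  // non-Adreno (e.g. Mali): skip OpenCL, keep Vulkan
    }
    if (min_adreno_version > 700) {
        return {true, false};   // Adreno > 700: load OpenCL, keep Vulkan loaded
    }
    return {false, true};       // Adreno 1..700: unload Vulkan, CPU only
}

// Usage: pin the boundary that is easiest to get wrong.
inline void check_policy_boundary_sketch() {
    assert(!resolve_policy_sketch(700).load_opencl);   // exactly 700 -> CPU only
    assert(resolve_policy_sketch(700).unload_vulkan);
    assert(resolve_policy_sketch(701).load_opencl);    // 701 -> OpenCL
}
```

Because the function takes only an integer and returns plain flags, the boundaries can be tested on any host without a GPU or a ggml link.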

Regression safety

The entire Adreno block is #ifdef __ANDROID__. Off Android the behaviour is byte-identical: load_opencl stays true and OpenCL is loaded in exactly the same position as before (after virtgpu, before hexagon).

Tests

  • Pure GPU-string parser + policy decision extracted to src/ggml-adreno.h (inline, dependency-free); the parser also lowercases the description (small hardening over fabric-llm's raw substring check).
  • New tests/test-adreno-version.cpp, 29 cases: 18 parser (Adreno variants; Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno; empty; unparseable) + 11 policy, incl. the > 700 boundary (Adreno exactly 700 → CPU only) and the merged ≤ 600 tier (600/500/1 → CPU only). Wired into ctest as test-adreno-version (always built, no ggml link).
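
For illustration, a parser of the kind described in the bullets above might look like the following. It is a sketch under assumptions, not the contents of src/ggml-adreno.h: the function name is hypothetical, and returning 0 for non-Adreno or unparseable descriptions is an assumed sentinel.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper. Returns the first run of digits that follows "adreno"
// (case-insensitive), or 0 when the description is not Adreno or has no digits.
inline int parse_adreno_version_sketch(const std::string & description) {
    // Lowercase a copy so "ADRENO" and "Adreno" are treated alike.
    std::string lower;
    lower.reserve(description.size());
    for (unsigned char c : description) {
        lower.push_back(static_cast<char>(std::tolower(c)));
    }

    const std::size_t pos = lower.find("adreno");
    if (pos == std::string::npos) {
        return 0;  // not an Adreno description
    }

    // Skip to the first digit after "adreno", then consume that digit run.
    std::size_t i = pos + 6;  // 6 == length of "adreno"
    while (i < lower.size() && !std::isdigit(static_cast<unsigned char>(lower[i]))) {
        ++i;
    }
    int version = 0;
    // The cap keeps an absurdly long digit run from overflowing int.
    while (i < lower.size() && std::isdigit(static_cast<unsigned char>(lower[i])) && version < 100000) {
        version = version * 10 + (lower[i] - '0');
        ++i;
    }
    return version;
}
```

With this sketch, "Adreno (TM) 830" yields 830 and "Mali-G715" yields 0. A description such as "Adreno X1-85" yields 1, because only the first digit run is read.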

Test plan

  • ggml lib builds clean (x64-linux)
  • ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only: clean (Android block compiles)
  • test-adreno-version: 29/29 pass via ctest
  • Device farm (after the ggml-speech port bump → whisper rebuild): Samsung S25 (Adreno) → OpenCL, Pixel 9 (Mali) → Vulkan, via transcription-whispercpp's GPU test

Relationship to transcription-whispercpp #2343 (complementary — both required)

This change controls which backends are available; the addon picks among them (same split as the LLM stack: engine availability + BackendSelection). On Adreno > 700 this keeps both Vulkan and OpenCL loaded (like fabric-llm keeps Vulkan), so whisper's default gpu_device=0 would still pick Vulkan → crash unless #2343's adrenoOpenclGpuDeviceIndex / gpu_device guard redirects to OpenCL. So #2343's guard is not redundant; #18 additionally handles the Adreno 1–700 (CPU-only) tier and the non-Adreno OpenCL-skip that the addon guard can't.
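
To make the split concrete, here is a rough sketch of the consumer-side half. It is not the guard from #2343, whose code is not shown here. The function name, the lowercase backend-name strings, and the idea of passing the loaded backends as a list are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical consumer-side selection. `backend_names` is assumed to list the
// loaded GPU backends in registration order (Vulkan first when both are loaded).
inline int pick_gpu_device_sketch(const std::vector<std::string> & backend_names,
                                  int adreno_version) {
    if (adreno_version > 700) {
        // On Adreno > 700 both backends stay loaded, so index 0 would be Vulkan;
        // redirect to the first OpenCL entry instead.
        for (std::size_t i = 0; i < backend_names.size(); ++i) {
            if (backend_names[i] == "opencl") {
                return static_cast<int>(i);
            }
        }
    }
    return 0;  // non-Adreno, or no OpenCL entry: keep the default device
}
```

In this sketch, a list of {"vulkan", "opencl"} with version 830 yields index 1, while a Mali device (version 0) with only {"vulkan"} keeps index 0.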

Downstream (separate, follow-up)

Once merged: bump the ggml-speech vcpkg port → whisper-cpp (+ tts-cpp / parakeet-cpp) deps → validate via the transcription-whispercpp addon overlay → CI → registry → addon, per the standard merge protocol.

Zbig9000 and others added 2 commits June 2, 2026 10:58
Port the Android GPU backend-selection policy from qvac-fabric-llm.cpp's ggml
fork into the speech ggml fork, so the speech stack (whisper / parakeet / tts)
selects GPU backends the same way the LLM stack does.

Problem: ggml_backend_load_all_from_path() loaded Vulkan AND OpenCL
unconditionally. On Adreno, ggml registers both for the same GPU and Vulkan
loads first, so whisper.cpp's default gpu_device=0 lands on the Adreno Vulkan
driver, which SIGSEGVs in vkCmdBindPipeline during ggml compute (observed on
Samsung S25 Ultra, device-farm run 26646220900). On non-Adreno GPUs (e.g.
Mali) Vulkan is the correct, stable path and OpenCL must not be used.

Fix (mirrors fabric-llm): on Android, after loading Vulkan, use it to detect
the GPU and decide whether to load OpenCL:
  - not Adreno            -> skip OpenCL (Vulkan/CPU), e.g. Mali keeps Vulkan
  - Adreno > 700          -> load OpenCL (Adreno's stable ggml path)
  - Adreno 601-700        -> unload Vulkan, CPU only (both backends buggy)
  - Adreno <= 600         -> load OpenCL (unchanged fall-through)
Implemented via ggml_backend_reg_by_name("vulkan") +
ggml_backend_min_adreno_version() (using the public ggml_backend_reg_dev_*
/ ggml_backend_dev_description APIs) + ggml_backend_unload().

Off Android the behaviour is unchanged and byte-identical: `load_opencl`
stays true and OpenCL is loaded in exactly the same position as before (right
after virtgpu, before hexagon). The whole Adreno block is `#ifdef __ANDROID__`.

The pure GPU-string parser is extracted to a header (src/ggml-adreno.h,
inline, dependency-free) so it can be unit-tested without a GPU; it also
lowercases the description (a small hardening over fabric-llm's raw substring
check). New tests/test-adreno-version.cpp covers 18 cases (Adreno variants,
Mali/NVIDIA/AMD/Apple/Intel/llvmpipe non-Adreno, empty, and unparseable);
wired into ctest as test-adreno-version (always built, no ggml link).

Validation (x64-linux):
  - ggml lib builds clean (ggml-backend-reg.cpp.o)
  - ggml-backend-reg.cpp -D__ANDROID__ -fsyntax-only: clean (Android block compiles)
  - test-adreno-version: 18/18 pass via ctest
The Adreno runtime selection itself is validated end-to-end on the device
farm via transcription-whispercpp's GPU test (Samsung S25 -> OpenCL).

Co-authored-by: Cursor <cursoragent@cursor.com>
…review)

Code-review follow-up on the Android Adreno backend-selection change:

- Factor the OpenCL/Vulkan decision (the 700/600 generation thresholds) out of
  ggml_backend_load_all_from_path() into ggml_adreno_resolve_backend_policy()
  in ggml-adreno.h, returning {load_opencl, unload_vulkan}. This makes the
  policy boundaries unit-testable without a GPU; the loader now just consumes
  the policy (behaviour unchanged, still mirrors qvac-fabric-llm.cpp).
- Add 11 policy cases to test-adreno-version.cpp, pinning the subtle boundaries:
  Adreno exactly 700 -> CPU only (not > 700), exactly 600 -> OpenCL (not > 600),
  701/730/750/830 -> OpenCL, 601..700 -> CPU only, <=600 -> OpenCL, <=0 -> none.
- Document the parser's first-digit-run limitation (inherited from fabric-llm):
  "Adreno X1-85" would parse as 1; that naming is Snapdragon-X (Windows-on-ARM)
  only, not Android phones (5xx/6xx/7xx/8xx parse correctly).

29/29 test cases pass via ctest; ggml builds clean; ggml-backend-reg.cpp
-D__ANDROID__ -fsyntax-only clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
Per review: only Adreno > 700 has a stable ggml GPU path, so collapse the
Adreno <=600 tier into the 601..700 tier — all Adreno 1..700 now resolves to
CPU only (unload Vulkan, no OpenCL). This is stricter than qvac-fabric-llm.cpp
(which loaded OpenCL on Adreno <=600); older Adreno GPUs are no more capable
than the 601..700 tier we already exclude, so there is no reason to expose a
GPU backend on them.

ggml_adreno_resolve_backend_policy() simplifies to two GPU branches (>700 ->
OpenCL, 1..700 -> CPU only) plus the non-Adreno case. Test updated: Adreno
600/500/1 now expect {load_opencl=false, unload_vulkan=true}.

29/29 cases pass; ggml builds clean; -D__ANDROID__ -fsyntax-only clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

4 participants