parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy by GustavoA1604 · Pull Request #23 · tetherto/qvac-ext-lib-whisper.cpp

GustavoA1604 · 2026-05-18T18:59:21Z

Summary

Brings parakeet-cpp's Android backend story up to parity with
qvac/packages/llm-llamacpp:

- Vulkan and OpenCL ship as dlopen'd MODULE .so files (qvac-ext-ggml@speech's
  GGML_BACKEND_DL=ON), discovered at runtime via ggml_backend_load_all_from_path().
- Zero static GPU backend init calls anywhere in libparakeet. Verified on host:
  `nm libparakeet.dylib | grep 'ggml_backend_\(vulkan\|opencl\|metal\|cuda\|blas\)_init'`
  returns empty.
- Backend selection mirrors llm-llamacpp's BackendSelection.cpp tier policy:
  Adreno 700+ → OpenCL, every other GPU → Vulkan (or Metal / CUDA on the
  matching platform).
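
The tier policy summarised above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `GpuDevice` and `pick_gpu` are hypothetical stand-ins for the ggml-backend registry walk, the bucket names are borrowed from the `init_gpu_backend` description, and both the fallback order (`opencl_other` last) and the omission of the PARAKEET_ALLOW_ADRENO_6XX override are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one GPU/IGPU entry in the ggml-backend registry.
struct GpuDevice {
    std::string backend;      // e.g. "OpenCL", "Vulkan", "Metal"
    std::string description;  // e.g. "Adreno (TM) 740"
};

// Returns the Adreno model number, the synthetic value 800 for the
// "Adreno X<n>" naming, or -1 when the description names no Adreno GPU.
int parse_adreno_version(const std::string& desc) {
    const std::string key = "Adreno";
    std::size_t pos = desc.find(key);
    if (pos == std::string::npos) return -1;
    pos += key.size();
    // Skip decorations such as " (TM) " until a digit or an 'X' is found.
    while (pos < desc.size() &&
           !std::isdigit(static_cast<unsigned char>(desc[pos])) &&
           desc[pos] != 'X') {
        ++pos;
    }
    if (pos >= desc.size()) return -1;
    if (desc[pos] == 'X') return 800;
    int version = 0;
    while (pos < desc.size() &&
           std::isdigit(static_cast<unsigned char>(desc[pos]))) {
        version = version * 10 + (desc[pos] - '0');
        ++pos;
    }
    return version;
}

// Buckets the devices, then picks a bucket in tier order (assumed order).
std::optional<GpuDevice> pick_gpu(const std::vector<GpuDevice>& devices) {
    std::optional<GpuDevice> opencl_adreno_700plus, other_gpu, opencl_other;
    for (const GpuDevice& dev : devices) {
        const bool is_opencl = dev.backend == "OpenCL";
        if (is_opencl && parse_adreno_version(dev.description) >= 700) {
            if (!opencl_adreno_700plus) opencl_adreno_700plus = dev;
        } else if (is_opencl) {
            if (!opencl_other) opencl_other = dev;
        } else {
            if (!other_gpu) other_gpu = dev;
        }
    }
    if (opencl_adreno_700plus) return opencl_adreno_700plus;
    if (other_gpu) return other_gpu;
    return opencl_other;  // empty when no GPU was found at all
}
```

With this sketch, a device list that exposes the same Adreno 740 through both Vulkan and OpenCL resolves to the OpenCL entry, while a Vulkan-only Mali device resolves to Vulkan.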

What's changed

`src/parakeet_ctc.cpp` — `init_gpu_backend`

- Rewrote the registry walk to bucket GPU/IGPU devices into
  {opencl_adreno_700plus, other_gpu, opencl_other} and pick per the tier policy,
  replacing the previous "first GPU/IGPU in registry order, skip Adreno 6xx" logic.
- parse_adreno_version() handles the standard "Adreno 7xx/8xx" naming AND the
  Snapdragon X Elite "Adreno X<n>" naming (mapped to synthetic 800 so it takes the
  OpenCL branch). Existing PARAKEET_ALLOW_ADRENO_6XX env override preserved.
- New public entry points set_backends_directory(dir) / set_opencl_cache_dir(dir)
  (declared in parakeet_ctc.h) so embedded host apps can point the ggml-backend
  registry at a custom per-module folder before the first Engine construction.
  Both honour a "first Engine wins" contract gated on a new g_backends_loaded
  atomic flipped under the shared mutex before the load-all call inside
  ensure_backends_loaded releases it: racing setters either land their value
  (and have it picked up by the in-flight load) or atomically observe the flag
  and fall into the warn-once branch.
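
One way to read the "first Engine wins" contract is sketched below. This is an assumption-laden illustration, not the PR's code: the `bool` return value, the `g_loaded_from` variable and the choice to run the load while still holding the mutex are all hypothetical, and the real ggml_backend_load_all_from_path() call is replaced by a stand-in assignment.

```cpp
#include <atomic>
#include <cstdio>
#include <mutex>
#include <string>

namespace gate_sketch {

std::mutex g_mutex;                          // shared by setter and loader
std::atomic<bool> g_backends_loaded{false};  // the "first Engine wins" gate
std::string g_backends_dir;                  // guarded by g_mutex
std::string g_loaded_from;                   // hypothetical: what the load used

// Returns true when the value was recorded, false when it arrived too late.
bool set_backends_directory(const std::string& dir) {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (g_backends_loaded.load()) {
        // Warn once, then ignore every late setter.
        static std::once_flag warned;
        std::call_once(warned, [] {
            std::fprintf(stderr,
                         "backends already loaded; ignoring new directory\n");
        });
        return false;
    }
    g_backends_dir = dir;
    return true;
}

void ensure_backends_loaded() {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (g_backends_loaded.load()) return;
    // The flag flips before the load and while the mutex is still held, so a
    // racing setter either ran earlier (its value is used below) or sees the
    // flag and takes the warn-once branch.
    g_backends_loaded.store(true);
    // Stand-in for ggml_backend_load_all_from_path(g_backends_dir.c_str()).
    g_loaded_from = g_backends_dir;
}

}  // namespace gate_sketch
```

Because the gate is a dedicated flag and not "is the recorded directory non-empty", a setter that arrives after a load from the default search path is still rejected with a diagnostic.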

`include/parakeet/engine.h` — `EngineOptions`

- backends_dir: forwarded to ggml_backend_load_all_from_path() on first
  Engine construction. Empty → ggml's compile-time default search path.
- opencl_cache_dir: Android-only, sets $GGML_OPENCL_CACHE_DIR for
  ggml-opencl's program-binary cache (qvac-ext-ggml@speech program-binary cache
  patch). Strongly recommended in production on Android to skip the cold
  clBuildProgram cost.

`src/parakeet_engine.cpp`

Engine ctor calls set_backends_directory / set_opencl_cache_dir before
load_from_gguf when the respective EngineOptions fields are non-empty.
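
The forwarding described above might look roughly like the following. The `EngineOptions` struct here is a hypothetical two-field subset, `apply_backend_options` is an invented helper standing in for the start of the Engine constructor, and the setters are simplified stand-ins (the cache-dir setter uses POSIX `setenv`, which is an assumption about how the environment variable gets set).

```cpp
#include <cstdlib>
#include <string>

namespace options_sketch {

// Hypothetical subset of EngineOptions: only the two fields added by this PR.
struct EngineOptions {
    std::string backends_dir;      // empty: keep ggml's default search path
    std::string opencl_cache_dir;  // empty: leave GGML_OPENCL_CACHE_DIR alone
};

std::string g_backends_dir;  // stand-in for the state the real setter records

void set_backends_directory(const std::string& dir) { g_backends_dir = dir; }

void set_opencl_cache_dir(const std::string& dir) {
    // POSIX setenv; the final argument 1 overwrites any existing value.
    ::setenv("GGML_OPENCL_CACHE_DIR", dir.c_str(), 1);
}

// Mirrors the ordering in the text: setters first, model load afterwards.
void apply_backend_options(const EngineOptions& opts) {
    if (!opts.backends_dir.empty()) set_backends_directory(opts.backends_dir);
    if (!opts.opencl_cache_dir.empty()) set_opencl_cache_dir(opts.opencl_cache_dir);
    // The real constructor would call load_from_gguf after this point.
}

}  // namespace options_sketch
```

Leaving a field empty skips the corresponding setter entirely, which is what keeps the default search path and any pre-existing environment value untouched.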

`src/main.cpp`

New --backends-dir DIR CLI flag with the same lifetime contract as
--opencl-cache-dir (applied before any backend init).
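
As a rough illustration of "applied before any backend init", the two directory flags can be collected into a plain struct in a first pass over the arguments, before anything touches the backend registry. `CliOptions` and `parse_backend_flags` are hypothetical names; the actual argument handling in `src/main.cpp` is not shown in this PR description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the two directory flags.
struct CliOptions {
    std::string backends_dir;
    std::string opencl_cache_dir;
};

// Collects "--backends-dir DIR" and "--opencl-cache-dir DIR".
// Returns false when a flag is missing its DIR argument.
bool parse_backend_flags(const std::vector<std::string>& args, CliOptions& out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        std::string* target = nullptr;
        if (args[i] == "--backends-dir") {
            target = &out.backends_dir;
        } else if (args[i] == "--opencl-cache-dir") {
            target = &out.opencl_cache_dir;
        }
        if (target == nullptr) continue;         // other flags handled elsewhere
        if (i + 1 >= args.size()) return false;  // flag given without a value
        *target = args[++i];
    }
    return true;
}
```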

`CMakeLists.txt`

- On Android, defaults GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON +
  GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON +
  GGML_VULKAN_DISABLE_COOPMAT{,2}=ON, matching the qvac llm-llamacpp Android
  port (qvac-registry-vcpkg/ports/llama-cpp/portfile.cmake).
  Override at the cmake command line as usual.
- Fixed the PARAKEET_GGML_LIB_PREFIX block: it now sets
  GGML_LIB_OUTPUT_PREFIX="speech-" as a cache variable before
  add_subdirectory(ggml), and the post-hoc rename loop is removed. The previous
  version would double-rename when consumed by qvac-ext-ggml@speech's default
  prefix qvac-speech-, producing libspeech-qvac-speech-ggml-vulkan.so style
  filenames that nothing on the runtime side discovered.
- Dropped the dead GGML_USE_VULKAN / GGML_USE_OPENCL / GGML_USE_METAL /
  GGML_USE_CUDA / GGML_USE_BLAS defines from parakeet-backend-defs and the
  parakeet_apply_backend_defs() helper. No source in parakeet-cpp uses
  #ifdef GGML_USE_* anymore (everything goes through the registry); shipping
  these defines would falsely advertise a static backend dependency that the
  GGML_BACKEND_DL=ON Android/Linux builds explicitly do not have.

`scripts/setup-ggml.sh`

Bumped to point at qvac-ext-ggml@speech (which carries the speech-stack patch
series + the qvac-speech- lib filename prefix this PR's prefix change relies
on).

Companion change

This PR depends on a small ggml-backend loader patch landing on
qvac-ext-ggml@speech: an Android __ANDROID__ block in
ggml_backend_load_best that enumerates per-arch CPU variant names
(cpu-android_armv{8.0,8.2,8.6,9.0,9.2}_*) as candidates for the bare-name
dlopen fallback. Without it, GGML_CPU_ALL_VARIANTS=ON builds on Android
fail to register the CPU backend at runtime (the APK's compressed .so layout
under useLegacyPackaging=false leaves nothing for fs::directory_iterator
to scan, and the existing fallback only composed the base name
libqvac-speech-ggml-cpu.so — which doesn't exist with CPU_ALL_VARIANTS).
Mirrors the equivalent fallback already present downstream on
qvac-fabric-llm.cpp's ggml fork.
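
The bare-name fallback can be pictured as "compose every candidate filename, try each one, keep the best scorer". The sketch below is hypothetical and does not reproduce the ggml patch: `cpu_variant_candidates` and `pick_best_candidate` are invented names, the variant list is passed in by the caller instead of being hard-coded, and the `score_of` callback stands in for the dlopen-and-score step (assumed to return 0 when a library cannot be loaded or does not support the running CPU).

```cpp
#include <functional>
#include <string>
#include <vector>

// Builds "lib<prefix>ggml-cpu-<variant>.so" for each variant name, e.g.
// prefix "qvac-speech-" and variant "android_armv8.2_2".
std::vector<std::string> cpu_variant_candidates(
        const std::string& prefix,
        const std::vector<std::string>& variants) {
    std::vector<std::string> names;
    for (const std::string& variant : variants) {
        names.push_back("lib" + prefix + "ggml-cpu-" + variant + ".so");
    }
    return names;
}

// Returns the highest-scoring candidate, or an empty string when every
// candidate scores 0 (nothing loadable, so no CPU backend gets registered).
std::string pick_best_candidate(
        const std::vector<std::string>& names,
        const std::function<int(const std::string&)>& score_of) {
    std::string best;
    int best_score = 0;
    for (const std::string& name : names) {
        const int score = score_of(name);
        if (score > best_score) {
            best_score = score;
            best = name;
        }
    }
    return best;
}
```

Passing bare filenames matters here because, with the APK's compressed .so layout, there is no directory on disk to iterate; the dynamic linker can still resolve a library by name.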

tetherto/qvac-ext-ggml#11

Testing

- Host build: `cmake -S . -B build -DPARAKEET_BUILD_EXECUTABLES=ON && cmake --build build`
  configures cleanly and produces correctly-prefixed
  libspeech-ggml-{base,vulkan,opencl,cpu,blas,metal}.dylib files alongside
  libparakeet.dylib. No static GPU backend symbols leaked into libparakeet
  (verified with nm).
- On-device Android via tetherto/qvac-test-addon-mobile against the
  consumer integration suite, on Samsung S23 FE (Cortex-A78, ARMv8.2 + dotprod +
  i8mm):
  - Engine constructs successfully against a q4_0 GGUF (TDT, EOU,
    Sortformer).
  - The Android per-arch CPU fallback picks libqvac-speech-ggml-cpu-android_armv8.2_2.so
    via the score function, so init_cpu_backend no longer returns rc=10.
  - Tier policy correctly selects Vulkan when useGPU=true and the device
    isn't an Adreno 7xx+.
  - Integration tests (runAccuracyMultilangTest, runMultipleTranscriptionsTest,
    runColdStartTimingTest, runDuplexStreamingTest, runEouStreamingTest,
    runMobilePerf*Test, etc.) now actually exercise the engine instead of
    failing fast on loadGgufOrSkip.

Brings the parakeet-cpp Android backend story up to parity with qvac/packages/llm-llamacpp:

* Vulkan and OpenCL ship as separately-loaded MODULE .so files (qvac-ext-ggml@speech's GGML_BACKEND_DL=ON), discovered at runtime via `ggml_backend_load_all_from_path()`.
* No static GPU backend init calls anywhere in libparakeet -- `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty (verified on host).
* Backend selection mirrors llm-llamacpp's BackendSelection.cpp tier policy: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on the matching platform).

Changes:

`src/parakeet_ctc.cpp` (`init_gpu_backend`)

- Rewrote the registry walk to bucket GPU/IGPU devices into {opencl_adreno_700plus, other_gpu, opencl_other} and pick the bucket per the tier policy, instead of the previous "first GPU/IGPU in registry order, skip Adreno 6xx" logic.
- `parse_adreno_version()` handles the standard "Adreno 7xx/8xx" naming AND the Snapdragon X Elite "Adreno X<n>" naming (mapped to synthetic 800 so it takes the OpenCL branch). Existing PARAKEET_ALLOW_ADRENO_6XX env override preserved.
- Added `set_backends_directory(dir)` / `set_opencl_cache_dir(dir)` public entry points (also declared in `parakeet_ctc.h`) so embedded host apps can point the ggml-backend registry at a custom per-module folder before the first Engine construction. Both honour a "first Engine wins" contract: the gate is a new `g_backends_loaded` atomic flipped under the shared mutex *before* the load-all call inside `ensure_backends_loaded` releases it, so a setter racing a first-Engine construction either lands its value (and has it picked up by the in-flight load) or atomically observes the flag and falls into the warn-once branch. Previously the gate was `!g_recorded_backends_dir.empty()`, which conflated "registry loaded" with "registry loaded from a non-empty dir" -- a second-Engine setter after a first Engine that used the default search path would silently write to `g_backends_dir` without re-scanning, with zero diagnostic. Symmetric behaviour applied to set_opencl_cache_dir.

`include/parakeet/engine.h` (`EngineOptions`)

- `backends_dir`: forwarded to `ggml_backend_load_all_from_path()` on first Engine construction. Empty -> ggml's compile-time default search path.
- `opencl_cache_dir`: Android-only, sets $GGML_OPENCL_CACHE_DIR for ggml-opencl's program-binary cache (the qvac-ext-ggml@speech program-binary cache patch). Strongly recommended in production on Android to skip the cold clBuildProgram cost.

`src/parakeet_engine.cpp` (Engine ctor)

- Calls `set_backends_directory` / `set_opencl_cache_dir` before `load_from_gguf` when the respective EngineOptions fields are non-empty.

`src/main.cpp` (CLI)

- New `--backends-dir DIR` flag with the same lifetime contract as `--opencl-cache-dir` (applied before any backend init).

`CMakeLists.txt`

- On Android, default GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON + GGML_VULKAN_DISABLE_COOPMAT{,2}=ON, matching the qvac llm-llamacpp Android port (qvac-registry-vcpkg/ports/llama-cpp/portfile.cmake). Override at the cmake command line as usual.
- Fixed the PARAKEET_GGML_LIB_PREFIX block: it now sets GGML_LIB_OUTPUT_PREFIX="speech-" as a cache variable BEFORE add_subdirectory(ggml), and the post-hoc rename loop is removed. The previous version would double-rename when consumed by the qvac-ext-ggml@speech default prefix `qvac-speech-`, producing `libspeech-qvac-speech-ggml-vulkan.so` style filenames that nothing on the runtime side discovered.
- Dropped the dead GGML_USE_VULKAN / GGML_USE_OPENCL / GGML_USE_METAL / GGML_USE_CUDA / GGML_USE_BLAS defines from `parakeet-backend-defs` and the `parakeet_apply_backend_defs()` helper. No source in parakeet-cpp uses `#ifdef GGML_USE_*` anymore (everything goes through the registry); shipping these defines would falsely advertise a static backend dependency that the GGML_BACKEND_DL=ON Android/Linux builds explicitly do not have.

Verified by:

* Host build: `cmake -S . -B build -DPARAKEET_BUILD_EXECUTABLES=ON && cmake --build build` produces correctly-prefixed `libspeech-ggml-{base,vulkan,opencl,cpu,blas,metal}.dylib` files alongside libparakeet.dylib.
* On-device Android (qvac-test-addon-mobile, Samsung S23 FE): Engine constructs successfully against a q4_0 GGUF, the tier policy selects the right backend (Vulkan when GPU is requested, CPU armv8.2_2 variant via the new ggml-backend Android per-arch fallback), and the addon's integration test suite runs without `rc=10` from init_cpu_backend.

Co-authored-by: Cursor <cursoragent@cursor.com>

Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (08df2e70b8b71f8225af6ae35d3576eccea5ae7f), which folds in two PRs:

* tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process.
* tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in).

Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape.

Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6).

No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there.

Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty.

git-tree for ports/parakeet-cpp: 4f9b873.

Co-authored-by: Cursor <cursoragent@cursor.com>

Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs:

* tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process.
* tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in).
* tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows (compile-only follow-up to PR #23; needed for the Windows desktop dev path that exercises the new init_gpu_backend tier policy).

Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape.

Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6).

No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there.

Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty.

git-tree for ports/parakeet-cpp: 2961794.

Co-authored-by: Cursor <cursoragent@cursor.com>

parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy

GustavoA1604 and others added 3 commits May 18, 2026 13:48

Update setup-ggml to point to qvac-ext-ggml

aa9fe74

Merge branch 'master' into android-gpu-dynamic-loading

d130682

GustavoA1604 requested review from a team as code owners May 18, 2026 18:59

GustavoA1604 changed the title ~~Android gpu dynamic loading~~ parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy May 18, 2026

GustavoA1604 merged commit 0f2b178 into master May 18, 2026
66 of 73 checks passed

gianni-cor pushed a commit that referenced this pull request May 28, 2026

Merge pull request #23 from tetherto/android-gpu-dynamic-loading

4872a24

parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy

gianni-cor deleted the android-gpu-dynamic-loading branch May 28, 2026 13:44
