parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy #23
Merged
Brings the parakeet-cpp Android backend story up to parity with
qvac/packages/llm-llamacpp:
* Vulkan and OpenCL ship as separately-loaded MODULE .so files
(qvac-ext-ggml@speech's GGML_BACKEND_DL=ON), discovered at
runtime via `ggml_backend_load_all_from_path()`.
* No static GPU backend init calls anywhere in libparakeet --
`nm libparakeet.dylib | grep -E 'ggml_backend_(vulkan|opencl|metal|cuda|blas)_init'`
returns empty (verified on host).
* Backend selection mirrors llm-llamacpp's BackendSelection.cpp
tier policy: Adreno 700+ -> OpenCL, every other GPU -> Vulkan
(or Metal / CUDA on the matching platform).
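For illustration only, the tier policy above could be sketched as a three-bucket pick. This is not the code in `init_gpu_backend`: the `GpuDevice` struct, the `pick_gpu` name, and the fallback order (Adreno 700+ OpenCL first, then any non-OpenCL GPU, then other OpenCL devices) are assumptions, and the `PARAKEET_ALLOW_ADRENO_6XX` override is not modelled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical device descriptor. The real code walks the ggml-backend
// registry instead of taking a vector like this.
struct GpuDevice {
    std::string backend;         // e.g. "OpenCL", "Vulkan", "Metal", "CUDA"
    std::string name;            // human-readable device description
    int         adreno_version;  // 0 when the device is not an Adreno GPU
};

// Returns the index of the chosen device, or nullopt when the list is empty.
// Assumed preference order: opencl_adreno_700plus > other_gpu > opencl_other.
std::optional<std::size_t> pick_gpu(const std::vector<GpuDevice>& devs) {
    std::optional<std::size_t> opencl_adreno_700plus, other_gpu, opencl_other;
    for (std::size_t i = 0; i < devs.size(); ++i) {
        const GpuDevice& d = devs[i];
        if (d.backend == "OpenCL") {
            // OpenCL devices split on the Adreno generation.
            auto& slot = d.adreno_version >= 700 ? opencl_adreno_700plus
                                                 : opencl_other;
            if (!slot) slot = i;  // keep the first device seen per bucket
        } else if (!other_gpu) {
            other_gpu = i;        // Vulkan / Metal / CUDA and so on
        }
    }
    if (opencl_adreno_700plus) return opencl_adreno_700plus;
    if (other_gpu) return other_gpu;
    return opencl_other;
}
```

With a Vulkan and an OpenCL entry for the same Adreno 740, this sketch returns the OpenCL entry; with an Adreno 650 OpenCL entry next to a Vulkan device, it returns the Vulkan one.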
Changes:
`src/parakeet_ctc.cpp` (`init_gpu_backend`)
- Rewrote the registry walk to bucket GPU/IGPU devices into
{opencl_adreno_700plus, other_gpu, opencl_other} and pick the
bucket per the tier policy, instead of the previous "first
GPU/IGPU in registry order, skip Adreno 6xx" logic.
- `parse_adreno_version()` handles the standard "Adreno 7xx/8xx"
naming AND the Snapdragon X Elite "Adreno X<n>" naming (mapped
to synthetic 800 so it takes the OpenCL branch). Existing
PARAKEET_ALLOW_ADRENO_6XX env override preserved.
- Added `set_backends_directory(dir)` / `set_opencl_cache_dir(dir)`
public entry points (also declared in `parakeet_ctc.h`) so
embedded host apps can point the ggml-backend registry at a
custom per-module folder before the first Engine construction.
Both honour a "first Engine wins" contract: the gate is a new
`g_backends_loaded` atomic flipped under the shared mutex
*before* the load-all call inside `ensure_backends_loaded`
releases it, so a setter racing a first-Engine construction
either lands its value (and has it picked up by the in-flight
load) or atomically observes the flag and falls into the
warn-once branch. Previously the gate was
`!g_recorded_backends_dir.empty()`, which conflated "registry
loaded" with "registry loaded from a non-empty dir" -- a
second-Engine setter after a first Engine that used the default
search path would silently write to `g_backends_dir` without
re-scanning, with zero diagnostic. Symmetric behaviour applied
to set_opencl_cache_dir.
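A minimal sketch of the "first Engine wins" gate described above, assuming the load-all call runs while the mutex is held. The names mirror the ones in this message, but the bodies, the `bool` return value, `g_loaded_from`, and `g_warn_count` are hypothetical and exist only to make the contract observable.

```cpp
#include <atomic>
#include <mutex>
#include <string>

namespace sketch {

std::mutex        g_mutex;                     // shared by setter and loader
std::atomic<bool> g_backends_loaded{false};    // the gate
std::string       g_backends_dir;              // guarded by g_mutex
std::string       g_loaded_from;               // illustration: dir the loader saw
bool              g_warned     = false;        // guarded by g_mutex
int               g_warn_count = 0;            // illustration: warnings emitted

// Returns true when the value was accepted, false when it arrived after
// the registry was already loaded (hypothetical return value).
bool set_backends_directory(const std::string& dir) {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (g_backends_loaded.load()) {
        if (!g_warned) {          // warn-once branch
            g_warned = true;
            ++g_warn_count;       // real code would log a diagnostic here
        }
        return false;
    }
    g_backends_dir = dir;
    return true;
}

void ensure_backends_loaded() {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (g_backends_loaded.load()) return;
    g_backends_loaded.store(true);  // flipped before the mutex is released
    // Stand-in for the load-all call, which would scan g_backends_dir
    // (or the default search path when it is empty).
    g_loaded_from = g_backends_dir;
}

}  // namespace sketch
```

Because the gate is the flag rather than "directory is non-empty", a setter that runs after a default-path load takes the warn-once branch instead of silently overwriting `g_backends_dir`.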
`include/parakeet/engine.h` (`EngineOptions`)
- `backends_dir`: forwarded to `ggml_backend_load_all_from_path()`
on first Engine construction. Empty -> ggml's compile-time
default search path.
- `opencl_cache_dir`: Android-only, sets $GGML_OPENCL_CACHE_DIR
for ggml-opencl's program-binary cache (the qvac-ext-ggml@speech
program-binary cache patch). Strongly recommended in production
on Android to skip the cold clBuildProgram cost.
`src/parakeet_engine.cpp` (Engine ctor)
- Calls `set_backends_directory` / `set_opencl_cache_dir` before
`load_from_gguf` when the respective EngineOptions fields are
non-empty.
`src/main.cpp` (CLI)
- New `--backends-dir DIR` flag with the same lifetime contract as
`--opencl-cache-dir` (applied before any backend init).
`CMakeLists.txt`
- On Android, default GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
+ GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON +
GGML_VULKAN_DISABLE_COOPMAT{,2}=ON, matching the qvac llm-llamacpp
Android port (qvac-registry-vcpkg/ports/llama-cpp/portfile.cmake).
Override at the cmake command line as usual.
- Fixed the PARAKEET_GGML_LIB_PREFIX block: it now sets
GGML_LIB_OUTPUT_PREFIX="speech-" as a cache variable BEFORE
add_subdirectory(ggml), and the post-hoc rename loop is removed.
The previous version would double-rename when consumed by the
qvac-ext-ggml@speech default prefix `qvac-speech-`, producing
`libspeech-qvac-speech-ggml-vulkan.so` style filenames that
nothing on the runtime side discovered.
- Dropped the dead GGML_USE_VULKAN / GGML_USE_OPENCL / GGML_USE_METAL
/ GGML_USE_CUDA / GGML_USE_BLAS defines from `parakeet-backend-defs`
and the `parakeet_apply_backend_defs()` helper. No source in
parakeet-cpp uses `#ifdef GGML_USE_*` anymore (everything goes
through the registry); shipping these defines would falsely
advertise a static backend dependency that the GGML_BACKEND_DL=ON
Android/Linux builds explicitly do not have.
Verified by:
* Host build: `cmake -S . -B build -DPARAKEET_BUILD_EXECUTABLES=ON
&& cmake --build build` produces correctly-prefixed
`libspeech-ggml-{base,vulkan,opencl,cpu,blas,metal}.dylib` files
alongside libparakeet.dylib.
* On-device Android (qvac-test-addon-mobile, Samsung S23 FE):
Engine constructs successfully against a q4_0 GGUF, the tier
policy selects the right backend (Vulkan when GPU is requested,
CPU armv8.2_2 variant via the new ggml-backend Android per-arch
fallback), and the addon's integration test suite runs without
`rc=10` from init_cpu_backend.
Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 added a commit to GustavoA1604/qvac-registry-vcpkg that referenced this pull request on May 19, 2026:
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master
tip (08df2e70b8b71f8225af6ae35d3576eccea5ae7f), which folds in two PRs:
* tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic
backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now
defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
+ GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac
llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable
MODULE .so files; per-arch CPU variants ship as
`libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is
centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other
GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU
backend entry points are linked anywhere in libparakeet; the ggml-backend
registry walk handles every case in both GGML_BACKEND_DL=ON and
GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` /
`set_opencl_cache_dir()` entry points plus the matching
`EngineOptions::backends_dir` / `opencl_cache_dir` fields and the
`--backends-dir` CLI flag so embedded host apps can pin the backends scan
directory and the ggml-opencl program-binary cache per-process.
* tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22
AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly
after PR #23 merged; safe to fold in).
Date-stamped rather than port-versioned because the upstream commits land
Android-specific backend-loading machinery that previous pv1 builds
genuinely lacked (not just a bugfix on the same source set). Consumers
pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline;
consumers tracking the date-stamped baseline move forward to the
dynamic-backend Android shape.
Dependency floor on ggml-speech tightened from `2026-04-09#1` to
`2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the
per-arch CPU variant dlopen fallback that landed in ggml-speech pv2
(previous commit). Without that floor a downstream registry override could
silently pull pv1 and fail to register any CPU backend at runtime under
AGP's `useLegacyPackaging=false` (the universal Android default since 3.6).
No behaviour change on macOS / iOS (Metal still statically linked into
libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The
Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on
`CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading
switches there.
Verified by host build:
`nm libparakeet.dylib | grep -E 'ggml_backend_(vulkan|opencl|metal|cuda|blas)_init'`
returns empty.
git-tree for ports/parakeet-cpp: 4f9b873.
Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604 added a commit to GustavoA1604/qvac-registry-vcpkg that referenced this pull request on May 19, 2026:
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master
tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs:
* tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic
backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now
defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON
+ GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac
llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable
MODULE .so files; per-arch CPU variants ship as
`libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is
centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other
GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU
backend entry points are linked anywhere in libparakeet; the ggml-backend
registry walk handles every case in both GGML_BACKEND_DL=ON and
GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` /
`set_opencl_cache_dir()` entry points plus the matching
`EngineOptions::backends_dir` / `opencl_cache_dir` fields and the
`--backends-dir` CLI flag so embedded host apps can pin the backends scan
directory and the ggml-opencl program-binary cache per-process.
* tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22
AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly
after PR #23 merged; safe to fold in).
* tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows
(compile-only follow-up to PR #23; needed for the Windows desktop dev path
that exercises the new init_gpu_backend tier policy).
Date-stamped rather than port-versioned because the upstream commits land
Android-specific backend-loading machinery that previous pv1 builds
genuinely lacked (not just a bugfix on the same source set). Consumers
pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline;
consumers tracking the date-stamped baseline move forward to the
dynamic-backend Android shape.
Dependency floor on ggml-speech tightened from `2026-04-09#1` to
`2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the
per-arch CPU variant dlopen fallback that landed in ggml-speech pv2
(previous commit). Without that floor a downstream registry override could
silently pull pv1 and fail to register any CPU backend at runtime under
AGP's `useLegacyPackaging=false` (the universal Android default since 3.6).
No behaviour change on macOS / iOS (Metal still statically linked into
libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The
Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on
`CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading
switches there.
Verified by host build:
`nm libparakeet.dylib | grep -E 'ggml_backend_(vulkan|opencl|metal|cuda|blas)_init'`
returns empty.
git-tree for ports/parakeet-cpp: 2961794.
Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor pushed a commit that referenced this pull request on May 28, 2026:
parakeet-cpp: Android dynamic backend loading + Adreno-tier GPU policy
Summary
Brings parakeet-cpp's Android backend story up to parity with
qvac/packages/llm-llamacpp:
* Vulkan and OpenCL ship as separately-loaded MODULE `.so` files
(qvac-ext-ggml@speech's `GGML_BACKEND_DL=ON`), discovered at runtime via
`ggml_backend_load_all_from_path()`.
* No static GPU backend init calls anywhere in `libparakeet`. Verified on host:
`nm libparakeet.dylib | grep ggml_backend_\(vulkan\|opencl\|metal\|cuda\|blas\)_init`
returns empty.
* Backend selection mirrors llm-llamacpp's `BackendSelection.cpp` tier policy:
Adreno 700+ → OpenCL, every other GPU → Vulkan (or Metal / CUDA
on the matching platform).
What's changed
`src/parakeet_ctc.cpp` (`init_gpu_backend`)
- Rewrote the registry walk to bucket GPU/IGPU devices into
`{opencl_adreno_700plus, other_gpu, opencl_other}` and pick per the tier policy,
replacing the previous "first GPU/IGPU in registry order, skip Adreno 6xx" logic.
- `parse_adreno_version()` handles the standard "Adreno 7xx/8xx" naming AND the
Snapdragon X Elite "Adreno X<n>" naming (mapped to synthetic 800 so it takes the
OpenCL branch). Existing `PARAKEET_ALLOW_ADRENO_6XX` env override preserved.
- Added `set_backends_directory(dir)` / `set_opencl_cache_dir(dir)` public entry
points (declared in `parakeet_ctc.h`) so embedded host apps can point the
ggml-backend registry at a custom per-module folder before the first Engine
construction. Both honour a "first Engine wins" contract gated on a new
`g_backends_loaded` atomic flipped under the shared mutex before the load-all
call inside `ensure_backends_loaded` releases it: racing setters either land
their value (and have it picked up by the in-flight load) or atomically observe
the flag and fall into the warn-once branch.
`include/parakeet/engine.h` (`EngineOptions`)
- `backends_dir`: forwarded to `ggml_backend_load_all_from_path()` on first
Engine construction. Empty → ggml's compile-time default search path.
- `opencl_cache_dir`: Android-only, sets `$GGML_OPENCL_CACHE_DIR` for
ggml-opencl's program-binary cache (qvac-ext-ggml@speech program-binary cache
patch). Strongly recommended in production on Android to skip the cold
`clBuildProgram` cost.
`src/parakeet_engine.cpp`
- Calls `set_backends_directory` / `set_opencl_cache_dir` before
`load_from_gguf` when the respective `EngineOptions` fields are non-empty.
`src/main.cpp`
- New `--backends-dir DIR` CLI flag with the same lifetime contract as
`--opencl-cache-dir` (applied before any backend init).
`CMakeLists.txt`
- On Android, default `GGML_BACKEND_DL=ON` + `GGML_CPU_ALL_VARIANTS=ON` +
`GGML_CPU_REPACK=ON` + `GGML_VULKAN=ON` + `GGML_OPENCL=ON` +
`GGML_VULKAN_DISABLE_COOPMAT{,2}=ON`, matching the qvac llm-llamacpp Android
port (`qvac-registry-vcpkg/ports/llama-cpp/portfile.cmake`).
Override at the cmake command line as usual.
- Fixed the `PARAKEET_GGML_LIB_PREFIX` block: it now sets
`GGML_LIB_OUTPUT_PREFIX="speech-"` as a cache variable before
`add_subdirectory(ggml)`, and the post-hoc rename loop is removed. The previous
version would double-rename when consumed by qvac-ext-ggml@speech's default
prefix `qvac-speech-`, producing `libspeech-qvac-speech-ggml-vulkan.so` style
filenames that nothing on the runtime side discovered.
- Dropped the dead `GGML_USE_VULKAN` / `GGML_USE_OPENCL` / `GGML_USE_METAL` /
`GGML_USE_CUDA` / `GGML_USE_BLAS` defines from `parakeet-backend-defs` and the
`parakeet_apply_backend_defs()` helper. No source in parakeet-cpp uses
`#ifdef GGML_USE_*` anymore (everything goes through the registry); shipping
these defines would falsely advertise a static backend dependency that the
`GGML_BACKEND_DL=ON` Android/Linux builds explicitly do not have.
`scripts/setup-ggml.sh`
- [...] `qvac-ext-ggml@speech` (which carries the speech-stack patch series +
the `qvac-speech-` lib filename prefix this PR's prefix change relies on).
Companion change
This PR depends on a small ggml-backend loader patch landing on
qvac-ext-ggml@speech: an Android `__ANDROID__` block in
`ggml_backend_load_best` that enumerates per-arch CPU variant names
(`cpu-android_armv{8.0,8.2,8.6,9.0,9.2}_*`) as candidates for the bare-name
`dlopen` fallback. Without it, `GGML_CPU_ALL_VARIANTS=ON` builds on Android
fail to register the CPU backend at runtime (the APK's compressed `.so` layout
under `useLegacyPackaging=false` leaves nothing for `fs::directory_iterator`
to scan, and the existing fallback only composed the base name
`libqvac-speech-ggml-cpu.so`, which doesn't exist with `CPU_ALL_VARIANTS`).
Mirrors the equivalent fallback already present downstream on
qvac-fabric-llm.cpp's ggml fork.
tetherto/qvac-ext-ggml#11
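To make the candidate enumeration concrete, the sketch below composes bare library names of the shape quoted above. It is not the loader patch itself: the function name, the `max_suffix` parameter, and the assumption that the trailing `_*` is a small integer starting at 1 are all hypothetical. A real loader would try each name with `dlopen` and skip the ones that fail to load.

```cpp
#include <string>
#include <vector>

// Builds bare-name candidates such as
// "libqvac-speech-ggml-cpu-android_armv8.2_2.so" for a dlopen fallback.
// `prefix` is the library filename prefix (for example "qvac-speech-").
// `max_suffix` is a hypothetical upper bound on the numeric variant suffix.
std::vector<std::string> android_cpu_variant_candidates(const std::string& prefix,
                                                        int max_suffix) {
    static const char* const kArchs[] = {"8.0", "8.2", "8.6", "9.0", "9.2"};
    std::vector<std::string> out;
    for (const char* arch : kArchs) {
        for (int n = 1; n <= max_suffix; ++n) {
            out.push_back("lib" + prefix + "ggml-cpu-android_armv" + arch +
                          "_" + std::to_string(n) + ".so");
        }
    }
    return out;
}
```

Names that do not exist in a given APK cost only a failed `dlopen`, so over-generating candidates is harmless in this sketch.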
Testing
* Host build: `cmake -S . -B build -DPARAKEET_BUILD_EXECUTABLES=ON && cmake --build build`
configures clean and produces correctly-prefixed
`libspeech-ggml-{base,vulkan,opencl,cpu,blas,metal}.dylib` files alongside
`libparakeet.dylib`. No static GPU backend symbols leaked into `libparakeet`
(verified with `nm`).
* On-device Android: `tetherto/qvac-test-addon-mobile` against the consumer
integration suite, on Samsung S23 FE (Cortex-A78, ARMv8.2 + dotprod + i8mm):
- Engine constructs successfully against a `q4_0` GGUF (TDT, EOU, Sortformer).
- CPU backend resolves to `libqvac-speech-ggml-cpu-android_armv8.2_2.so` via
the score function; no `rc=10` from `init_cpu_backend` anymore.
- The tier policy selects Vulkan when `useGPU=true` and the device isn't an
Adreno 7xx+.
- Integration tests (`runAccuracyMultilangTest`,
`runMultipleTranscriptionsTest`, `runColdStartTimingTest`,
`runDuplexStreamingTest`, `runEouStreamingTest`, `runMobilePerf*Test`, etc.)
now actually exercise the engine instead of failing fast on `loadGgufOrSkip`.