QVAC-20556 feat[api]: enable Android GPU for Parakeet (overlay; CI validation) [DO-NOT-MERGE] #2577
pratiknarola-t wants to merge 1 commit into
…lidation)
DO-NOT-MERGE: overlay-only PR to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9/Mali + S25/Adreno 830). This is the inverse of the CPU-only workaround in #2525.
Changes (packages/transcription-parakeet):
- ParakeetModel::load: remove the __ANDROID__ guard that forced useGPU=false.
- CMakeLists: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the Vulkan/OpenCL MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- vcpkg overlay ports (in-package): ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via the overlay-ports entry in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.
Local device finding (Adreno 740 / iQOO 11), TDT q4_0, recorded for reviewers:
- CPU: correct transcript, backendDevice=0.
- GPU OpenCL (engine auto-selects this on Adreno>700): aborts in graph-compute with "op not supported joint.token_argmax (ARGMAX)" -> GGML_ASSERT (SIGABRT).
- GPU Vulkan (forced by withholding the OpenCL module): runs (backendId=3) but output is degraded vs CPU (dropped words) and ~2x slower; NOT the byte-identical result ggml-speech 8bf760f4 reported.
Expect the Device Farm Adreno (S25) leg to hit the OpenCL ARGMAX abort and the Mali leg to exercise the Vulkan path. Do not merge; this is a measurement vehicle.
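To make the effect of the CMakeLists glob change concrete, here is a minimal sketch that runs both patterns over a list of library names. It uses POSIX `fnmatch` as a stand-in for CMake's globbing, which is only an approximation. The file names in `kCandidateLibs` are hypothetical; only the `libqvac-speech-ggml-cpu-` and `libqvac-speech-ggml-` prefixes come from this PR.

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX shell-style pattern matching
#include <string>
#include <vector>

// Hypothetical prebuild contents. The real CPU-variant and GPU module
// file names are not given in the PR; only the prefixes are.
const std::vector<std::string> kCandidateLibs = {
    "libqvac-speech-ggml-cpu-variant_a.so",
    "libqvac-speech-ggml-cpu-variant_b.so",
    "libqvac-speech-ggml-vulkan.so",
    "libqvac-speech-ggml-opencl.so",
};

// Returns the subset of `files` that a shell-style glob `pattern` matches,
// preserving the input order.
std::vector<std::string> stagedBy(const std::string& pattern,
                                  const std::vector<std::string>& files) {
    assert(!pattern.empty());
    std::vector<std::string> staged;
    for (const auto& file : files) {
        if (fnmatch(pattern.c_str(), file.c_str(), 0) == 0) {
            staged.push_back(file);
        }
    }
    return staged;
}
```

With these assumed names, `stagedBy("libqvac-speech-ggml-cpu-*.so", kCandidateLibs)` keeps only the two CPU variants, while `stagedBy("libqvac-speech-ggml-*.so", kCandidateLibs)` also picks up the Vulkan and OpenCL modules.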
Local Adreno 740 (iQOO 11) matrix, refined
Ran each model type directly against this branch's prebuild on a physically attached Adreno 740. On Adreno the engine auto-selects OpenCL (policy: Adreno>700 → OpenCL). Results:
Takeaway: the GPU blocker is narrow; TDT's
Implications for the Device Farm run:
Fix directions (follow-up, not in this PR): implement
Separately, a pre-existing latent bug surfaced during bring-up: the addon's
Mobile integration tests - @qvac/transcription-parakeet (Android)
Result: failed
Mobile integration tests - @qvac/transcription-parakeet (iOS)
Result: passed
Overlay-only PR (ticket QVAC-20556) to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9 / Mali + S25 Ultra / Adreno 830). This is the inverse of the CPU-only workaround in #2525 — please don't merge over it.
Add the `verified` label to fire the device-farm leg.
What this changes
`packages/transcription-parakeet/`:
- `ParakeetModel::load`: remove the `#ifdef __ANDROID__` guard that forced `useGPU=false` (kept the `n_gpu_layers` logic + the GPU-init→CPU fallback warning).
- `CMakeLists.txt`: widen the Android backend-staging glob from `libqvac-speech-ggml-cpu-*.so` to `libqvac-speech-ggml-*.so` so the `vulkan`/`opencl` MODULE libs ship in the prebuild (reverses the `[0.7.2]` CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- `gpu-smoke.test.js`: drop the four Android early-pass skips so the strict `assertGpuBackend` (`backendDevice=1`, `backendId` Vulkan/OpenCL) runs on device.
- vcpkg overlay ports (in-package): `ggml-speech@44fd4817` (speech HEAD) + `parakeet-cpp@ed749556` (`whisper.cpp` master), wired via `overlay-ports` in `vcpkg-configuration.json`. Registry baseline and registry `version>=` pins are unchanged; the registry PR is deferred until the device-farm result is understood.
- `vcpkg.json`: bump `parakeet-cpp` `version>=` to the overlay version-date.
Local device finding (Adreno 740 / iQOO 11, TDT q4_0)
Run directly against this branch's prebuild on a physically attached Adreno 740:
- CPU (`useGPU=false`): correct transcript, `backendDevice=0`.
- GPU OpenCL (auto-selected by the engine on Adreno>700): aborts in graph-compute with `ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX)` → `GGML_ASSERT` (SIGABRT).
- GPU Vulkan (forced by withholding the OpenCL module): runs (`backendId=3`), but output is degraded vs CPU (dropped words) and ~2x slower.
So on the Adreno the engine picks OpenCL, whose backend lacks `ARGMAX` and aborts in graph-compute instead of falling back to CPU. The Vulkan path (the one `ggml-speech@8bf760f4` reported byte-identical on this exact device) is not what the engine selects, and even when forced it no longer reproduces the byte-identical result on the current `44fd4817`/`ed749556` stack.
Expectation for the device-farm run: the Adreno (S25) leg likely hits the same OpenCL ARGMAX abort (which can SIGABRT the Bare worklet and take down subsequent tests, cf. #2525); the Mali (Pixel 9) leg exercises the Vulkan path.
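The OpenCL abort comes down to a missing pre-flight check: the graph reaches a backend that cannot run one of its ops. The sketch below shows what such a check with a CPU fallback could look like. It is not the engine's code and not a proposed fix; `Backend`, `supportsGraph` and `chooseBackend` are hypothetical stand-ins, and the op lists are illustrative apart from the missing `ARGMAX` reported above. ggml does expose a per-op query (`ggml_backend_supports_op`) that a real implementation could build on, but whether the engine consults it on this path is not something this PR establishes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a compute backend; not a ggml type.
struct Backend {
    std::string name;
    std::set<std::string> supportedOps;
};

// True when every op in the graph is supported by the backend.
bool supportsGraph(const Backend& backend,
                   const std::vector<std::string>& graphOps) {
    for (const auto& op : graphOps) {
        if (backend.supportedOps.count(op) == 0) {
            return false;
        }
    }
    return true;
}

// Returns the preferred GPU backend only if it can run the whole graph;
// otherwise returns the CPU backend instead of aborting later in compute.
const Backend& chooseBackend(const Backend& preferredGpu,
                             const Backend& cpu,
                             const std::vector<std::string>& graphOps) {
    // Assumption: the CPU backend is a complete fallback for this graph.
    assert(supportsGraph(cpu, graphOps));
    return supportsGraph(preferredGpu, graphOps) ? preferredGpu : cpu;
}
```

Under these assumptions, a graph containing `ARGMAX` would be routed to the CPU when the preferred backend's op set lacks it, while graphs without it would still run on the GPU.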
Note (pre-existing, out of scope)
While bringing this up on a local device, found that the addon's `BACKENDS_SUBDIR` compile-definition is `PRIVATE` on the bare-module target but `ParakeetModel.cpp` compiles into `parakeet_model_core`, so the subdir isn't appended to a host-provided default `backendsDir`. The device-farm/APK passes an explicit flat `nativeLibraryDir`, so CI is unaffected, but a host relying on the `__dirname/prebuilds` default would not find the backend `.so`. Filed mentally as a follow-up; not touched here.
Refs
- `ggml-speech` `44fd4817` (qvac-ext-ggml@speech HEAD)
- `parakeet-cpp` `ed749556` (qvac-ext-lib-whisper.cpp@master HEAD)
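For the `BACKENDS_SUBDIR` note above, the following sketch shows why a `PRIVATE` compile definition on one target has no effect on a source file compiled into another. `resolveBackendsDir` and the commented-out macro value are hypothetical; the actual code in `ParakeetModel.cpp` is not shown in this PR. The macro is deliberately left undefined to model the translation unit inside `parakeet_model_core`.

```cpp
#include <cassert>
#include <string>

// In a real build this would arrive as a compile definition. It is NOT
// defined here, modelling a source file built into a library target that
// never receives the PRIVATE definition set on the module target.
// #define BACKENDS_SUBDIR "some-subdir"  // hypothetical value

// Returns the directory to search for backend libraries. The subdir is
// appended only if the macro is visible when this file is compiled.
std::string resolveBackendsDir(const std::string& hostDefault) {
    assert(!hostDefault.empty());
#ifdef BACKENDS_SUBDIR
    return hostDefault + "/" + BACKENDS_SUBDIR;
#else
    return hostDefault;  // silently falls through to the unsuffixed path
#endif
}
```

Compiled as written, `resolveBackendsDir("/app/prebuilds")` returns the path unchanged, which matches the symptom described: the lookup happens in the default directory rather than in the backend subdirectory.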