QVAC-20556 feat[api]: enable Android GPU for Parakeet (overlay; CI validation) [DO-NOT-MERGE] by pratiknarola-t · Pull Request #2577 · tetherto/qvac

pratiknarola-t · 2026-06-12T18:44:22Z

⚠️ DO-NOT-MERGE — measurement vehicle

Overlay-only PR (ticket QVAC-20556) to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9 / Mali + S25 Ultra / Adreno 830). This is the inverse of the CPU-only workaround in #2525 — please don't merge over it.

Add the verified label to fire the device-farm leg.

What this changes

packages/transcription-parakeet/:

- ParakeetModel::load: remove the #ifdef __ANDROID__ guard that forced useGPU=false (kept the n_gpu_layers logic + the GPU-init→CPU fallback warning).
- CMakeLists.txt: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the vulkan/opencl MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- In-package vcpkg overlay ports: ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via overlay-ports in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred until the device-farm result is understood.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.
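To make the ParakeetModel::load change concrete, here is a minimal sketch of a GPU-init→CPU fallback with the Android guard removed. It is not the addon's actual code: `LoadOptions`, `LoadResult`, `loadWithFallback`, the `initGpu` callback and the `kAllLayers` value are all hypothetical names chosen for illustration, and the real n_gpu_layers logic may differ.

```cpp
#include <cassert>
#include <functional>
#include <iostream>

// Hypothetical stand-ins; the real ParakeetModel::load is not reproduced here.
struct LoadOptions {
    bool useGPU;
};

struct LoadResult {
    bool onGpu;      // true when a GPU backend was initialised
    int  nGpuLayers; // layers offloaded to the GPU; 0 means CPU-only
};

// `initGpu` models backend initialisation and returns false on failure.
LoadResult loadWithFallback(LoadOptions opts, const std::function<bool()>& initGpu) {
    assert(static_cast<bool>(initGpu) && "initGpu must be callable");

    // Per the PR description, Android builds used to force CPU at this point:
    //   #ifdef __ANDROID__
    //   opts.useGPU = false;
    //   #endif
    // The sketch omits that guard, so Android takes the same path as other platforms.

    constexpr int kAllLayers = 99; // assumed "offload everything" value
    if (opts.useGPU) {
        if (initGpu()) {
            return {true, kAllLayers};
        }
        // GPU requested but unavailable: warn and continue on CPU.
        std::cerr << "warning: GPU init failed, falling back to CPU\n";
    }
    return {false, 0};
}
```

Note that this kind of fallback only covers failures at initialisation time; an abort later, during graph compute, would not be caught by it.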

Local device finding (Adreno 740 / iQOO 11, TDT q4_0)

Run directly against this branch's prebuild on a physically-attached Adreno 740:

| Path | Backend | Result |
| --- | --- | --- |
| CPU (`useGPU=false`) | CPU (id 0) | ✅ correct transcript |
| GPU, engine default | OpenCL (id 4, auto-selected on Adreno>700) | ❌ SIGABRT: `ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX)` → `GGML_ASSERT` |
| GPU, OpenCL withheld | Vulkan (id 3) | ⚠️ runs, but transcript degraded vs CPU (dropped words) and ~2× slower |

So on the Adreno the engine picks OpenCL, whose backend lacks ARGMAX and aborts in graph-compute instead of falling back to CPU. The Vulkan path (the one ggml-speech@8bf760f4 reported byte-identical on this exact device) is not what the engine selects, and even when forced it no longer reproduces the byte-identical result on the current 44fd4817/ed749556 stack.
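The selection behaviour observed above can be summarised in a small sketch. The backend ids (0, 3, 4) and the "Adreno>700 → OpenCL" rule come from the findings in this description; the `GpuInfo` struct, the `selectBackend` function and the way module availability is passed in are assumptions made for illustration, not the engine's real interface.

```cpp
#include <cassert>

// Backend ids as reported in the table above.
enum class BackendId : int { CPU = 0, Vulkan = 3, OpenCL = 4 };

// Hypothetical description of the device GPU.
struct GpuInfo {
    bool isAdreno;
    int  adrenoModel; // e.g. 740 or 830; ignored when isAdreno is false
};

// Sketch of the observed policy: Adreno>700 prefers OpenCL when its module
// is present; otherwise Vulkan is used if available; otherwise CPU.
BackendId selectBackend(bool useGPU, const GpuInfo& gpu,
                        bool openclAvailable, bool vulkanAvailable) {
    if (!useGPU) {
        return BackendId::CPU;
    }
    if (gpu.isAdreno && gpu.adrenoModel > 700 && openclAvailable) {
        return BackendId::OpenCL;
    }
    if (vulkanAvailable) {
        return BackendId::Vulkan;
    }
    return BackendId::CPU;
}
```

Under this model, withholding the OpenCL module (`openclAvailable == false`) is what moves an Adreno 740 from OpenCL to Vulkan, which matches the third row of the table.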

Expectation for the device-farm run: the Adreno (S25) leg likely hits the same OpenCL ARGMAX abort (which can SIGABRT the Bare worklet and take down subsequent tests, cf. #2525); the Mali (Pixel 9) leg exercises the Vulkan path.

Note (pre-existing, out of scope)

While bringing this up on a local device, I found that the addon's BACKENDS_SUBDIR compile definition is PRIVATE on the bare-module target, but ParakeetModel.cpp compiles into parakeet_model_core, so the subdir isn't appended to a host-provided default backendsDir. The device-farm APK passes an explicit flat nativeLibraryDir, so CI is unaffected, but a host relying on the __dirname/prebuilds default would not find the backend .so. Noted as a follow-up; not touched here.
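The mechanism can be illustrated with a short sketch. It assumes the subdir is appended behind an `#ifdef BACKENDS_SUBDIR` check; that is a guess at the shape of the code, not a quote from ParakeetModel.cpp, and `resolveBackendsDir` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// A PRIVATE compile definition is only visible to sources of the target that
// declares it. If this translation unit is built as part of a different target
// (as the note describes for parakeet_model_core), BACKENDS_SUBDIR is not
// defined here and the #else branch is compiled instead.
std::string resolveBackendsDir(const std::string& hostDefaultDir) {
#ifdef BACKENDS_SUBDIR
    return hostDefaultDir + "/" + BACKENDS_SUBDIR;
#else
    return hostDefaultDir; // subdir silently not appended
#endif
}
```

Compiled without the definition, the function returns the host default unchanged, so a loader would look for the backend .so one directory too high. An explicit flat directory, as passed by the device-farm APK, is unaffected either way.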

Refs

- ggml-speech 44fd4817 (qvac-ext-ggml@speech HEAD)
- parakeet-cpp ed749556 (qvac-ext-lib-whisper.cpp@master HEAD)
- Related: #2525 "fix[notask]: ship Parakeet CPU-only on Android to stop Adreno Vulkan SIGABRT" (parakeet Android CPU-only); #2506 "QVAC-19255 feat[api]: reintroduce Supertonic GPU support (desktop/iOS; Android CPU-only)" (Supertonic desktop/iOS GPU, Android CPU-only)

…lidation)

DO-NOT-MERGE: overlay-only PR to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9/Mali + S25/Adreno 830). This is the inverse of the CPU-only workaround in #2525.

Changes (packages/transcription-parakeet):

- ParakeetModel::load: remove the __ANDROID__ guard that forced useGPU=false.
- CMakeLists: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the Vulkan/OpenCL MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- vcpkg overlay ports (in-package): ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via the overlay-ports entry in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.

Local device finding (Adreno 740 / iQOO 11), TDT q4_0, recorded for reviewers:

- CPU: correct transcript, backendDevice=0.
- GPU OpenCL (engine auto-selects this on Adreno>700): aborts in graph-compute: "op not supported joint.token_argmax (ARGMAX)" -> GGML_ASSERT (SIGABRT).
- GPU Vulkan (forced by withholding the OpenCL module): runs (backendId=3) but output is degraded vs CPU (dropped words) and ~2x slower; NOT the byte-identical result ggml-speech 8bf760f4 reported.

Expect the Device Farm Adreno (S25) leg to hit the OpenCL ARGMAX abort and the Mali leg to exercise the Vulkan path. Do not merge; this is a measurement vehicle.

pratiknarola-t · 2026-06-12T18:47:59Z

Local Adreno 740 (iQOO 11) matrix — refined

Ran each model type directly against this branch's prebuild on a physically-attached Adreno 740. On Adreno the engine auto-selects OpenCL (policy: Adreno>700 → OpenCL). Results:

| Model | CPU | OpenCL (GPU, auto) | Vulkan (GPU, OpenCL withheld) |
| --- | --- | --- | --- |
| TDT (q4_0) | ✅ correct | ❌ SIGABRT: `ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX)` → `GGML_ASSERT` | ⚠️ runs (`backendId=3`) but transcript degraded vs CPU + ~2× slower |
| EOU (q4_0) | ✅ | ✅ correct (95 tokens) | - |
| Sortformer (q8_0) | - | ✅ correct (speaker labels) | - |
| CTC | n/a on mobile | n/a | - |

Takeaway: the GPU blocker is narrow — TDT's joint.token_argmax (ARGMAX) is not implemented in the ggml OpenCL backend, and supports_op/graph-compute aborts instead of falling back to CPU. EOU and Sortformer run fine on OpenCL. The Vulkan path supports the op (no crash) but is degraded/slower on this device, and is not what the engine selects on Adreno anyway.

Implications for the Device Farm run:

- Adreno (S25/830) leg: EOU + Sortformer GPU should pass; the TDT GPU smoke will likely SIGABRT (and a Bare-worklet abort can cascade to later tests).
- Mali (Pixel 9) leg: exercises the Vulkan path (no OpenCL on non-Adreno); a separate unknown.

Fix directions (follow-up, not in this PR): implement ARGMAX in ggml-opencl, OR make ggml-opencl supports_op return false for ARGMAX so it routes to CPU, OR have parakeet-cpp keep the TDT joint argmax on CPU.
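The second option (report ARGMAX as unsupported so it routes to CPU) can be sketched as a toy model. This is deliberately not ggml's API: `Op`, `Device`, `Node`, `openclSupportsOp` and `assignDevices` are hypothetical names, and the sketch assumes a scheduler that consults a supports-op query before placing each node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for graph ops and devices; not ggml types.
enum class Op { MatMul, Softmax, Argmax };
enum class Device { OpenCL, CPU };

struct Node {
    std::string name;
    Op op;
};

// The GPU backend declares ARGMAX unsupported instead of accepting the node
// and aborting later during graph compute.
bool openclSupportsOp(Op op) {
    return op != Op::Argmax;
}

// A scheduler that asks the backend first places unsupported nodes on CPU.
std::vector<Device> assignDevices(const std::vector<Node>& graph) {
    std::vector<Device> placement;
    placement.reserve(graph.size());
    for (const Node& node : graph) {
        placement.push_back(openclSupportsOp(node.op) ? Device::OpenCL
                                                      : Device::CPU);
    }
    return placement;
}
```

In this model a node such as joint.token_argmax lands on CPU while the rest of the graph stays on the GPU. Whether that split is cheap enough on device (extra host/device transfers around the argmax) would still need measuring.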

Separately, a pre-existing latent bug surfaced during bring-up: the addon's BACKENDS_SUBDIR compile-def is PRIVATE on the bare-module target while ParakeetModel.cpp compiles into parakeet_model_core, so the subdir isn't appended to a host-provided default backendsDir (__dirname/prebuilds). The device-farm APK passes an explicit flat nativeLibraryDir, so CI is unaffected — but a host relying on the default would not find the backend .so.

github-actions · 2026-06-12T18:48:26Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

github-actions · 2026-06-12T19:29:44Z

Mobile integration tests — @qvac/transcription-parakeet (Android)

Result: failed

| metric | value |
| --- | --- |
| Devices passed | 0 |
| Devices failed | 2 |
| Test cases total | 6 |
| Test cases passed | 4 |
| Test cases failed | 2 |
| Test cases skipped | 0 |


github-actions · 2026-06-12T19:36:39Z

Mobile integration tests — @qvac/transcription-parakeet (iOS)

Result: passed

| metric | value |
| --- | --- |
| Devices passed | 2 |
| Devices failed | 0 |
| Test cases total | 6 |
| Test cases passed | 6 |
| Test cases failed | 0 |
| Test cases skipped | 0 |


pratiknarola-t requested review from a team as code owners June 12, 2026 18:44

pratiknarola-t added the verified Authorize secrets / label-gate in PR workflows label Jun 12, 2026

pratiknarola-t temporarily deployed to release June 12, 2026 18:45 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 18:45 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 12, 2026 18:45 — with GitHub Actions Inactive

pratiknarola-t temporarily deployed to release June 12, 2026 18:58 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 18:58 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 12, 2026 18:58 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 19:41 — with GitHub Actions Failure

pratiknarola-t wants to merge 1 commit into main from qvac-20556-parakeet-android-gpu
