Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions packages/transcription-parakeet/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- **Android: re-enable GPU (Vulkan/OpenCL) for Parakeet — overlay-only, for device-farm validation (QVAC-20556).** Removes the `ParakeetModel::load` `__ANDROID__` guard that forced `useGPU=false`, widens the addon's Android backend-staging glob to bundle the `libqvac-speech-ggml-{vulkan,opencl}.so` MODULE libs (reversing the `[0.7.2]` CPU-only packaging), and lifts the Android early-pass skips in the `gpu-smoke` integration test so its strict backend assertion (`backendDevice=1`, `backendId` Vulkan/OpenCL) runs on real devices. Builds against `ggml-speech@44fd4817` + `parakeet-cpp@ed749556` (whisper.cpp `master`) via in-package overlay ports; the registry baseline and registry `version>=` pins are unchanged. This is the inverse of the CPU-only workaround in #2525 — its purpose is to get an empirical AWS Device Farm signal (Pixel 9/Mali + S25/Adreno 830) on whether the latest speech stack drives those GPUs. **Not for merge as-is.**
- Bumped the `parakeet-cpp` `version>=` constraint to `2026-06-10` (whisper.cpp `1c75d6e9`), which refreshes the bundled `ggml-speech` to the current speech-branch tip `bec032cd`. The registry baseline is left unchanged. The `parakeet-cpp` C++ tree is unchanged since the previous `128dae42` pin, so this only moves `ggml-speech`; prebuilds and the desktop RTF benchmark now build against the latest speech stack (QVAC-20614).
- Performance reports now surface the desktop GPU hardware name. `test/integration/helpers.js` injects `bare-subprocess` into the shared performance reporter's `configure()` so `_detectGpu()` can shell out to nvidia-smi / vulkaninfo / system_profiler and populate `device.gpu` (e.g. "NVIDIA RTX 4000 SFF Ada") on GPU desktop runners (QVAC-20499). Mobile (Device Farm) reports continue to leave `device.gpu` null — the device name is the proxy there.
- RTF benchmark now reports GGML backends. `test/benchmark/rtf-benchmark.test.js` resolves the requested GPU backend family to the parakeet.cpp cascade (Metal on darwin/ios, Vulkan on linux/win32, Vulkan/OpenCL on android) instead of the stale ONNX names (coreml/nnapi/auto-gpu), and now captures the *actual* backend the engine ran on via `stats.backendId` / `stats.backendDevice` (`labels.activeBackend`, `summary.backendId`). `scripts/perf-report/aggregate-parakeet-rtf.js` GPU-backend coverage map updated to the GGML set (vulkan/metal/opencl/cuda).
Expand Down
31 changes: 13 additions & 18 deletions packages/transcription-parakeet/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -67,9 +67,9 @@ set(BACKENDS_SUBDIR_VALUE "${bare_target_value}/${module_name}")
message(STATUS "Building qvac_lib_infer_parakeet with BACKENDS_SUBDIR='${BACKENDS_SUBDIR_VALUE}'")

# Collect every dynamically-loadable ggml backend surfaced by the
# ggml-speech port. On Android this is intentionally CPU-only while
# Parakeet forces useGPU=false; nothing is staged on Apple/static-only
# platforms. cmake-bare's
# ggml-speech port as an IMPORTED target. On Android the IMPORTED set is
# CPU-only (Vulkan/OpenCL ship as loose MODULE .so -- staged below);
# nothing is staged on Apple/static-only platforms. cmake-bare's
# `add_bare_module(... EXPORTS INSTALL TARGET <tgt>)` bundles each
# IMPORTED ggml::<backend> target into the per-arch prebuilds folder
# next to the `.bare` module. Mirrors qvac/packages/llm-llamacpp/
Expand All @@ -83,8 +83,9 @@ message(STATUS "Building qvac_lib_infer_parakeet with BACKENDS_SUBDIR='${BACKEND
# but ggml-config.cmake.in deliberately leaves them out of
# GGML_AVAILABLE_BACKENDS' IMPORTED set (the loader is expected to
# find them via `ggml_backend_load_all_from_path()` at runtime).
# We do not pick those GPU modules up on Android until their discovery
# path is stable on Mali/Adreno devices.
# We stage those GPU modules on Android too (loose-glob below) so the
# engine can select Vulkan/OpenCL per ggml's Adreno-tier policy when
# useGPU=true; ggml falls back to CPU on devices it can't drive.
set(BACKEND_DL_LIBS "")
if((ANDROID OR UNIX) AND NOT APPLE)
foreach(_backend ${GGML_AVAILABLE_BACKENDS})
Expand All @@ -95,12 +96,11 @@ if((ANDROID OR UNIX) AND NOT APPLE)
endif()

# Loose-file pickup for MODULE backends that aren't surfaced as
# IMPORTED targets by find_package(ggml). On Android we intentionally
# stage CPU backends only while Parakeet forces useGPU=false; loading
# Vulkan/OpenCL during backend discovery is known to abort on some
# Mali/Adreno devices before inference can fall back to CPU. Linux keeps
# the wider glob as a fallback for any future hybrid build there. Glob
# runs at configure time -- the .so files come from ggml-speech's
# IMPORTED targets by find_package(ggml). On Android (as on Linux) we
# stage every ggml-speech backend .so -- CPU per-arch variants plus the
# Vulkan/OpenCL MODULE libs -- so `ggml_backend_load_all_from_path()`
# can discover and select a GPU backend at runtime when useGPU=true.
# Glob runs at configure time -- the .so files come from ggml-speech's
# `file(INSTALL ...)` step in its portfile, which has already completed
# by the time vcpkg hands control back to us. Apple / Windows
# static-only configs return an empty list and skip the install rule.
Expand All @@ -112,13 +112,8 @@ if((ANDROID OR UNIX) AND NOT APPLE)
"ggml backend .so pickup -- the runtime may fail to find "
"Android CPU backend modules.")
else()
if(ANDROID)
file(GLOB BACKEND_DL_LOOSE_SOS
"${VCPKG_INSTALLED_PATH}/lib/libqvac-speech-ggml-cpu-*.so")
else()
file(GLOB BACKEND_DL_LOOSE_SOS
"${VCPKG_INSTALLED_PATH}/lib/libqvac-speech-ggml-*.so")
endif()
file(GLOB BACKEND_DL_LOOSE_SOS
"${VCPKG_INSTALLED_PATH}/lib/libqvac-speech-ggml-*.so")
foreach(_lib IN LISTS BACKEND_DL_LOOSE_SOS)
message(STATUS "qvac_lib_infer_parakeet: will stage ggml backend ${_lib}")
endforeach()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -221,18 +221,6 @@ void ParakeetModel::cleanupTempFile_() {
void ParakeetModel::load() {
if (is_loaded_) return;

// Force useGPU to false in Android until Vulkan and OpenCL are stabilized
#ifdef __ANDROID__
if (cfg_.useGPU) {
QLOG(
logger::Priority::WARNING,
"Parakeet: useGPU=true is currently ignored on Android "
"(GPU backends disabled at engine boundary pending Vulkan/Mali "
"and OpenCL/Adreno driver fixes); falling back to CPU.");
cfg_.useGPU = false;
}
#endif

QLOG(logger::Priority::INFO,
"Loading Parakeet GGUF (modelType hint: " +
std::to_string(static_cast<int>(cfg_.modelType)) + ")");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,6 @@ async function runGpuModelTest (t, modelType, modelPath, audio, expectations) {
}

test('CTC GPU smoke — useGPU=true must engage the GPU backend on GPU-capable platforms', { timeout: 600000, skip: NO_GPU }, async (t) => {
if (platform === 'android') { t.pass('Android: GPU disabled at engine boundary pending Vulkan/Mali + OpenCL/Adreno upstream fixes'); return }
const loggerBinding = setupJsLogger(binding)
try {
const modelPath = await loadGgufOrSkip(t, 'ctc')
Expand All @@ -207,7 +206,6 @@ test('CTC GPU smoke — useGPU=true must engage the GPU backend on GPU-capable p
})

test('TDT GPU smoke — useGPU=true must engage the GPU backend on GPU-capable platforms', { timeout: 600000, skip: NO_GPU }, async (t) => {
if (platform === 'android') { t.pass('Android: GPU disabled at engine boundary pending Vulkan/Mali + OpenCL/Adreno upstream fixes'); return }
const loggerBinding = setupJsLogger(binding)
try {
const modelPath = await loadGgufOrSkip(t, 'tdt')
Expand All @@ -225,7 +223,6 @@ test('TDT GPU smoke — useGPU=true must engage the GPU backend on GPU-capable p
// the zero-token regression triggered by ggml-metal's Q-variant
// mul_mv + bias/residual fusion on the EOU q8_0 joint network.
test('EOU GPU smoke — useGPU=true must engage the GPU backend on GPU-capable platforms', { timeout: 600000, skip: NO_GPU }, async (t) => {
if (platform === 'android') { t.pass('Android: GPU disabled at engine boundary pending Vulkan/Mali + OpenCL/Adreno upstream fixes'); return }
const loggerBinding = setupJsLogger(binding)
try {
const modelPath = await loadGgufOrSkip(t, 'eou')
Expand All @@ -239,7 +236,6 @@ test('EOU GPU smoke — useGPU=true must engage the GPU backend on GPU-capable p
})

test('Sortformer GPU smoke — useGPU=true must engage the GPU backend on GPU-capable platforms', { timeout: 600000, skip: NO_GPU }, async (t) => {
if (platform === 'android') { t.pass('Android: GPU disabled at engine boundary pending Vulkan/Mali + OpenCL/Adreno upstream fixes'); return }
const loggerBinding = setupJsLogger(binding)
try {
const modelPath = await loadGgufOrSkip(t, 'sortformer')
Expand Down
3 changes: 3 additions & 0 deletions packages/transcription-parakeet/vcpkg-configuration.json
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
"baseline": "74d2dfd03d1c2c0767bac6d892ec43a2a0e29c10",
"repository": "https://github.com/tetherto/qvac-registry-vcpkg.git"
},
"overlay-ports": [
"./vcpkg-overlay-ports"
],
"registries": [
{
"kind": "git",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Detect the Vulkan version shipped with the Android NDK by parsing
# vulkan_core.h from the NDK sysroot. Sets `vulkan_version` in the
# caller's scope (e.g. "1.3.275").
function(detect_ndk_vulkan_version)
string(TOLOWER "${CMAKE_HOST_SYSTEM_NAME}" host_system_name_lower)

file(GLOB host_dirs LIST_DIRECTORIES true "$ENV{ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/${host_system_name_lower}-*")
if(host_dirs)
list(GET host_dirs 0 host_dir)
get_filename_component(host_arch "${host_dir}" NAME)
set(vulkan_core_h "$ENV{ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/${host_arch}/sysroot/usr/include/vulkan/vulkan_core.h")
else()
message(FATAL_ERROR "Could not find NDK host directory for ${host_system_name_lower}")
endif()

if(NOT EXISTS "${vulkan_core_h}")
message(FATAL_ERROR "vulkan_core.h not found at ${vulkan_core_h}")
endif()

file(READ "${vulkan_core_h}" header_content)
string(REGEX MATCH "VK_HEADER_VERSION ([0-9]+)" version_match "${header_content}")
if(version_match)
set(header_version_3 "${CMAKE_MATCH_1}")
else()
message(FATAL_ERROR "Could not extract VK_HEADER_VERSION from ${vulkan_core_h}")
endif()

# Extract major.minor version from VK_HEADER_VERSION_COMPLETE for download URL
string(REGEX MATCH "VK_HEADER_VERSION_COMPLETE VK_MAKE_API_VERSION\\(([0-9]+), ([0-9]+), ([0-9]+)" version_match "${header_content}")
if(version_match)
set(major "${CMAKE_MATCH_2}")
set(minor "${CMAKE_MATCH_3}")
set(vulkan_version "${major}.${minor}.${header_version_3}" PARENT_SCOPE)
else()
message(FATAL_ERROR "Could not extract VK_HEADER_VERSION_COMPLETE from ${vulkan_core_h}")
endif()
endfunction()
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
diff --git a/src/ggml-vulkan/CMakeLists.txt b/src/ggml-vulkan/CMakeLists.txt
index 715a263a..3d92ac5d 100644
--- a/src/ggml-vulkan/CMakeLists.txt
+++ b/src/ggml-vulkan/CMakeLists.txt
@@ -7,6 +7,7 @@ if (POLICY CMP0147)
endif()

find_package(Vulkan COMPONENTS glslc REQUIRED)
+find_package(SPIRV-Headers QUIET)

if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
# Parallel build object files
@@ -87,6 +88,11 @@ if (Vulkan_FOUND)
)

target_link_libraries(ggml-vulkan PRIVATE Vulkan::Vulkan)
+
+ if (TARGET SPIRV-Headers::SPIRV-Headers)
+ target_link_libraries(ggml-vulkan PRIVATE SPIRV-Headers::SPIRV-Headers)
+ endif()
+
target_include_directories(ggml-vulkan PRIVATE ${CMAKE_CURRENT_BINARY_DIR})

# Workaround to the "can't dereference invalidated vector iterator" bug in clang-cl debug build
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# ggml-speech: tetherto/qvac-ext-ggml@speech HEAD 44fd4817 (PR #22,
# "bci-whispercpp OpenCL correctness on Android GPUs (Adreno + Samsung
# Xclipse)"). On top of the previous pin (1189e4ce, PR #19) it adds:
#
# 8bf760f4 ggml-vulkan: Parakeet on Adreno 740 Vulkan + Adreno guards
# e6578d01 ggml-vulkan: grow the descriptor-set pool on demand
# 44fd4817 ggml-opencl: bci-whispercpp OpenCL correctness on Android
# GPUs (Adreno + Samsung Xclipse)
#
# These land the Adreno-740 Vulkan + Android OpenCL-correctness fixes the
# tts-ggml (QVAC-20557) and transcription-parakeet (QVAC-20556) addons
# need to run their graphs on the GPU on Android. The Android backend
# packaging (GGML_BACKEND_DL=ON per-arch CPU variants + MODULE GPU .so)
# is unchanged from the previous pin.

vcpkg_from_github(
OUT_SOURCE_PATH SOURCE_PATH
REPO tetherto/qvac-ext-ggml
REF 44fd4817dd1dc5872053927200e2824b8a0ced86
SHA512 7d83537e5346fc1a1470e6b7ef191c55b02459ac139a841f39d319e7a5e11aea8e3ed0178cbbb0f0b9f2016e0b79c081411d31707368ccb637940fde3496ec14
HEAD_REF speech
)

set(GGML_METAL OFF)
set(GGML_VULKAN OFF)
set(GGML_CUDA OFF)
set(GGML_OPENCL OFF)
set(GGML_METAL_FUSE_MV_BIAS OFF)

if("metal" IN_LIST FEATURES)
set(GGML_METAL ON)
endif()

# Off by default: the chatterbox Q-variant mul_mv + bias/residual fusion
# produces zero tokens on parakeet's EOU q8_0 joint network. Consumers
# whose models stay clear of that pattern can opt in for the speedup.
if("metal-fuse-mv-bias" IN_LIST FEATURES)
set(GGML_METAL_FUSE_MV_BIAS ON)
endif()

if("vulkan" IN_LIST FEATURES)
set(GGML_VULKAN ON)
endif()

set(GGML_CUDA_COMPILER_OPTION "")

if("cuda" IN_LIST FEATURES)
set(GGML_CUDA ON)
find_program(NVCC_EXECUTABLE nvcc
PATHS /usr/local/cuda/bin /usr/local/cuda-12.8/bin
NO_DEFAULT_PATH
)
if(NOT NVCC_EXECUTABLE)
find_program(NVCC_EXECUTABLE nvcc REQUIRED)
endif()
set(GGML_CUDA_COMPILER_OPTION "-DCMAKE_CUDA_COMPILER=${NVCC_EXECUTABLE}")
message(STATUS "CUDA compiler: ${NVCC_EXECUTABLE}")
endif()

if("opencl" IN_LIST FEATURES)
set(GGML_OPENCL ON)
endif()

if(VCPKG_TARGET_IS_ANDROID AND "vulkan" IN_LIST FEATURES)
include(${CMAKE_CURRENT_LIST_DIR}/android-vulkan-version.cmake)
detect_ndk_vulkan_version()
message(STATUS "NDK Vulkan version: ${vulkan_version}")

file(DOWNLOAD
"https://github.com/KhronosGroup/Vulkan-Headers/archive/refs/tags/v${vulkan_version}.tar.gz"
"${SOURCE_PATH}/vulkan-hpp-${vulkan_version}.tar.gz"
TLS_VERIFY ON
)
file(ARCHIVE_EXTRACT
INPUT "${SOURCE_PATH}/vulkan-hpp-${vulkan_version}.tar.gz"
DESTINATION "${SOURCE_PATH}"
PATTERNS "*.hpp"
)
file(COPY "${SOURCE_PATH}/Vulkan-Headers-${vulkan_version}/include/"
DESTINATION "${SOURCE_PATH}/src/")
endif()

set(PLATFORM_OPTIONS)

if(VCPKG_TARGET_IS_IOS)
list(APPEND PLATFORM_OPTIONS -DGGML_BLAS=OFF -DGGML_ACCELERATE=OFF)
endif()

# Hybrid Android backend mode: GPU backends as MODULE .so loaded at runtime
# via dlopen, CPU built as per-arch MODULE .so variants (one per ARMv8.0/
# 8.2/8.6/9.0/9.2 feature tier) also loaded at runtime via dlopen. The
# downstream addon installs the resulting libqvac-speech-ggml-cpu-android_armv*
# .so files alongside the .bare binary; the per-variant scoring in
# ggml-cpu's `ggml_backend_cpu_aarch64_score` then picks the highest tier
# the running device supports at first use. Pairs with the speech-branch
# `ggml-backend: android per-arch CPU variant dlopen fallback` patch
# (commit 9562ed04) so the variant lookup also succeeds when the consumer
# APK keeps native .so files compressed (AGP `useLegacyPackaging=false`).
if(VCPKG_TARGET_IS_ANDROID)
list(APPEND PLATFORM_OPTIONS
-DGGML_BACKEND_DL=ON
-DGGML_CPU_ALL_VARIANTS=ON
-DGGML_CPU_REPACK=ON
-DGGML_VULKAN_DISABLE_COOPMAT=ON
-DGGML_VULKAN_DISABLE_COOPMAT2=ON
)
endif()

# PR #13 (v0.10.2 sync) introduces an unconditional
# `#include <spirv/unified1/spirv.hpp>` in src/ggml-vulkan/ggml-vulkan.cpp,
# but the upstream ggml-vulkan CMakeLists.txt never finds spirv-headers nor
# wires its include dir into the ggml-vulkan target. Apply a small patch
# so it does (and depend on spirv-headers in vcpkg.json's vulkan feature).
# TODO: push the equivalent fix upstream and drop this patch.
if("vulkan" IN_LIST FEATURES)
vcpkg_apply_patches(
SOURCE_PATH "${SOURCE_PATH}"
PATCHES
"${CMAKE_CURRENT_LIST_DIR}/patches/0001-ggml-vulkan-find-spirv-headers.patch"
)
endif()

vcpkg_cmake_configure(
SOURCE_PATH "${SOURCE_PATH}"
OPTIONS
-DBUILD_SHARED_LIBS=OFF
-DGGML_NATIVE=OFF
-DGGML_CCACHE=OFF
-DGGML_OPENMP=OFF
-DGGML_LLAMAFILE=OFF
-DGGML_BUILD_TESTS=OFF
-DGGML_BUILD_EXAMPLES=OFF
-DGGML_METAL=${GGML_METAL}
-DGGML_VULKAN=${GGML_VULKAN}
-DGGML_CUDA=${GGML_CUDA}
-DGGML_OPENCL=${GGML_OPENCL}
-DGGML_METAL_FUSE_MV_BIAS=${GGML_METAL_FUSE_MV_BIAS}
-DGGML_LIB_OUTPUT_PREFIX=qvac-speech-
${GGML_CUDA_COMPILER_OPTION}
${PLATFORM_OPTIONS}
)

vcpkg_cmake_install()

# Pick up the MODULE backend .so files ggml builds into the buildtree's
# bin/ directory (Android dynamic-backend mode). cmake install() doesn't
# move them by default.
if(VCPKG_TARGET_IS_ANDROID)
file(GLOB _backend_sos
"${CURRENT_BUILDTREES_DIR}/${TARGET_TRIPLET}-rel/bin/libqvac-speech-ggml-*.so"
)
if(_backend_sos)
file(INSTALL ${_backend_sos} DESTINATION "${CURRENT_PACKAGES_DIR}/lib")
endif()
endif()

vcpkg_cmake_config_fixup(PACKAGE_NAME ggml CONFIG_PATH lib/cmake/ggml)

if(EXISTS "${CURRENT_PACKAGES_DIR}/share/pkgconfig/ggml.pc")
file(MAKE_DIRECTORY "${CURRENT_PACKAGES_DIR}/lib/pkgconfig")
file(RENAME "${CURRENT_PACKAGES_DIR}/share/pkgconfig/ggml.pc"
"${CURRENT_PACKAGES_DIR}/lib/pkgconfig/ggml.pc")
endif()
if(EXISTS "${CURRENT_PACKAGES_DIR}/debug/share/pkgconfig/ggml.pc")
file(MAKE_DIRECTORY "${CURRENT_PACKAGES_DIR}/debug/lib/pkgconfig")
file(RENAME "${CURRENT_PACKAGES_DIR}/debug/share/pkgconfig/ggml.pc"
"${CURRENT_PACKAGES_DIR}/debug/lib/pkgconfig/ggml.pc")
endif()
file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/share/pkgconfig"
"${CURRENT_PACKAGES_DIR}/debug/share/pkgconfig")
vcpkg_fixup_pkgconfig()

file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include")
file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/share")

set(VCPKG_POLICY_MISMATCHED_NUMBER_OF_BINARIES enabled)

file(INSTALL "${CMAKE_CURRENT_LIST_DIR}/usage" DESTINATION "${CURRENT_PACKAGES_DIR}/share/${PORT}")
vcpkg_install_copyright(FILE_LIST "${SOURCE_PATH}/LICENSE")
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
The package ggml provides CMake integration:

find_package(ggml CONFIG REQUIRED)
target_link_libraries(main PRIVATE ggml::ggml)

Available vcpkg features:
metal - Metal GPU backend (macOS/iOS, auto-enabled on Apple)
vulkan - Vulkan GPU backend
cuda - CUDA GPU backend
opencl - OpenCL GPU backend
Loading
Loading