Add parakeet-cpp: NVIDIA Parakeet ASR + Sortformer diarization in pure C++/ggml by GustavoA1604 · Pull Request #11 · tetherto/qvac-ext-lib-whisper.cpp

GustavoA1604 · 2026-05-01T18:53:15Z

Note: of the 125k added lines, most (91k) come from adding parakeet-cpp/examples/miniaudio.h, which mirrors whisper.cpp/examples/miniaudio.h. I didn't add a symbolic link, to keep the two projects separate.

Summary

Vendors parakeet-cpp under parakeet-cpp/ — a pure C++/ggml inference port of the NVIDIA Parakeet family (FastConformer ASR + Sortformer diarization). One parakeet::Engine loads CTC, TDT, EOU, or Sortformer GGUFs and dispatches by metadata; no Python / PyTorch / ONNX Runtime at runtime.

The subtree is fully self-contained:

Builds independently via its own parakeet-cpp/CMakeLists.txt (PARAKEET_BUILD_LIBRARY / PARAKEET_BUILD_EXECUTABLES / PARAKEET_BUILD_TESTS / PARAKEET_BUILD_EXAMPLES); the top-level whisper.cpp build is not touched, so existing whisper consumers keep their current shape.
Bundles ggml via add_subdirectory(ggml) after scripts/setup-ggml.sh clones it at the pinned commit (58c38058) and applies the three local patches under parakeet-cpp/patches/ (see below).
Ships its own CMake package config — install(EXPORT parakeet-cpp-targets NAMESPACE parakeet::) + parakeet-cppConfig.cmake.in — so downstream code resolves parakeet::parakeet and ggml::ggml from a find_package(parakeet-cpp CONFIG REQUIRED).

What ships

| Area | Highlights |
| --- | --- |
| Engines | CTC (`parakeet-ctc-{0.6b,1.1b}`), TDT (`parakeet-tdt-{0.6b-v3,1.1b}`, multilingual), EOU (`parakeet_realtime_eou_120m-v1`, `<EOU>` turn-detection token), Sortformer (`diar_sortformer_4spk-v1` / `diar_streaming_sortformer_4spk-v2`, up to 4 speakers) |
| Streaming | Mode 2 (full encode → segmented stream), Mode 3 (push duplex with chunked-limited attention + cache-aware history). Speaker-attributed transcription via `--diarization-model`. |
| Cross-engine events | Single `StreamEvent` umbrella (`EndOfTurn` from EOU's `<EOU>`, `VadStateChanged` from Sortformer probs and an opt-in CPU energy-VAD on CTC/TDT). |
| CLI / examples | `parakeet` CLI with `--bench` / `--profile` / JSONL emit / OpenCL knobs; `live-mic`, `live-mic-attributed` examples on miniaudio. |
| Test suite | Per-stage NeMo parity (mel / encoder / TDT / Sortformer), encoder-capture parity, decoder determinism, mel FFT parity, optional `test-vk-vs-cpu`. CTest labels: `unit`, `fixture`, `perf`, `gpu`. |
| Backends | CPU (BLAS optional) + CUDA / Metal / Vulkan / OpenCL via `ggml_backend_load_all()` + registry walk; Adreno-6xx OpenCL fallback to CPU is preserved. |
| Conversion | `scripts/convert-nemo-to-gguf.py` (`.nemo` → `.gguf` with f32 / f16 / q8_0 / q5_0 / q4_0); `scripts/dump-{ctc,tdt,eou,sortformer}-reference.py` for the parity harnesses. |
| Docs | `parakeet-cpp/README.md` (quickstart, GPU build matrix, benchmarks vs onnxruntime, CMake knobs), `parakeet-cpp/PROGRESS.md` (full development history). |

ggml patches

Three patches live under parakeet-cpp/patches/ and are applied in lexical (filename) order by scripts/setup-ggml.sh. Each is a strict no-op when its trigger is not active, so they don't disturb stock ggml consumers.

ggml-backend-reg-filename-prefix.patch — adds a compile-time GGML_BACKEND_DL_PROJECT_PREFIX macro to backend_filename_prefix() so a host project that renames its bundled libggml-* files (parakeet does this via PARAKEET_GGML_LIB_PREFIX=ON, default) does not break runtime backend discovery under GGML_BACKEND_DL=ON. Macro undefined ⇒ behaviour byte-equal to upstream.
ggml-opencl-allow-non-adreno.patch — opt-in (GGML_OPENCL_ALLOW_UNKNOWN_GPU=1) relax of the Adreno/Intel-only device whitelist so dev hosts on NVIDIA / AMD / Intel iGPU can build and parity-test the OpenCL backend without an Adreno device. Adreno production path unchanged.
ggml-opencl-program-binary-cache.patch — persistent on-disk cache for compiled OpenCL kernel binaries via clCreateProgramWithBinary, keyed on (src, opts, driver, dev) FNV-1a-64 hashes; honours $GGML_OPENCL_CACHE_DIR (with $XDG_CACHE_HOME/ggml/opencl → $HOME/.cache/ggml/opencl fallbacks). Removes the multi-second cold-start clBuildProgram wave on Adreno / Mesa / Mali.
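A minimal sketch of the cache-key derivation, assuming each of the four inputs is hashed separately with 64-bit FNV-1a and the four digests are joined into one filename stem. `cache_key` and its hex-joined layout are hypothetical; the patch's actual on-disk naming may differ.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
inline uint64_t fnv1a64(const std::string & data) {
    uint64_t h = 0xcbf29ce484222325ULL;   // FNV-1a 64-bit offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ULL;            // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical cache key: one hash per input, so a change to the kernel source,
// build options, driver version, or device name each selects a different entry.
inline std::string cache_key(const std::string & src, const std::string & opts,
                             const std::string & driver, const std::string & dev) {
    char buf[4 * 16 + 3 + 1];             // four 16-digit hex hashes, three dashes, NUL
    std::snprintf(buf, sizeof(buf), "%016llx-%016llx-%016llx-%016llx",
                  (unsigned long long) fnv1a64(src),    (unsigned long long) fnv1a64(opts),
                  (unsigned long long) fnv1a64(driver), (unsigned long long) fnv1a64(dev));
    return std::string(buf);
}
```

Hashing the fields separately (rather than one hash over their concatenation) avoids ambiguity at field boundaries, e.g. `("ab", "c")` versus `("a", "bc")`.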

parakeet-cpp/patches/README.md documents each patch and the drop conditions.
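A minimal sketch of the filename-prefix contract, assuming upstream composes backend library names as `lib` + `ggml-` + name on non-Windows platforms and `ggml-` + name on Windows. `SKETCH_PREFIX` and `backend_library_name` are illustrative names, not the patch's code. With `GGML_BACKEND_DL_PROJECT_PREFIX` undefined the result is the stock `libggml-cpu`; compiling with `-DGGML_BACKEND_DL_PROJECT_PREFIX='"speech-"'` would give `libspeech-ggml-cpu` instead.

```cpp
#include <string>

// Compile-time project prefix. When the host build does not define it, the
// prefix is empty and the composed names match upstream byte for byte.
#ifdef GGML_BACKEND_DL_PROJECT_PREFIX
#  define SKETCH_PREFIX GGML_BACKEND_DL_PROJECT_PREFIX
#else
#  define SKETCH_PREFIX ""
#endif

// Hypothetical helper: "<platform prefix><project prefix>ggml-<name>", no extension.
inline std::string backend_library_name(const std::string & name) {
#ifdef _WIN32
    const std::string platform_prefix = "";     // DLLs carry no "lib" prefix
#else
    const std::string platform_prefix = "lib";
#endif
    return platform_prefix + SKETCH_PREFIX + "ggml-" + name;
}
```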

Layout

```
parakeet-cpp/
├── CMakeLists.txt               # standalone build (library / CLI / tests / examples)
├── cmake/
│   └── parakeet-cppConfig.cmake.in
├── include/parakeet/            # public headers (Engine, StreamSession, …)
├── src/                         # engines, decoders, mel, CLI
├── examples/                    # live-mic, live-mic-attributed (+ vendored miniaudio)
├── test/                        # CTest sources
├── scripts/                     # setup-ggml.sh, convert-nemo-to-gguf.py, NeMo dumps,
│                                # download-all-models.sh, optional maintainer tools
├── patches/                     # ggml patches applied by setup-ggml.sh
├── README.md                    # quickstart + benchmarks + CMake knob reference
├── PROGRESS.md                  # full development history (Phases 1–13)
├── LICENSE                      # Apache-2.0
└── NOTICE
```

The parakeet-cpp/.gitignore keeps the cloned ggml/, models/, artifacts/*.npy, and any build*/ trees out of git.

Why land it under `qvac-ext-lib-whisper.cpp/`

Centralises the in-house speech stack alongside whisper.cpp; both libraries share the same ggml backend ecosystem and the same packaging / vcpkg pipeline, so reviewers, CI, and consumers (vcpkg port, addon hosts) only need to track one upstream repo. The subtree is opt-in at build time and does not change the existing whisper build, headers, or vcpkg surface.

Build

From parakeet-cpp/:

```bash
./scripts/setup-ggml.sh                                         # clone+patch ggml
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
```

GPU backends are configure-time:

```bash
# Apple Silicon
cmake -S . -B build-metal -DCMAKE_BUILD_TYPE=Release -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON
# Vulkan / desktop
cmake -S . -B build-vk    -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
# OpenCL (Adreno / Android; or NVIDIA/AMD/Intel for parity testing)
cmake -S . -B build-cl    -DCMAKE_BUILD_TYPE=Release -DGGML_OPENCL=ON
```

Run the tests with ctest --test-dir build --output-on-failure after the optional scripts/dump-*-reference.py step. Missing fixtures auto-disable individual tests, so a fresh checkout still gives a green run.

Consumption

```cmake
find_package(parakeet-cpp CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE parakeet::parakeet)
```

The matching vcpkg overlay port (parakeet-cpp in qvac-registry-vcpkg) consumes the standalone GitHub mirror today; once this PR lands we can flip the port to point at this subtree directly if desired.

Test plan

parakeet-cpp/scripts/setup-ggml.sh applies all three patches cleanly and is idempotent on re-run.
cmake -S parakeet-cpp -B parakeet-cpp/build -DCMAKE_BUILD_TYPE=Release && cmake --build parakeet-cpp/build succeeds with no targets renamed in whisper's build.
ctest --test-dir parakeet-cpp/build --output-on-failure passes (unit + fixture labels) with the standard CTC / TDT / Sortformer GGUFs + NeMo .npy references staged under parakeet-cpp/{models,artifacts}/.
parakeet-cpp/build/parakeet --model models/parakeet-ctc-0.6b.q8_0.gguf --wav test/samples/jfk.wav produces the expected JFK transcript.
Top-level whisper build (cmake -S . -B build && cmake --build build) still succeeds unchanged — the new subtree is not pulled into it.
Port to the vcpkg registry and ensure the library can be consumed from addons on all platforms: win-x64, linux-x64, linux-arm64, darwin-x64, darwin-arm64, android-arm64, ios-arm64.

@gianni-cor

…review) Address @gianni-cor review on PR #11: switch the bundled ggml filename prefix from `libparakeet-ggml-*` to `libspeech-ggml-*` so the QVAC speech stack (whisper, parakeet, chatterbox, supertonic, ...) can co-vendor a single ggml file set instead of each library shipping its own copy.

- parakeet-cpp/CMakeLists.txt: OUTPUT_NAME prefix `parakeet-` -> `speech-`, GGML_BACKEND_DL_PROJECT_PREFIX macro `"parakeet-"` -> `"speech-"`, option blurb + status message updated.
- parakeet-cpp/README.md, patches/README.md, scripts/setup-ggml.sh, patches/ggml-backend-reg-filename-prefix.patch: doc / comment / example updated to reference the new `speech-` prefix.

Verified: setup-ggml.sh re-applies all patches cleanly; CMake configure prints `bundled ggml libraries will be emitted as libspeech-ggml-*`; build emits libspeech-ggml{,-base,-cpu,-blas,-metal}.{0,0.9.11}.dylib; parakeet binary's otool -L now references `libspeech-ggml*` exclusively.

Co-authored-by: Cursor <cursoragent@cursor.com>

gianni-cor

Big +1 on landing this. The follow-up commit c6c3fd7 switches the bundled-ggml prefix to libspeech-ggml-*, which is exactly what I asked for in the prior review and makes the speech stack (whisper, parakeet, chatterbox, supertonic, …) co-vendor a single ggml file set. Verified locally:

scripts/setup-ggml.sh clones ggml@58c38058, applies all three patches cleanly, and is safe to re-run (resets to pristine + re-applies — i.e. functionally idempotent rather than no-op-on-second-run, but you end at the same state).
cmake -S parakeet-cpp -B parakeet-cpp/build configures cleanly and the top-level whisper.cpp build is untouched (the subtree is opt-in).

This is a high-quality, well-scoped contribution. The C++ surface is small and well-documented (include/parakeet/), the engine is PIMPL'd for ABI stability, the CTest harnesses auto-disable when fixtures are missing, and the three ggml patches all default to upstream-byte-equal behaviour when their triggers are off. Approving so the PR is unblocked. The notes below are mostly polish or footguns to address in a follow-up; none of them block landing this subtree.

Approving — must-fix-soonish (correctness/footguns)

parakeet-cpp/src/parakeet_log.cpp — g_user_data is not atomic.
```
std::atomic<ggml_log_callback> g_callback{nullptr};
void * g_user_data = nullptr;
```
log_set_callback writes to g_user_data (non-atomic) and g_callback (atomic release). log_impl reads g_callback (acquire) and then g_user_data. A concurrent log_set_callback racing with a log_impl can deliver the new callback with the old user_data (or vice-versa). Make it std::atomic<void *> or, simpler, pack (cb, user_data) into a single shared_ptr<struct {…}> and do an atomic swap.
parakeet-cpp/src/parakeet_engine.cpp — SortformerStreamSession borrows the engine's mel_state.
```
DiarizationResult diar;
{
    const float * win = ring.data() + off;
    diar = engine_impl_diarize_helper(*engine_impl, win, n, opts.sample_rate, diopts);
}
```
engine_impl_diarize_helper writes to engine_impl->mel_state. StreamSession::Impl already carries its own MelState mel_state (correctly), but SortformerStreamSession::Impl uses the engine's. Two diarize_starts on one Engine — even if you serialise their feed_pcm_* calls — would alias the mel scratch in surprising ways. Either move MelState onto SortformerStreamSession::Impl like the other path, or document that "one diarize stream session per Engine instance" is required (similar to Engine::transcribe* which is documented as single-threaded per instance).
parakeet-cpp/src/parakeet_ctc.h — run_encoder(capture_intermediates=true) is a default-on footgun.
```
int run_encoder(ParakeetCtcModel   & model,
                const float        * mel,
                int                  n_mel_frames,
                int                  n_mels,
                EncoderOutputs     & out,
                int                  max_layers = -1,
                bool                 capture_intermediates = true);
```
Every internal caller (engine.cpp, prewarm, all transcribe/stream/diarize paths, profile harnesses) passes false. The 5+ MB device→host copy the comment warns about ships only to new external callers who didn't read the doc. Consider flipping the default to false, or splitting into run_encoder (fast, only encoder_out + CTC logits) and run_encoder_capture (parity-test convenience). Today the only thing keeping the default at true is the per-stage parity harnesses, and they all pass the flag explicitly anyway.
parakeet-cpp/include/parakeet/export.h — PARAKEET_API is empty under STATIC.
```
#pragma once
...
#ifdef PARAKEET_SHARED
...
#else
#  define PARAKEET_API
#endif
```
With the default static build, PARAKEET_API expands to nothing. The parakeet target is also CXX_VISIBILITY_PRESET hidden, so symbols compiled into libparakeet.a carry hidden visibility. That's fine for "static lib → executable", but consumers wrapping libparakeet.a into their own shared library (e.g. a Node addon for QVAC) cannot re-export the API surface even with their own __attribute__((visibility("default"))) wrappers, because the inner symbol is already marked hidden. Either:
- flip PARAKEET_API to __attribute__((visibility("default"))) always (ELF) / nothing (static-link Windows), so static-lib symbols land at default visibility regardless of build mode, or
- explicitly document in export.h that consumers wrapping the static lib in a shared object must compile parakeet with -DPARAKEET_SHARED -DPARAKEET_BUILD to get the right visibility.

parakeet-cpp/src/parakeet_engine.cpp — divide-by-zero waiting to happen.

```
inline double encoder_frame_stride_ms(const ParakeetCtcModel & model) {
    const int hop = model.mel_cfg.hop_length;
    const int sub = model.encoder_cfg.subsampling_factor > 0
                  ? model.encoder_cfg.subsampling_factor : 8;
    const int sr  = model.mel_cfg.sample_rate > 0 ? model.mel_cfg.sample_rate : 16000;
    return 1000.0 * (double) (hop * sub) / (double) sr;
}
```

sub and sr are fallback-defended; hop is not. If a future GGUF arrives with hop_length missing or 0 the function returns 0 ms and downstream code (frames_per_window = chunk_ms / 0, frame_samples = round(sr * 0 / 1000)) divides by zero. Same one-liner: const int hop = model.mel_cfg.hop_length > 0 ? model.mel_cfg.hop_length : 160;

Approving — nice-to-fix polish

parakeet-cpp/src/parakeet_engine.cpp — ~StreamSession() try { bool = true; } catch(...).
```
StreamSession::~StreamSession() {
    if (pimpl_ && !pimpl_->finalized && !pimpl_->cancelled) {
        try { pimpl_->cancelled = true; } catch (...) {}
    }
}
```
Assigning a bool can't throw, so the try/catch is dead code. Same pattern in ~SortformerStreamSession().
Streaming detokenize is O(n²) per session.
```
const size_t prev_cumulative_len = result.text.size();
result.token_ids.insert(result.token_ids.end(),
                        win_tokens.begin(), win_tokens.end());
result.text = detokenize(pimpl_->model.vocab, result.token_ids);
const std::string win_text = result.text.substr(prev_cumulative_len);
```
result.text = detokenize(model.vocab, result.token_ids) re-detokenises the whole cumulative token list each chunk. Same shape in StreamSession::Impl::process_window (line ~847). For typical 30s utterances this is fine; for hour-long live captioning sessions it adds a quadratic tail. Cheap fix: detokenise only win_tokens and append, mirroring how cumulative_token_ids is grown.
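A minimal sketch of the incremental variant, assuming a SentencePiece-style vocabulary where `▁` (U+2581) marks a word boundary and the only cross-window rule is stripping the single leading space at the very start of the transcript. If the real `detokenize` applies further rules (byte fallback, special-token filtering), the window boundary handling must match them. `piece_to_text`, `detokenize_all`, and `IncrementalDetokenizer` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Map one vocab piece to text: U+2581 (UTF-8 E2 96 81) becomes a space.
inline std::string piece_to_text(const std::string & piece) {
    static const std::string marker = "\xE2\x96\x81";
    std::string out;
    for (std::size_t i = 0; i < piece.size();) {
        if (piece.compare(i, marker.size(), marker) == 0) {
            out += ' ';
            i += marker.size();
        } else {
            out += piece[i++];
        }
    }
    return out;
}

// Reference path: linear per call, quadratic overall when re-run on every chunk.
inline std::string detokenize_all(const std::vector<std::string> & vocab,
                                  const std::vector<int> & ids) {
    std::string text;
    for (int id : ids) text += piece_to_text(vocab[id]);
    if (!text.empty() && text[0] == ' ') text.erase(0, 1);   // drop leading word marker
    return text;
}

// Incremental path: cost proportional to the window, not the whole session.
struct IncrementalDetokenizer {
    std::string text;             // cumulative transcript, equal to detokenize_all
    bool        started = false;  // has any non-empty piece text been seen yet?

    // Appends one window's tokens and returns only the new text (like win_text).
    std::string append(const std::vector<std::string> & vocab, const std::vector<int> & win_ids) {
        std::string win;
        for (int id : win_ids) win += piece_to_text(vocab[id]);
        if (!started && !win.empty()) {
            if (win[0] == ' ') win.erase(0, 1);   // strip only at transcript start
            started = true;
        }
        text += win;
        return win;
    }
};
```

The `started` flag (instead of checking `text.empty()`) keeps the result identical to the full rebuild even when the first window decodes to nothing but a boundary marker.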
parakeet-cpp/src/main.cpp — #define setenv pollutes the TU.
```
#ifdef _WIN32
static int parakeet_setenv(const char * name, const char * value, int /*overwrite*/) {
    return _putenv_s(name, value);
}
#define setenv parakeet_setenv
#endif
```
Macro-redefining a libc symbol in a translation unit is fragile (any later <cstdlib> re-include path reaches a tokenised setenv). Inline-rename the call sites to parakeet_setenv and drop the #define, or wrap as inline int compat_setenv(...) { ... } and call that.
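A minimal sketch of the inline-wrapper option, using the `compat_setenv` name suggested above. Unlike the quoted Windows shim, which ignores `overwrite`, this version honours it on both platforms so call sites behave the same everywhere; that behaviour change is a suggestion of this sketch, not something the review asks for.

```cpp
#include <cstdlib>

// Portable setenv without redefining the libc name: call sites use compat_setenv.
inline int compat_setenv(const char * name, const char * value, int overwrite) {
#ifdef _WIN32
    if (!overwrite && std::getenv(name) != nullptr) {
        return 0;                              // mirror POSIX setenv(..., 0)
    }
    return _putenv_s(name, value);             // 0 on success, an errno_t otherwise
#else
    return ::setenv(name, value, overwrite);   // POSIX: 0 on success, -1 on error
#endif
}
```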

PARAKEET_FLASH_ATTN ON-by-default vs PARAKEET_EXPERIMENTAL_FLASH_ATTN macro.

```
if (GGML_METAL)
    set(PARAKEET_FLASH_ATTN_DEFAULT ON)
else()
    set(PARAKEET_FLASH_ATTN_DEFAULT OFF)
endif()
option(PARAKEET_FLASH_ATTN "parakeet: enable fused flash-attn in MHA (default ON for Metal; OFF elsewhere pending per-backend A/B)" ${PARAKEET_FLASH_ATTN_DEFAULT})
if (PARAKEET_FLASH_ATTN)
    target_compile_definitions(parakeet PRIVATE PARAKEET_EXPERIMENTAL_FLASH_ATTN)
endif()
```

The CMake option is shipped on-by-default on Metal but the C++ side gates on PARAKEET_EXPERIMENTAL_FLASH_ATTN. If the path is good enough to be the Metal default, drop the EXPERIMENTAL_ prefix from the macro. Otherwise gate the option default to OFF until the prefix goes away.

parakeet-cpp/scripts/download-all-models.sh — no integrity verification.

```
fetch() {
  local url="$1" dest="$2"
  if [[ -f "$dest" ]]; then
    local sz; sz=$(stat -f%z "$dest" 2>/dev/null || stat -c%s "$dest")
    echo "  exists: $dest ($(bytes_human "$sz")) — skipping"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  echo "  fetching: $url"
  echo "          -> $dest"
  curl -L --fail --progress-bar -o "$dest.tmp" "$url"
  mv "$dest.tmp" "$dest"
  ...
}
```

A corrupted partial download silently succeeds — the failure surfaces later in convert-nemo-to-gguf.py with a confusing "tar: unexpected EOF" rather than at fetch time. Pin to a specific HF revision (/resolve/<sha>/...) and add a sha256sum -c step, or at minimum a size sanity check vs an expected number.

parakeet-cpp/scripts/convert-nemo-to-gguf.py — --hf-repo defaults to CTC 0.6B.
```
p.add_argument("--hf-repo", default="nvidia/parakeet-ctc-0.6b",
               help="HF model id to download from if --ckpt is missing.")
```
Self-documented as a footgun in the docstring (lines 27-29). Auto-derive --hf-repo from the --ckpt filename when it follows the <model>.nemo convention (e.g. parakeet-tdt-0.6b-v3.nemo → nvidia/parakeet-tdt-0.6b-v3), and fall back to erroring out instead of defaulting to CTC when the filename doesn't match a known prefix. Saves a class of "I downloaded the wrong weights" support tickets.
parakeet-cpp/test/test_streaming.cpp and test_decoder_determinism.cpp — fragile WAV reader.
```
FILE * f = std::fopen(opts.wav_path.c_str(), "rb");
...
std::fseek(f, 0, SEEK_END);
long sz = std::ftell(f);
std::fseek(f, 44, SEEK_SET);
std::vector<int16_t> i16((sz - 44) / 2);
std::fread(i16.data(), 2, i16.size(), f);
```
Hard-codes a 44-byte WAV header. Fine for the committed test/samples/*.wav fixtures, but any RIFF chunk besides the canonical fmt (e.g. LIST INFO, BWF bext) shifts the data offset and the test silently mis-parses samples. Either route through parakeet::load_wav_mono_f32 (already linked via parakeet), or scan for the data chunk header.
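A minimal sketch of the chunk-scanning option, assuming 16-bit PCM and working on an in-memory byte buffer for brevity. `find_pcm16_samples` is a hypothetical helper; routing through `parakeet::load_wav_mono_f32` remains the simpler fix since it is already linked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// RIFF fields are little-endian.
inline uint32_t rd_u32(const uint8_t * p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}
inline uint16_t rd_u16(const uint8_t * p) { return uint16_t(p[0] | (p[1] << 8)); }

// Walk RIFF chunks until "data" instead of assuming it starts at byte 44.
// Returns std::nullopt for non-RIFF/WAVE input, non-16-bit-PCM, or truncation.
inline std::optional<std::vector<int16_t>> find_pcm16_samples(const std::vector<uint8_t> & buf) {
    if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
        std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
        return std::nullopt;
    }
    bool saw_pcm16_fmt = false;
    std::size_t pos = 12;
    while (pos + 8 <= buf.size()) {
        const uint8_t *   hdr  = buf.data() + pos;
        const uint32_t    size = rd_u32(hdr + 4);
        const std::size_t body = pos + 8;
        if (size > buf.size() - body) return std::nullopt;        // truncated chunk
        if (std::memcmp(hdr, "fmt ", 4) == 0 && size >= 16) {
            const uint16_t format = rd_u16(buf.data() + body);       // 1 = integer PCM
            const uint16_t bits   = rd_u16(buf.data() + body + 14);  // bitsPerSample
            saw_pcm16_fmt = (format == 1 && bits == 16);
        } else if (std::memcmp(hdr, "data", 4) == 0) {
            if (!saw_pcm16_fmt) return std::nullopt;
            std::vector<int16_t> out(size / 2);
            for (std::size_t i = 0; i < out.size(); ++i) {
                out[i] = int16_t(rd_u16(buf.data() + body + 2 * i));
            }
            return out;
        }
        pos = body + size + (size & 1);                              // chunks are word-aligned
    }
    return std::nullopt;
}
```

Any extra chunk (LIST INFO, bext, ...) is simply skipped, including the pad byte after odd-sized chunks.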
parakeet-cpp/src/energy_vad.h — fixed 64 KB buffer per session.
```
int     window_pos_    = 0;       // write index into window_sq_
float   window_sq_[16000];        // big enough for window_ms <= 1 s @ 16 kHz
```
EnergyVad is heap-allocated per StreamSession so this is 64 KB per live session — fine for typical use, but the 1-s @ 16 kHz cap is an implicit limit hidden in the field declaration. Either move to std::vector<float> sized at construction (cheap; one allocation per session) or document the cap on the constructor.
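A minimal sketch of the `std::vector` option. EnergyVad's real interface is not shown in the review, so `EnergyVadWindow`, its constructor parameters, and `push` are hypothetical; only the `window_sq_` / `window_pos_` ring-buffer idea comes from the quoted fields.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Ring buffer of squared samples sized once at construction, so window_ms > 1 s
// or sample rates above 16 kHz no longer overrun a fixed float[16000].
class EnergyVadWindow {
public:
    EnergyVadWindow(int sample_rate, int window_ms)
        : window_sq_(window_samples(sample_rate, window_ms), 0.0f) {}

    // Push one sample; return the mean squared amplitude over the last
    // capacity() samples (zero-padded until the window has filled).
    float push(float sample) {
        const float sq = sample * sample;
        sum_sq_ += sq - window_sq_[window_pos_];
        window_sq_[window_pos_] = sq;
        window_pos_ = (window_pos_ + 1) % window_sq_.size();
        return static_cast<float>(sum_sq_ / window_sq_.size());
    }

    std::size_t capacity() const { return window_sq_.size(); }

private:
    static std::size_t window_samples(int sample_rate, int window_ms) {
        if (sample_rate <= 0 || window_ms <= 0) {
            throw std::invalid_argument("sample_rate and window_ms must be > 0");
        }
        const long long n = static_cast<long long>(sample_rate) * window_ms / 1000;
        return n > 0 ? static_cast<std::size_t>(n) : 1;
    }

    std::vector<float> window_sq_;
    std::size_t        window_pos_ = 0;   // write index into window_sq_
    double             sum_sq_     = 0.0; // running sum of window_sq_
};
```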

Documentation polish

parakeet-cpp/README.md still reads as the standalone repo's README.

````markdown
## 1. Clone and build

```bash
git clone <this-repo> parakeet.cpp
cd parakeet.cpp
./scripts/setup-ggml.sh

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
```
````

When this is consumed as a subtree under `qvac-ext-lib-whisper.cpp/parakeet-cpp/`, the `git clone` instruction doesn't apply and the CWD is `parakeet-cpp/`, not `parakeet.cpp/`. Add a header note clarifying this is the in-tree subtree variant, or update §1 to do `cd parakeet-cpp` after a top-level repo clone.

Top-level README.md (whisper.cpp) doesn't mention parakeet-cpp.

grep -i parakeet README.md returns nothing. The PR description correctly notes the existing whisper build/headers/vcpkg surface is untouched, but a one-line discovery pointer ("This repository also vendors parakeet-cpp/ — see parakeet-cpp/README.md") would help new contributors / users who land on the top-level README and would otherwise miss the new subtree entirely.

What I liked

Three ggml patches are tight, well-documented, and individually drop-out-able once upstream catches up. The byte-equal-to-upstream guarantee when the trigger is off is exactly the right contract for a downstream fork.
EncoderGraph LRU cache (k_encoder_graph_cache_max = 3) + shape-keyed (T_mel, n_layers, all_valid) lookup is the right shape for streaming workloads.
Mel preprocess optimisations (real-FFT pack, thread_local twiddle cache, MelState reuse) are clean and clearly bench-driven.
EngineOptions::prewarm + test-decoder-determinism --prewarm gate is a great pattern — test the prewarm contract instead of trusting it.
Adreno-6xx CPU fallback policy and the PARAKEET_ALLOW_ADRENO_6XX=1 opt-out are exactly the right shape for a real production environment.
parakeet_apply_ggml_prefix + the companion ggml-backend-reg-filename-prefix.patch cleanly solves the in-process ggml-collision problem the QVAC speech stack will hit.
BackendDevice + backend_name() reflect the resolved backend after fallbacks, not the requested one — matches what consumers actually need to log.
Test labels (unit / fixture / perf / gpu) + parakeet_register_test(REQUIRES …) auto-disable for missing fixtures keeps ctest green on a fresh checkout.
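A minimal sketch of a capacity-bounded LRU keyed on (T_mel, n_layers, all_valid), the shape praised above. `Graph`, `GraphLru`, and `get_or_build` are hypothetical stand-ins for the EncoderGraph cache, which the review does not show. A `std::list` keeps recency order and a `std::map` indexes into it, so hits and evictions cost O(log n) with n at most the capacity.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

struct Graph { int t_mel; int n_layers; bool all_valid; };    // placeholder payload

using GraphKey = std::tuple<int, int, bool>;                   // (T_mel, n_layers, all_valid)

class GraphLru {
public:
    explicit GraphLru(std::size_t capacity) : capacity_(capacity) {}

    // Return the cached graph for key, calling build() only on a miss.
    template <class BuildFn>
    std::shared_ptr<Graph> get_or_build(const GraphKey & key, BuildFn build) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // hit: mark most recent
            return it->second->second;
        }
        std::shared_ptr<Graph> graph = build();                 // miss: build once
        order_.emplace_front(key, graph);
        index_[key] = order_.begin();
        if (order_.size() > capacity_) {
            index_.erase(order_.back().first);                   // evict least recently used
            order_.pop_back();
        }
        return graph;
    }

    bool contains(const GraphKey & key) const { return index_.count(key) != 0; }

private:
    using Entry = std::pair<GraphKey, std::shared_ptr<Graph>>;
    std::size_t capacity_;
    std::list<Entry> order_;                                     // front = most recent
    std::map<GraphKey, std::list<Entry>::iterator> index_;
};
```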

Approving with the polish notes above.

gianni-cor

Posting items 1-5 from my approval as inline review comments so they're easier to thread / triage in the file diff. Same content as the approval body, just relocated to the offending lines.

gianni-cor · 2026-05-04T22:00:56Z

```diff
+namespace {
+
+std::atomic<ggml_log_callback> g_callback{nullptr};
+void * g_user_data = nullptr;
```


Race condition: g_user_data is non-atomic.

log_set_callback writes g_user_data (non-atomic) before storing g_callback with release ordering; log_impl reads g_callback (acquire) and then g_user_data. Concurrent log_set_callback vs log_impl can deliver the new callback with the old user_data (or vice versa) — the publish of (cb, user_data) is not atomic as a pair.

Fix options:

Make it std::atomic<void *> g_user_data{nullptr} and store user_data before cb (so the acquire-load of cb synchronises with both writes). Loads are then free of data races as long as log_impl loads cb first and then user_data.

Or pack (cb, user_data) into a single struct Sink { ggml_log_callback cb; void * ud; }, hold it in a std::shared_ptr<Sink>, and use std::atomic_store / std::atomic_load on the shared_ptr (or std::atomic<std::shared_ptr<Sink>> in C++20). Atomic swap of the pair makes the race impossible.

Low-impact in the QVAC use case (we're unlikely to call parakeet_log_set concurrently with logging in flight), but the public C entry point in <parakeet/log.h> invites that pattern from host applications.
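A minimal sketch of the second fix, assuming C++11/17 (`std::atomic_load` / `std::atomic_store` on `std::shared_ptr`; C++20 code would use `std::atomic<std::shared_ptr<const Sink>>`, since the free functions are deprecated there). The enum and callback typedef are local stand-ins for ggml's `ggml_log_level` / `ggml_log_callback`, and `log_set_callback` / `log_impl` only mirror the names in this comment, not the file's actual code. Publishing the pair as one object matters because the first fix removes the data race but not the mismatch: a reader can still load the old callback and then read the newly stored user_data.

```cpp
#include <atomic>
#include <memory>
#include <utility>

// Stand-ins so the sketch compiles without ggml.h.
enum sketch_log_level { SKETCH_LOG_INFO, SKETCH_LOG_ERROR };
using sketch_log_callback = void (*)(sketch_log_level level, const char * text, void * user_data);

namespace {

// Callback and user_data travel together in one immutable object.
struct Sink {
    sketch_log_callback cb;
    void *              ud;
};

// Only ever accessed through std::atomic_load / std::atomic_store.
std::shared_ptr<const Sink> g_sink;

} // namespace

void log_set_callback(sketch_log_callback cb, void * user_data) {
    std::shared_ptr<const Sink> sink = std::make_shared<Sink>(Sink{cb, user_data});
    std::atomic_store(&g_sink, std::move(sink));   // publish the pair in one step
}

void log_impl(sketch_log_level level, const char * text) {
    // Snapshot the current pair; the shared_ptr keeps it alive even if another
    // thread swaps in a new Sink while this callback is running.
    const std::shared_ptr<const Sink> sink = std::atomic_load(&g_sink);
    if (sink && sink->cb) {
        sink->cb(level, text, sink->ud);
    }
}
```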

gianni-cor · 2026-05-04T22:00:56Z

```diff
+    DiarizationResult diar;
+    {
+        const float * win = ring.data() + off;
+        diar = engine_impl_diarize_helper(*engine_impl, win, n, opts.sample_rate, diopts);
```


SortformerStreamSession aliases the engine's MelState.

engine_impl_diarize_helper writes to engine_impl->mel_state. StreamSession::Impl already carries its own MelState mel_state member (line 743) for exactly this reason — the encoder + decoder pipelines run independently of the parent Engine's mel scratch. The Sortformer streaming path doesn't follow that pattern: every process_chunk here clobbers the engine-owned state.

This matters in two scenarios that the public API allows:

Two diarize_start sessions on the same Engine. Even with serialised feed_pcm_* calls, both sessions' process_chunk would alias the same mel_state buffer.

Engine::diarize() while a SortformerStreamSession is running. The engine's own one-shot diarize path also uses pimpl_->mel_state (line ~552), so a diarize_samples() racing with a streaming session's feed_pcm_f32() triggers the same alias.

Fix: lift MelState mel_state onto SortformerStreamSession::Impl (mirrors what StreamSession::Impl does) and pass it through to engine_impl_diarize_helper via an extra param, or document the constraint as "one stream/diarize call per Engine instance at a time" and audit engine.h to make that explicit alongside the existing single-thread-per-Engine note.

Fix #1 is preferable; it's a 4-line change and removes the constraint entirely.

gianni-cor · 2026-05-04T22:00:56Z

```diff
+                int                  n_mels,
+                EncoderOutputs     & out,
+                int                  max_layers = -1,
+                bool                 capture_intermediates = true);
```


Default capture_intermediates = true is a footgun for new external callers.

Every internal caller passes false (engine.cpp transcribe, transcribe_samples_stream, diarize, prewarm; main.cpp run_once; live-mic / live-mic-attributed via Engine). The 5+ MB device→host roundtrip the comment above warns about only ships to new callers — exactly the population this header surface targets — who didn't read the doc.

The per-stage parity harnesses (test-encoder, test-tdt-encoder-parity, test-sortformer-parity, test-encoder-capture-parity) all pass capture_intermediates=true explicitly today, so flipping the default is safe.

Two equivalent fixes, pick one:

```
int run_encoder(ParakeetCtcModel & model, const float * mel, int n_mel_frames, int n_mels,
                EncoderOutputs & out, int max_layers = -1,
                bool capture_intermediates = false);  // flip default
```

or split into two functions so the choice is at the call site:

```
int run_encoder(...);          // production: encoder_out + CTC logits only
int run_encoder_capture(...);  // parity: + per-stage host copies
```

The split shape also makes it easier to grep for parity-only call sites in tests.

gianni-cor · 2026-05-04T22:00:56Z

```diff
+#    define PARAKEET_API __attribute__((visibility("default")))
+#  endif
+#else
+#  define PARAKEET_API
```


Static-build symbols are emitted with hidden visibility.

Under the default static build (PARAKEET_SHARED undefined), PARAKEET_API expands to nothing. The parakeet target is compiled with CXX_VISIBILITY_PRESET hidden + VISIBILITY_INLINES_HIDDEN ON (CMakeLists.txt lines 302-303), so symbols compiled into libparakeet.a carry STV_HIDDEN.

That's fine for static lib → executable. Breaks for our QVAC use case: a Node addon (or any host shared library) that links libparakeet.a and tries to re-export the API surface to JavaScript via its own __attribute__((visibility("default"))) wrappers. The inner parakeet::Engine::transcribe symbol is already marked hidden in the .o, so the addon's wrapper compiles and links but dlsym / N-API can't resolve the indirect call back into the static lib's hidden symbols on some toolchains.

Two fixes, pick one:

Make PARAKEET_API always visible on ELF/Mach-O even in static builds — change the #else branch to:

```
#else
#  if defined(__GNUC__) || defined(__clang__)
#    define PARAKEET_API __attribute__((visibility("default")))
#  else
#    define PARAKEET_API
#  endif
#endif
```

No effect when the static lib lands in an executable (visibility is irrelevant for the final link); makes the symbols re-exportable from a wrapping shared object.

Document in this header (and in README.md §1) that consumers wrapping libparakeet.a in their own .so / .dylib must compile parakeet with -DPARAKEET_SHARED -DPARAKEET_BUILD.

Option 1 is the right answer for the QVAC speech stack — every addon that consumes this should not need to know the internal visibility convention.
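A minimal sketch of the whole header with option 1 applied. The `PARAKEET_SHARED` branch is an assumption reconstructed from the visible hunk (dllexport/dllimport on Windows, default visibility elsewhere), and the `!defined(_WIN32)` guard in the static branch is an addition that keeps MinGW from warning about an unsupported visibility attribute. `parakeet_sketch_abi_version` is a hypothetical function that only shows the macro in place.

```cpp
// Sketch of include/parakeet/export.h with option 1 applied.
#ifdef PARAKEET_SHARED
#  if defined(_WIN32)
#    ifdef PARAKEET_BUILD
#      define PARAKEET_API __declspec(dllexport)   // building the DLL
#    else
#      define PARAKEET_API __declspec(dllimport)   // consuming the DLL
#    endif
#  else
#    define PARAKEET_API __attribute__((visibility("default")))
#  endif
#else
   // Static build: still mark the API default-visible on ELF/Mach-O so a host
   // .so/.dylib that links libparakeet.a can re-export it.
#  if (defined(__GNUC__) || defined(__clang__)) && !defined(_WIN32)
#    define PARAKEET_API __attribute__((visibility("default")))
#  else
#    define PARAKEET_API
#  endif
#endif

// Hypothetical exported function, compiled here only to exercise the macro.
extern "C" PARAKEET_API int parakeet_sketch_abi_version(void);
extern "C" PARAKEET_API int parakeet_sketch_abi_version(void) { return 1; }
```

Because the attribute sits on the declaration, it overrides `-fvisibility=hidden` for the public API while internal helpers stay hidden.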

gianni-cor · 2026-05-04T22:00:56Z

```diff
+// happen to land at 80 ms (16 kHz x hop=160 x sub=8) but new GGUFs may
+// differ -- e.g. a 24 kHz checkpoint or a 4x subsampling variant.
+inline double encoder_frame_stride_ms(const ParakeetCtcModel & model) {
+    const int hop = model.mel_cfg.hop_length;
```


Divide-by-zero waiting to happen when a future GGUF lacks hop_length.

sub and sr below are > 0 ? : default-guarded; hop is not. If a future GGUF is converted with parakeet.preproc.hop_length missing or 0, this returns 0.0, then downstream:

transcribe_samples_stream: frames_per_window = floor(chunk_ms / 0) → UB / inf

StreamSession::process_window: frame_samples = round(sr * 0 / 1000) = 0, then left_drop_frames = center_start_sample / 0 → SIGFPE

One-line fix matching the other two fields:

```
const int hop = model.mel_cfg.hop_length > 0 ? model.mel_cfg.hop_length : 160;
```

No current GGUF triggers this; the converter writes parakeet.preproc.hop_length=160 for every shipped checkpoint. Catching it here means a future converter / model variant fails the load with a clean error instead of crashing inside the streaming math.
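A minimal sketch of the guarded helper next to a downstream consumer. `ModelCfgSketch` and its members are stand-ins for the `ParakeetCtcModel` fields named in the hunk, and `frames_per_window` is a hypothetical helper modelled on the `transcribe_samples_stream` computation described above. With all three fallbacks in place, a missing hop_length yields the 80 ms stride instead of 0.

```cpp
#include <cmath>

// Stand-ins for the GGUF-derived config fields; 0 means "missing from metadata".
struct MelCfg         { int hop_length = 0; int sample_rate = 0; };
struct EncoderCfg     { int subsampling_factor = 0; };
struct ModelCfgSketch { MelCfg mel_cfg; EncoderCfg encoder_cfg; };

// Every factor is guarded, so the stride is never 0 and callers never divide by zero.
inline double encoder_frame_stride_ms(const ModelCfgSketch & model) {
    const int hop = model.mel_cfg.hop_length > 0 ? model.mel_cfg.hop_length : 160;
    const int sub = model.encoder_cfg.subsampling_factor > 0
                  ? model.encoder_cfg.subsampling_factor : 8;
    const int sr  = model.mel_cfg.sample_rate > 0 ? model.mel_cfg.sample_rate : 16000;
    return 1000.0 * (double) (hop * sub) / (double) sr;
}

// Hypothetical downstream use: whole encoder frames per streaming window.
inline int frames_per_window(double chunk_ms, const ModelCfgSketch & model) {
    return static_cast<int>(std::floor(chunk_ms / encoder_frame_stride_ms(model)));
}
```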

@GustavoA1604

Android app packaging keeps native libraries compressed inside the APK with no on-disk directory to scan (AGP's `useLegacyPackaging=false` default since 3.6). The directory-iterator pass in `ggml_backend_load_best` therefore finds nothing on Android, and the existing per-search_path `fs::exists` filename fallback also returns false, leaving the loader to return nullptr and the consumer to fail `init_cpu_backend()`.

For backends that ship as a single library (Vulkan / OpenCL / ...) the bare `lib<prefix>ggml-<name>.so` filename is enough to resolve via Android's in-APK linker lookup, but with `GGML_CPU_ALL_VARIANTS=ON` (the qvac-registry-vcpkg whisper-cpp port default for Android per QVAC-18993) the CPU backend ships only as per-arch variants: there is no plain `libggml-cpu.so` for the fallback to compose, so the CPU backend silently never registers.

Enumerate the known per-arch Android variants as additional candidate names for the "cpu" backend and run each through the standard `ggml_backend_score` selection so the device's HWCAP picks the right tier (armv8.0 baseline through armv9.2_2; matches the variants list emitted by `ggml_add_cpu_backend_variant()` in ggml/src/CMakeLists.txt around lines 410-416).

Fast path for the size-1 candidate case (every backend on every non-Android platform, plus Vulkan / OpenCL / Metal / ... on Android): a single load_backend call, identical in cost to the previous code path. The score-then-reload loop only runs when there's an actual choice to make.

Mirrors qvac-ext-ggml@speech commit 9562ed04 ("ggml-backend: android per-arch CPU variant dlopen fallback", @GustavoA1604, PR #11). Carried here as a separate commit on top of the v1.8.4.3 upstream-sync branch so the whisper-cpp vcpkg port can ship Android dynamic-backend mode without a port-level patch (`patches/0002-...`).

Validated by an NDK r29 cross-compile of bundled ggml + whisper.cpp with -DGGML_BACKEND_DL=ON -DBUILD_SHARED_LIBS=OFF -DGGML_CPU_ALL_VARIANTS=ON -DGGML_CPU_REPACK=ON:
- all 7 per-arch libggml-cpu-android_armv*_*.so produced clean;
- `strings ggml-backend-reg.cpp.o | grep cpu-android_armv` confirms the __ANDROID__ block compiles into the dispatcher object.

Co-authored-by: Cursor <cursoragent@cursor.com>
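The candidate selection described in this commit can be sketched as follows, assuming a score callback that returns 0 for a variant the device cannot use and larger values for better-matching tiers (the role `ggml_backend_score` plays here). `pick_backend_library`, `ScoreFn`, and the candidate names in the test are hypothetical; per the commit, the real loader opens each variant to score it and then reloads the winner.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// score(name): 0 = unusable on this device, higher = better-matching tier.
using ScoreFn = std::function<int(const std::string &)>;

// Pick the library to load for one backend. One candidate takes the fast path
// (no scoring pass); otherwise the highest non-zero score wins, ties going to
// the earlier entry.
inline std::optional<std::string> pick_backend_library(const std::vector<std::string> & candidates,
                                                       const ScoreFn & score) {
    if (candidates.empty()) return std::nullopt;
    if (candidates.size() == 1) return candidates.front();   // fast path: nothing to choose

    std::optional<std::string> best;
    int best_score = 0;
    for (const auto & name : candidates) {
        const int s = score(name);
        if (s > best_score) {
            best_score = s;
            best       = name;
        }
    }
    return best;   // nullopt when every variant scored 0
}
```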

@gianni-cor

…review) Address @gianni-cor review on PR #11: switch the bundled ggml filename prefix from `libparakeet-ggml-*` to `libspeech-ggml-*` so the QVAC speech stack (whisper, parakeet, chatterbox, supertonic, ...) can co-vendor a single ggml file set instead of each library shipping its own copy. - parakeet-cpp/CMakeLists.txt: OUTPUT_NAME prefix `parakeet-` -> `speech-`, GGML_BACKEND_DL_PROJECT_PREFIX macro `"parakeet-"` -> `"speech-"`, option blurb + status message updated. - parakeet-cpp/README.md, patches/README.md, scripts/setup-ggml.sh, patches/ggml-backend-reg-filename-prefix.patch: doc / comment / example updated to reference the new `speech-` prefix. Verified: setup-ggml.sh re-applies all patches cleanly; CMake configure prints `bundled ggml libraries will be emitted as libspeech-ggml-*`; build emits libspeech-ggml{,-base,-cpu,-blas,-metal}.{0,0.9.11}.dylib; parakeet binary's otool -L now references `libspeech-ggml*` exclusively. Co-authored-by: Cursor <cursoragent@cursor.com>

Add parakeet-cpp: NVIDIA Parakeet ASR + Sortformer diarization in pure C++/ggml

@GustavoA1604

Android app packaging keeps native libraries compressed inside the APK with no on-disk directory to scan (AGP's `useLegacyPackaging=false` default since 3.6). The directory-iterator pass in `ggml_backend_load_best` therefore finds nothing on Android and the existing per-search_path `fs::exists` filename fallback also returns false, leaving the loader to return nullptr and the consumer to fail `init_cpu_backend()`. For backends that ship as a single library (Vulkan / OpenCL / ...) the bare `lib<prefix>ggml-<name>.so` filename is enough to resolve via Android's in-APK linker lookup, but with `GGML_CPU_ALL_VARIANTS=ON` (the qvac-registry-vcpkg whisper-cpp port default for Android per QVAC-18993) the CPU backend ships only as per-arch variants -- there is no plain `libggml-cpu.so` for the fallback to compose, so the CPU backend silently never registers. Enumerate the known per-arch Android variants as additional candidate names for the "cpu" backend and run each through the standard `ggml_backend_score` selection so the device's HWCAP picks the right tier (armv8.0 baseline through armv9.2_2; matches the variants list emitted by `ggml_add_cpu_backend_variant()` in ggml/src/CMakeLists.txt around lines 410-416). Fast-path for the size-1 candidate case (every backend on every non-Android platform, plus Vulkan / OpenCL / Metal / ... on Android): single load_backend call, identical cost to the previous code path. The score-then-reload loop only runs when there's an actual choice to make. Mirrors qvac-ext-ggml@speech commit 9562ed04 ("ggml-backend: android per-arch CPU variant dlopen fallback", @GustavoA1604, PR #11). Carried here as a separate commit on top of the v1.8.4.3 upstream-sync branch so the whisper-cpp vcpkg port can ship Android dynamic-backend mode without a port-level patch (`patches/0002-...`). 
Validated by an NDK r29 cross-compile of bundled ggml + whisper.cpp with `-DGGML_BACKEND_DL=ON -DBUILD_SHARED_LIBS=OFF -DGGML_CPU_ALL_VARIANTS=ON -DGGML_CPU_REPACK=ON`:

- all 7 per-arch `libggml-cpu-android_armv*_*.so` produced clean;
- `strings ggml-backend-reg.cpp.o | grep cpu-android_armv` confirms the `__ANDROID__` block compiles into the dispatcher object.

Co-authored-by: Cursor <cursoragent@cursor.com>
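The candidate-and-score flow described in this commit can be sketched in plain C++ as follows. This is a minimal, hypothetical illustration, not the code in ggml-backend-reg.cpp: `backend_candidates`, `load_best`, and the `ScoreFn` / `LoadFn` callbacks are invented names. The callbacks stand in for querying a library's `ggml_backend_score` entry point and for the real dlopen-and-register step, a score of 0 is assumed to mean "unusable on this device", and the `.so` suffix assumes Android. The variant list is passed in rather than hardcoded so the sketch does not depend on the exact names in ggml/src/CMakeLists.txt.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback shapes (assumptions, not ggml's internals):
// ScoreFn returns 0 if the library cannot run on this device, higher = better tier.
// LoadFn returns true if the library was loaded and registered.
using ScoreFn = std::function<int(const std::string& filename)>;
using LoadFn = std::function<bool(const std::string& filename)>;

// Candidate filenames for one backend. Only "cpu" on Android expands into
// per-arch variants; everything else yields the single bare name.
std::vector<std::string> backend_candidates(const std::string& prefix,
                                            const std::string& name,
                                            bool android,
                                            const std::vector<std::string>& cpu_variants) {
    const std::string base = "lib" + prefix + "ggml-" + name;
    if (android && name == "cpu" && !cpu_variants.empty()) {
        std::vector<std::string> out;
        for (const auto& variant : cpu_variants) {
            out.push_back(base + "-" + variant + ".so");
        }
        return out;
    }
    return {base + ".so"};
}

// Returns the filename that ended up loaded, or "" if none could be loaded.
std::string load_best(const std::vector<std::string>& candidates,
                      const ScoreFn& score, const LoadFn& load) {
    // Fast path: a single candidate means there is no choice to make,
    // so skip scoring and attempt exactly one load.
    if (candidates.size() == 1) {
        return load(candidates[0]) ? candidates[0] : "";
    }
    // Score every candidate and keep the highest non-zero one
    // (ties keep the earliest candidate).
    std::string best;
    int best_score = 0;
    for (const auto& candidate : candidates) {
        const int s = score(candidate);
        if (s > best_score) {
            best_score = s;
            best = candidate;
        }
    }
    // Load only the winner.
    if (!best.empty() && load(best)) {
        return best;
    }
    return "";
}
```

The single-candidate branch corresponds to the fast path in the commit message: no scoring happens, so non-Android platforms and single-library backends pay exactly one load attempt, and the scoring loop only runs for the per-arch CPU case.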

Add parakeet-cpp port

d7ab516

GustavoA1604 requested review from a team as code owners May 1, 2026 18:53

gianni-cor requested changes May 4, 2026

Comment thread on parakeet-cpp/CMakeLists.txt (outdated)

gianni-cor approved these changes May 4, 2026

gianni-cor reviewed May 4, 2026

GustavoA1604 merged commit a6785de into tetherto:master May 5, 2026
58 of 66 checks passed

This was referenced May 19, 2026

QVAC-18993: bundled-ggml Android dynamic-backend + tts-cpp <atomic> fix #26

Closed

QVAC-18993: bundled-ggml — Android dynamic backend + per-arch CPU dlopen fallback #28

Merged

gianni-cor pushed a commit that referenced this pull request May 28, 2026

Merge pull request #11 from GustavoA1604/add-parakeet-cpp

96a6844

Add parakeet-cpp: NVIDIA Parakeet ASR + Sortformer diarization in pure C++/ggml

GustavoA1604 merged 2 commits into tetherto:master from GustavoA1604:add-parakeet-cpp

2 participants

GustavoA1604 commented May 1, 2026

Summary

What ships

ggml patches

Layout

Why land it under qvac-ext-lib-whisper.cpp/

Build

Consumption

Test plan

gianni-cor left a comment

Approving — must-fix-soonish (correctness/footguns)

Approving — nice-to-fix polish

Documentation polish

What I liked

gianni-cor left a comment

gianni-cor May 4, 2026

gianni-cor May 4, 2026

gianni-cor May 4, 2026

gianni-cor May 4, 2026

gianni-cor May 4, 2026

Why land it under `qvac-ext-lib-whisper.cpp/`