
Add parakeet-cpp: NVIDIA Parakeet ASR + Sortformer diarization in pure C++/ggml #11

Merged
GustavoA1604 merged 2 commits into tetherto:master from GustavoA1604:add-parakeet-cpp
May 5, 2026


@GustavoA1604

Note: of the ~125k added lines, most (91k) come from parakeet-cpp/examples/miniaudio.h, which mirrors whisper.cpp/examples/miniaudio.h. I chose not to add a symbolic link so the two projects stay separate.

Summary

Vendors parakeet-cpp under parakeet-cpp/ — a pure C++/ggml inference port of the NVIDIA Parakeet family (FastConformer ASR + Sortformer diarization). One parakeet::Engine loads CTC, TDT, EOU, or Sortformer GGUFs and dispatches by metadata; no Python / PyTorch / ONNX Runtime at runtime.

The subtree is fully self-contained:

  • Builds independently via its own parakeet-cpp/CMakeLists.txt (PARAKEET_BUILD_LIBRARY / PARAKEET_BUILD_EXECUTABLES / PARAKEET_BUILD_TESTS / PARAKEET_BUILD_EXAMPLES); the top-level whisper.cpp build is not touched, so existing whisper consumers keep their current shape.
  • Bundles ggml via add_subdirectory(ggml) after scripts/setup-ggml.sh clones it at the pinned commit (58c38058) and applies the three local patches under parakeet-cpp/patches/ (see below).
  • Ships its own CMake package config — install(EXPORT parakeet-cpp-targets NAMESPACE parakeet::) + parakeet-cppConfig.cmake.in — so downstream code resolves parakeet::parakeet and ggml::ggml from a find_package(parakeet-cpp CONFIG REQUIRED).

What ships

  • Engines: CTC (parakeet-ctc-{0.6b,1.1b}), TDT (parakeet-tdt-{0.6b-v3,1.1b}, multilingual), EOU (parakeet_realtime_eou_120m-v1, <EOU> turn-detection token), Sortformer (diar_sortformer_4spk-v1 / diar_streaming_sortformer_4spk-v2, up to 4 speakers).
  • Streaming: Mode 2 (full encode → segmented stream), Mode 3 (push duplex with chunked-limited attention + cache-aware history). Speaker-attributed transcription via --diarization-model.
  • Cross-engine events: a single StreamEvent umbrella (EndOfTurn from EOU's <EOU>, VadStateChanged from Sortformer probs and an opt-in CPU energy-VAD on CTC/TDT).
  • CLI / examples: parakeet CLI with --bench / --profile / JSONL emit / OpenCL knobs; live-mic and live-mic-attributed examples on miniaudio.
  • Test suite: per-stage NeMo parity (mel / encoder / TDT / Sortformer), encoder-capture parity, decoder determinism, mel FFT parity, optional test-vk-vs-cpu. CTest labels: unit, fixture, perf, gpu.
  • Backends: CPU (BLAS optional) + CUDA / Metal / Vulkan / OpenCL via ggml_backend_load_all() + registry walk; the Adreno-6xx OpenCL fallback to CPU is preserved.
  • Conversion: scripts/convert-nemo-to-gguf.py (.nemo → .gguf with f32 / f16 / q8_0 / q5_0 / q4_0); scripts/dump-{ctc,tdt,eou,sortformer}-reference.py for the parity harnesses.
  • Docs: parakeet-cpp/README.md (quickstart, GPU build matrix, benchmarks vs onnxruntime, CMake knobs), parakeet-cpp/PROGRESS.md (full development history).

ggml patches

Three patches live under parakeet-cpp/patches/ and are applied in lex order by scripts/setup-ggml.sh. Each is a strict no-op when its trigger is not active, so they don't disturb stock ggml consumers.

  1. ggml-backend-reg-filename-prefix.patch — adds a compile-time GGML_BACKEND_DL_PROJECT_PREFIX macro to backend_filename_prefix() so a host project that renames its bundled libggml-* files (parakeet does this via PARAKEET_GGML_LIB_PREFIX=ON, default) does not break runtime backend discovery under GGML_BACKEND_DL=ON. Macro undefined ⇒ behaviour byte-equal to upstream.
  2. ggml-opencl-allow-non-adreno.patch — opt-in (GGML_OPENCL_ALLOW_UNKNOWN_GPU=1) relax of the Adreno/Intel-only device whitelist so dev hosts on NVIDIA / AMD / Intel iGPU can build and parity-test the OpenCL backend without an Adreno device. Adreno production path unchanged.
  3. ggml-opencl-program-binary-cache.patch — persistent on-disk cache for compiled OpenCL kernel binaries via clCreateProgramWithBinary, keyed on (src, opts, driver, dev) FNV-1a-64 hashes; honours $GGML_OPENCL_CACHE_DIR (with $XDG_CACHE_HOME/ggml/opencl → $HOME/.cache/ggml/opencl fallbacks). Removes the multi-second cold-start clBuildProgram wave on Adreno / Mesa / Mali.
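The contract of patch 1 can be illustrated with the minimal sketch below, which splices a project prefix into a backend library filename. Only the macro name GGML_BACKEND_DL_PROJECT_PREFIX comes from the description above. The helpers backend_filename and PlatformNaming are hypothetical and are not the patch's code. With the macro undefined, the prefix is empty and the result matches upstream's libggml-<name> naming.

```cpp
#include <cassert>
#include <string>

// Stand-in for the compile-time macro described above. Left undefined,
// the prefix is empty and names match upstream ggml.
#ifndef GGML_BACKEND_DL_PROJECT_PREFIX
#define GGML_BACKEND_DL_PROJECT_PREFIX ""
#endif

// Hypothetical: platform-specific parts of a shared-library filename.
struct PlatformNaming {
    const char * lib;  // "lib" on ELF / Mach-O, empty on Windows
    const char * ext;  // ".so", ".dylib" or ".dll"
};

#if defined(_WIN32)
constexpr PlatformNaming kHostNaming{"", ".dll"};
#elif defined(__APPLE__)
constexpr PlatformNaming kHostNaming{"lib", ".dylib"};
#else
constexpr PlatformNaming kHostNaming{"lib", ".so"};
#endif

// Composes e.g. "lib" + "speech-" + "ggml-" + "cpu" + ".so".
std::string backend_filename(const std::string & name,
                             const std::string & project_prefix = GGML_BACKEND_DL_PROJECT_PREFIX,
                             PlatformNaming naming = kHostNaming) {
    return std::string(naming.lib) + project_prefix + "ggml-" + name + naming.ext;
}
```

With an empty prefix, the loader probes exactly the names a stock ggml build emits. That empty-prefix case is what "byte-equal to upstream" requires.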

parakeet-cpp/patches/README.md documents each patch and the drop conditions.
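Patch 3's keying can be sketched as FNV-1a-64 over each of the four fields, formatted into a cache filename. The constants are the standard FNV-1a-64 offset basis and prime. The per-field layout and the `.bin` filename format are assumptions for illustration, not necessarily what the patch writes under $GGML_OPENCL_CACHE_DIR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Standard 64-bit FNV-1a parameters.
constexpr std::uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ULL;
constexpr std::uint64_t kFnvPrime       = 0x100000001b3ULL;

// FNV-1a: XOR each byte in, then multiply (unsigned wrap-around is well-defined).
std::uint64_t fnv1a64(std::string_view data) {
    std::uint64_t h = kFnvOffsetBasis;
    for (unsigned char c : data) {
        h ^= c;
        h *= kFnvPrime;
    }
    return h;
}

// Hypothetical cache filename: one digest per field, so a change to the kernel
// source, build options, driver version or device name each yields a new entry.
std::string opencl_cache_filename(std::string_view src, std::string_view opts,
                                  std::string_view driver, std::string_view dev) {
    char buf[80];
    std::snprintf(buf, sizeof buf, "%016llx-%016llx-%016llx-%016llx.bin",
                  static_cast<unsigned long long>(fnv1a64(src)),
                  static_cast<unsigned long long>(fnv1a64(opts)),
                  static_cast<unsigned long long>(fnv1a64(driver)),
                  static_cast<unsigned long long>(fnv1a64(dev)));
    return buf;
}
```

Including the driver string in the key means a driver update produces new cache entries. Binaries built by the old driver are then never passed to clCreateProgramWithBinary.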

Layout

parakeet-cpp/
├── CMakeLists.txt               # standalone build (library / CLI / tests / examples)
├── cmake/
│   └── parakeet-cppConfig.cmake.in
├── include/parakeet/            # public headers (Engine, StreamSession, …)
├── src/                         # engines, decoders, mel, CLI
├── examples/                    # live-mic, live-mic-attributed (+ vendored miniaudio)
├── test/                        # CTest sources
├── scripts/                     # setup-ggml.sh, convert-nemo-to-gguf.py, NeMo dumps,
│                                # download-all-models.sh, optional maintainer tools
├── patches/                     # ggml patches applied by setup-ggml.sh
├── README.md                    # quickstart + benchmarks + CMake knob reference
├── PROGRESS.md                  # full development history (Phases 1–13)
├── LICENSE                      # Apache-2.0
└── NOTICE

The parakeet-cpp/.gitignore keeps the cloned ggml/, models/, artifacts/*.npy, and any build*/ trees out of git.

Why land it under qvac-ext-lib-whisper.cpp/

Centralises the in-house speech stack alongside whisper.cpp; both libraries share the same ggml backend ecosystem and the same packaging / vcpkg pipeline, so reviewers, CI, and consumers (vcpkg port, addon hosts) only need to track one upstream repo. The subtree is opt-in at build time and does not change the existing whisper build, headers, or vcpkg surface.

Build

From parakeet-cpp/:

./scripts/setup-ggml.sh                                         # clone+patch ggml
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)

GPU backends are configure-time:

# Apple Silicon
cmake -S . -B build-metal -DCMAKE_BUILD_TYPE=Release -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON
# Vulkan / desktop
cmake -S . -B build-vk    -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
# OpenCL (Adreno / Android; or NVIDIA/AMD/Intel for parity testing)
cmake -S . -B build-cl    -DCMAKE_BUILD_TYPE=Release -DGGML_OPENCL=ON

Run ctest --test-dir build --output-on-failure after the optional scripts/dump-*-reference.py step. Missing fixtures auto-disable individual tests, so a fresh checkout still gives a green run.

Consumption

find_package(parakeet-cpp CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE parakeet::parakeet)

The matching vcpkg overlay port (parakeet-cpp in qvac-registry-vcpkg) consumes the standalone GitHub mirror today; once this PR lands we can flip the port to point at this subtree directly if desired.

Test plan

  • parakeet-cpp/scripts/setup-ggml.sh applies all three patches cleanly and is idempotent on re-run.
  • cmake -S parakeet-cpp -B parakeet-cpp/build -DCMAKE_BUILD_TYPE=Release && cmake --build parakeet-cpp/build succeeds with no targets renamed in whisper's build.
  • ctest --test-dir parakeet-cpp/build --output-on-failure passes (unit + fixture labels) with the standard CTC / TDT / Sortformer GGUFs + NeMo .npy references staged under parakeet-cpp/{models,artifacts}/.
  • parakeet-cpp/build/parakeet --model models/parakeet-ctc-0.6b.q8_0.gguf --wav test/samples/jfk.wav produces the expected JFK transcript.
  • Top-level whisper build (cmake -S . -B build && cmake --build build) still succeeds unchanged — the new subtree is not pulled into it.
  • Port to the vcpkg registry and make sure the library can be consumed from addons on all platforms: win-x64, linux-x64, linux-arm64, darwin-x64, darwin-arm64, android-arm64, ios-arm64.

@GustavoA1604 GustavoA1604 requested review from a team as code owners May 1, 2026 18:53
…review)

Address @gianni-cor review on PR #11: switch the bundled ggml filename
prefix from `libparakeet-ggml-*` to `libspeech-ggml-*` so the QVAC speech
stack (whisper, parakeet, chatterbox, supertonic, ...) can co-vendor a
single ggml file set instead of each library shipping its own copy.

  - parakeet-cpp/CMakeLists.txt: OUTPUT_NAME prefix `parakeet-` -> `speech-`,
    GGML_BACKEND_DL_PROJECT_PREFIX macro `"parakeet-"` -> `"speech-"`,
    option blurb + status message updated.
  - parakeet-cpp/README.md, patches/README.md, scripts/setup-ggml.sh,
    patches/ggml-backend-reg-filename-prefix.patch: doc / comment / example
    updated to reference the new `speech-` prefix.

Verified: setup-ggml.sh re-applies all patches cleanly; CMake configure
prints `bundled ggml libraries will be emitted as libspeech-ggml-*`;
build emits libspeech-ggml{,-base,-cpu,-blas,-metal}.{0,0.9.11}.dylib;
parakeet binary's otool -L now references `libspeech-ggml*` exclusively.

Co-authored-by: Cursor <cursoragent@cursor.com>

@gianni-cor gianni-cor left a comment

Big +1 on landing this. The follow-up commit c6c3fd7 switches the bundled-ggml prefix to libspeech-ggml-*, which is exactly what I asked for in the prior review and makes the speech stack (whisper, parakeet, chatterbox, supertonic, …) co-vendor a single ggml file set. Verified locally:

  • scripts/setup-ggml.sh clones ggml@58c38058, applies all three patches cleanly, and is safe to re-run (resets to pristine + re-applies — i.e. functionally idempotent rather than no-op-on-second-run, but you end at the same state).
  • cmake -S parakeet-cpp -B parakeet-cpp/build configures cleanly and the top-level whisper.cpp build is untouched (the subtree is opt-in).

This is a high-quality, well-scoped contribution. The C++ surface is small and well-documented (include/parakeet/), the engine is PIMPL'd for ABI stability, the CTest harnesses auto-disable when fixtures are missing, and the three ggml patches all default to upstream-byte-equal behaviour when their triggers are off. Approving so the PR is unblocked. The notes below are mostly polish or footguns to address in a follow-up; none of them block landing this subtree.

Approving — must-fix-soonish (correctness/footguns)

  1. parakeet-cpp/src/parakeet_log.cpp: g_user_data is not atomic.

    std::atomic<ggml_log_callback> g_callback{nullptr};
    void * g_user_data = nullptr;

    log_set_callback writes to g_user_data (non-atomic) and g_callback (atomic release). log_impl reads g_callback (acquire) and then g_user_data. A concurrent log_set_callback racing with a log_impl can deliver the new callback with the old user_data (or vice versa). Make g_user_data a std::atomic<void *> or, more robustly, pack (cb, user_data) into a single shared_ptr<struct {…}> and swap it atomically.

  2. parakeet-cpp/src/parakeet_engine.cpp: SortformerStreamSession borrows the engine's mel_state.

    DiarizationResult diar;
    {
        const float * win = ring.data() + off;
        diar = engine_impl_diarize_helper(*engine_impl, win, n, opts.sample_rate, diopts);
    }

    engine_impl_diarize_helper writes to engine_impl->mel_state. StreamSession::Impl already carries its own MelState mel_state (correctly), but SortformerStreamSession::Impl uses the engine's. Two diarize_starts on one Engine — even if you serialise their feed_pcm_* calls — would alias the mel scratch in surprising ways. Either move MelState onto SortformerStreamSession::Impl like the other path, or document that "one diarize stream session per Engine instance" is required (similar to Engine::transcribe* which is documented as single-threaded per instance).

  3. parakeet-cpp/src/parakeet_ctc.h: run_encoder(capture_intermediates=true) is a default-on footgun.

    int run_encoder(ParakeetCtcModel   & model,
                    const float        * mel,
                    int                  n_mel_frames,
                    int                  n_mels,
                    EncoderOutputs     & out,
                    int                  max_layers = -1,
                    bool                 capture_intermediates = true);

    Every internal caller (engine.cpp, prewarm, all transcribe/stream/diarize paths, profile harnesses) passes false. The 5+ MB device→host copy the comment warns about ships only to new external callers who didn't read the doc. Consider flipping the default to false, or splitting into run_encoder (fast, only encoder_out + CTC logits) and run_encoder_capture (parity-test convenience). Today the only thing keeping the default at true is the per-stage parity harnesses, and they all pass the flag explicitly anyway.

  4. parakeet-cpp/include/parakeet/export.h: PARAKEET_API is empty under STATIC.

    #pragma once
    ...
    #ifdef PARAKEET_SHARED
    ...
    #else
    #  define PARAKEET_API
    #endif

    With the default static build, PARAKEET_API expands to nothing. The parakeet target is also CXX_VISIBILITY_PRESET hidden, so symbols compiled into libparakeet.a carry hidden visibility. That's fine for "static lib → executable", but consumers wrapping libparakeet.a into their own shared library (e.g. a Node addon for QVAC) cannot re-export the API surface even with their own __attribute__((visibility("default"))) wrappers, because the inner symbol is already marked hidden. Either:

    • flip PARAKEET_API to __attribute__((visibility("default"))) always (ELF) / nothing (static-link Windows), so static-lib symbols land at default visibility regardless of build mode, or
    • explicitly document in export.h that consumers wrapping the static lib in a shared object must compile parakeet with -DPARAKEET_SHARED -DPARAKEET_BUILD to get the right visibility.
  5. parakeet-cpp/src/parakeet_engine.cpp — divide-by-zero waiting to happen.

    inline double encoder_frame_stride_ms(const ParakeetCtcModel & model) {
        const int hop = model.mel_cfg.hop_length;
        const int sub = model.encoder_cfg.subsampling_factor > 0
                      ? model.encoder_cfg.subsampling_factor : 8;
        const int sr  = model.mel_cfg.sample_rate > 0 ? model.mel_cfg.sample_rate : 16000;
        return 1000.0 * (double) (hop * sub) / (double) sr;
    }

    sub and sr are fallback-defended; hop is not. If a future GGUF arrives with hop_length missing or 0, the function returns 0 ms and downstream code (frames_per_window = chunk_ms / 0, frame_samples = round(sr * 0 / 1000)) divides by zero. The fix is the same one-liner pattern: const int hop = model.mel_cfg.hop_length > 0 ? model.mel_cfg.hop_length : 160;

Approving — nice-to-fix polish

  1. parakeet-cpp/src/parakeet_engine.cpp: ~StreamSession() try { bool = true; } catch(...).

    StreamSession::~StreamSession() {
        if (pimpl_ && !pimpl_->finalized && !pimpl_->cancelled) {
            try { pimpl_->cancelled = true; } catch (...) {}
        }
    }

    Assigning a bool can't throw, so the try/catch is dead code. Same pattern in ~SortformerStreamSession().

  2. Streaming detokenize is O(n²) per session.

    const size_t prev_cumulative_len = result.text.size();
    result.token_ids.insert(result.token_ids.end(),
                            win_tokens.begin(), win_tokens.end());
    result.text = detokenize(pimpl_->model.vocab, result.token_ids);
    const std::string win_text = result.text.substr(prev_cumulative_len);

    result.text = detokenize(model.vocab, result.token_ids) re-detokenises the whole cumulative token list each chunk. Same shape in StreamSession::Impl::process_window (line ~847). For typical 30s utterances this is fine; for hour-long live captioning sessions it adds a quadratic tail. Cheap fix: detokenise only win_tokens and append, mirroring how cumulative_token_ids is grown.

  3. parakeet-cpp/src/main.cpp: #define setenv pollutes the TU.

    #ifdef _WIN32
    static int parakeet_setenv(const char * name, const char * value, int /*overwrite*/) {
        return _putenv_s(name, value);
    }
    #define setenv parakeet_setenv
    #endif

    Macro-redefining a libc symbol in a translation unit is fragile: any header included after the #define that declares or calls setenv (for example a later <cstdlib> include path) is silently rewritten to parakeet_setenv. Inline-rename the call sites to parakeet_setenv and drop the #define, or wrap it as inline int compat_setenv(...) { ... } and call that.

  4. PARAKEET_FLASH_ATTN ON-by-default vs PARAKEET_EXPERIMENTAL_FLASH_ATTN macro.

    if (GGML_METAL)
        set(PARAKEET_FLASH_ATTN_DEFAULT ON)
    else()
        set(PARAKEET_FLASH_ATTN_DEFAULT OFF)
    endif()
    option(PARAKEET_FLASH_ATTN "parakeet: enable fused flash-attn in MHA (default ON for Metal; OFF elsewhere pending per-backend A/B)" ${PARAKEET_FLASH_ATTN_DEFAULT})
    if (PARAKEET_FLASH_ATTN)
        target_compile_definitions(parakeet PRIVATE PARAKEET_EXPERIMENTAL_FLASH_ATTN)
    endif()

    The CMake option is shipped on-by-default on Metal but the C++ side gates on PARAKEET_EXPERIMENTAL_FLASH_ATTN. If the path is good enough to be the Metal default, drop the EXPERIMENTAL_ prefix from the macro. Otherwise gate the option default to OFF until the prefix goes away.

  5. parakeet-cpp/scripts/download-all-models.sh — no integrity verification.

    fetch() {
      local url="$1" dest="$2"
      if [[ -f "$dest" ]]; then
        local sz; sz=$(stat -f%z "$dest" 2>/dev/null || stat -c%s "$dest")
        echo "  exists: $dest ($(bytes_human "$sz")) — skipping"
        return 0
      fi
      mkdir -p "$(dirname "$dest")"
      echo "  fetching: $url"
      echo "          -> $dest"
      curl -L --fail --progress-bar -o "$dest.tmp" "$url"
      mv "$dest.tmp" "$dest"
      ...
    }

    A corrupted partial download silently succeeds — the failure surfaces later in convert-nemo-to-gguf.py with a confusing "tar: unexpected EOF" rather than at fetch time. Pin to a specific HF revision (/resolve/<sha>/...) and add a sha256sum -c step, or at minimum a size sanity check vs an expected number.

  6. parakeet-cpp/scripts/convert-nemo-to-gguf.py: --hf-repo defaults to CTC 0.6B.

    p.add_argument("--hf-repo", default="nvidia/parakeet-ctc-0.6b",
                   help="HF model id to download from if --ckpt is missing.")

    Self-documented as a footgun in the docstring (lines 27-29). Auto-derive --hf-repo from the --ckpt filename when it follows the <model>.nemo convention (e.g. parakeet-tdt-0.6b-v3.nemo → nvidia/parakeet-tdt-0.6b-v3), and error out instead of falling back to CTC when the filename doesn't match a known prefix. That saves a class of "I downloaded the wrong weights" support tickets.

  7. parakeet-cpp/test/test_streaming.cpp and test_decoder_determinism.cpp — fragile WAV reader.

    FILE * f = std::fopen(opts.wav_path.c_str(), "rb");
    ...
    std::fseek(f, 0, SEEK_END);
    long sz = std::ftell(f);
    std::fseek(f, 44, SEEK_SET);
    std::vector<int16_t> i16((sz - 44) / 2);
    std::fread(i16.data(), 2, i16.size(), f);

    Hard-codes a 44-byte WAV header. Fine for the committed test/samples/*.wav fixtures, but any RIFF chunk besides the canonical fmt (e.g. LIST INFO, BWF bext) shifts the data offset and the test silently mis-parses samples. Either route through parakeet::load_wav_mono_f32 (already linked via parakeet), or scan for the data chunk header.

  8. parakeet-cpp/src/energy_vad.h — fixed 64 KB buffer per session.

    int     window_pos_    = 0;       // write index into window_sq_
    float   window_sq_[16000];        // big enough for window_ms <= 1 s @ 16 kHz

    EnergyVad is heap-allocated per StreamSession so this is 64 KB per live session — fine for typical use, but the 1-s @ 16 kHz cap is an implicit limit hidden in the field declaration. Either move to std::vector<float> sized at construction (cheap; one allocation per session) or document the cap on the constructor.
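For polish item 7, the minimal sketch below shows how a test could scan for the data chunk instead of assuming a 44-byte header. The helpers find_wav_data and read_u32le are hypothetical and work on an in-memory buffer. If parakeet::load_wav_mono_f32 fits the tests, it remains the simpler route.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Little-endian u32 read from a byte buffer.
static std::uint32_t read_u32le(const std::uint8_t * p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Finds the payload of the "data" chunk in a RIFF/WAVE buffer, skipping any
// other chunks (fmt, LIST, bext, ...). Returns false if the buffer is not
// RIFF/WAVE or the data chunk is missing or truncated.
bool find_wav_data(const std::vector<std::uint8_t> & buf, std::size_t & off, std::size_t & size) {
    if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
        std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    std::size_t pos = 12;  // first chunk header follows "RIFF" <size> "WAVE"
    while (pos + 8 <= buf.size()) {
        const std::uint32_t len = read_u32le(buf.data() + pos + 4);
        if (std::memcmp(buf.data() + pos, "data", 4) == 0) {
            if (len > buf.size() - (pos + 8)) return false;  // truncated payload
            off  = pos + 8;
            size = len;
            return true;
        }
        pos += 8 + std::size_t(len) + (len & 1u);  // chunks are padded to even sizes
    }
    return false;
}
```

A file whose fmt chunk is followed by a LIST chunk puts the samples at byte 56 rather than 44. The scan above handles that case, while a fixed 44-byte offset would misread it.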

Documentation polish

  1. parakeet-cpp/README.md still reads as the standalone repo's README.

    ## 1. Clone and build
    
    ```bash
    git clone <this-repo> parakeet.cpp
    cd parakeet.cpp
    ./scripts/setup-ggml.sh
    
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    ```

    When this is consumed as a subtree under `qvac-ext-lib-whisper.cpp/parakeet-cpp/`, the `git clone` instruction doesn't apply and the CWD is `parakeet-cpp/`, not `parakeet.cpp/`. Add a header note clarifying this is the in-tree subtree variant, or update §1 to do `cd parakeet-cpp` after a top-level repo clone.
    
    
  2. Top-level README.md (whisper.cpp) doesn't mention parakeet-cpp.

    grep -i parakeet README.md returns nothing. The PR description correctly notes the existing whisper build/headers/vcpkg surface is untouched, but a one-line discovery pointer ("This repository also vendors parakeet-cpp/ — see parakeet-cpp/README.md") would help new contributors / users who land on the top-level README and would otherwise miss the new subtree entirely.

What I liked

  • Three ggml patches are tight, well-documented, and individually drop-out-able once upstream catches up. The byte-equal-to-upstream guarantee when the trigger is off is exactly the right contract for a downstream fork.
  • EncoderGraph LRU cache (k_encoder_graph_cache_max = 3) + shape-keyed (T_mel, n_layers, all_valid) lookup is the right shape for streaming workloads.
  • Mel preprocess optimisations (real-FFT pack, thread_local twiddle cache, MelState reuse) are clean and clearly bench-driven.
  • EngineOptions::prewarm + test-decoder-determinism --prewarm gate is a great pattern — test the prewarm contract instead of trusting it.
  • Adreno-6xx CPU fallback policy and the PARAKEET_ALLOW_ADRENO_6XX=1 opt-out are exactly the right shape for a real production environment.
  • parakeet_apply_ggml_prefix + the companion ggml-backend-reg-filename-prefix.patch cleanly solves the in-process ggml-collision problem the QVAC speech stack will hit.
  • BackendDevice + backend_name() reflect the resolved backend after fallbacks, not the requested one — matches what consumers actually need to log.
  • Test labels (unit / fixture / perf / gpu) + parakeet_register_test(REQUIRES …) auto-disable for missing fixtures keeps ctest green on a fresh checkout.
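The EncoderGraph cache shape praised above can be sketched as a small LRU keyed on (T_mel, n_layers, all_valid). GraphKey, Graph and GraphCache are hypothetical stand-ins, and Graph takes the place of a built ggml graph. A capacity of 3 mirrors k_encoder_graph_cache_max, but the eviction details are an assumption, not parakeet's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical shape key: (T_mel, n_layers, all_valid).
struct GraphKey {
    int  t_mel;
    int  n_layers;
    bool all_valid;
    bool operator<(const GraphKey & o) const {
        return std::tie(t_mel, n_layers, all_valid) <
               std::tie(o.t_mel, o.n_layers, o.all_valid);
    }
};

// Stand-in for a built ggml compute graph plus its allocations.
struct Graph {
    GraphKey key;
};

// Small LRU: most recently used entry at the front, evictions from the back.
class GraphCache {
public:
    explicit GraphCache(std::size_t cap) : cap_(cap > 0 ? cap : 1) {}

    // Returns the graph for `k`, "building" (here: allocating) one on a miss.
    Graph & get(const GraphKey & k) {
        auto it = index_.find(k);
        if (it != index_.end()) {
            ++hits_;
            order_.splice(order_.begin(), order_, it->second);  // move to front
            return *order_.front();
        }
        if (order_.size() == cap_) {  // full: drop the least recently used
            index_.erase(order_.back()->key);
            order_.pop_back();
        }
        order_.push_front(std::make_unique<Graph>(Graph{k}));
        index_[k] = order_.begin();
        return *order_.front();
    }

    std::size_t size() const { return order_.size(); }
    std::size_t hits() const { return hits_; }

private:
    using List = std::list<std::unique_ptr<Graph>>;
    std::size_t                        cap_;
    List                               order_;
    std::map<GraphKey, List::iterator> index_;
    std::size_t                        hits_ = 0;
};
```

In a streaming workload most windows share one shape, so a small cap keeps the steady-state window and a couple of edge shapes (first or last chunk) resident without unbounded growth.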

Approving with the polish notes above.

@gianni-cor gianni-cor left a comment

Posting items 1-5 from my approval as inline review comments so they're easier to thread / triage in the file diff. Same content as the approval body, just relocated to the offending lines.

namespace {

std::atomic<ggml_log_callback> g_callback{nullptr};
void * g_user_data = nullptr;

Race condition: g_user_data is non-atomic.

log_set_callback writes g_user_data (non-atomic) before storing g_callback with release ordering; log_impl reads g_callback (acquire) and then g_user_data. Concurrent log_set_callback vs log_impl can deliver the new callback with the old user_data (or vice versa) — the publish of (cb, user_data) is not atomic as a pair.

Fix options:

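  • Make it std::atomic<void *> g_user_data{nullptr} and store user_data before cb (so the acquire-load of cb synchronises with both writes). This removes the data race, but the pair is still not published atomically: if two log_set_callback calls overlap one log_impl, the reader can load cb from the first call and user_data from the second.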
  • Or pack (cb, user_data) into a single struct Sink { ggml_log_callback cb; void * ud; }, hold it in a std::shared_ptr<Sink>, and use std::atomic_store / std::atomic_load on the shared_ptr (or std::atomic<std::shared_ptr<Sink>> in C++20). Atomic swap of the pair makes the race impossible.

Low-impact in the QVAC use case (we're unlikely to call parakeet_log_set concurrently with logging in flight), but the public C entry point in <parakeet/log.h> invites that pattern from host applications.
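The second option can be sketched as below. To keep the block self-contained, ggml_log_callback is replaced by a local log_callback alias. std::atomic_load and std::atomic_store are the C++11 shared_ptr overloads, deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>. The function names follow the review, but the bodies are illustrative, not parakeet's code.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Local stand-in for ggml_log_callback so the sketch is self-contained.
using log_callback = void (*)(int level, const char * text, void * user_data);

namespace {

// The (callback, user_data) pair is published as one immutable object.
struct Sink {
    log_callback cb;
    void *       ud;
};

std::shared_ptr<const Sink> g_sink;  // accessed only via std::atomic_load / atomic_store

}  // namespace

void log_set_callback(log_callback cb, void * user_data) {
    std::shared_ptr<const Sink> next = std::make_shared<Sink>(Sink{cb, user_data});
    std::atomic_store(&g_sink, std::move(next));
}

void log_impl(int level, const char * text) {
    // One atomic snapshot: cb and ud always come from the same set call, and the
    // snapshot keeps the Sink alive even if another thread replaces it meanwhile.
    const std::shared_ptr<const Sink> s = std::atomic_load(&g_sink);
    if (s && s->cb) {
        s->cb(level, text, s->ud);
    }
}
```

Each log call pays for one shared_ptr copy. Logging is rarely hot enough for that to matter, and in exchange a callback can never be invoked with a user_data that belongs to a different call to log_set_callback.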

DiarizationResult diar;
{
const float * win = ring.data() + off;
diar = engine_impl_diarize_helper(*engine_impl, win, n, opts.sample_rate, diopts);

SortformerStreamSession aliases the engine's MelState.

engine_impl_diarize_helper writes to engine_impl->mel_state. StreamSession::Impl already carries its own MelState mel_state member (line 743) for exactly this reason — the encoder + decoder pipelines run independently of the parent Engine's mel scratch. The Sortformer streaming path doesn't follow that pattern: every process_chunk here clobbers the engine-owned state.

This matters in two scenarios that the public API allows:

  1. Two diarize_start sessions on the same Engine. Even with serialised feed_pcm_* calls, both sessions' process_chunk would alias the same mel_state buffer.
  2. Engine::diarize() while a SortformerStreamSession is running. The engine's own one-shot diarize path also uses pimpl_->mel_state (line ~552), so a diarize_samples() racing with a streaming session's feed_pcm_f32() triggers the same alias.

Fix: lift MelState mel_state onto SortformerStreamSession::Impl (mirrors what StreamSession::Impl does) and pass it through to engine_impl_diarize_helper via an extra param, or document the constraint as "one stream/diarize call per Engine instance at a time" and audit engine.h to make that explicit alongside the existing single-thread-per-Engine note.

Fix #1 is preferable; it's a 4-line change and removes the constraint entirely.
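Fix #1 amounts to threading a session-owned scratch through the helper. The types below are simplified, hypothetical stand-ins for MelState, the engine's Impl and SortformerStreamSession::Impl; the sketch only illustrates the ownership pattern.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins.
struct MelState {
    std::vector<float> scratch;  // reusable FFT / mel working memory
};

struct EngineImpl {
    MelState mel_state;  // now used only by the one-shot Engine::diarize* path
};

// The helper receives its scratch explicitly instead of reaching into
// engine.mel_state, so concurrent sessions never share it.
int diarize_helper(const EngineImpl & /*engine*/, MelState & mel, const float * pcm, int n) {
    mel.scratch.assign(pcm, pcm + n);  // placeholder for the real mel + Sortformer work
    return static_cast<int>(mel.scratch.size());
}

struct SortformerStreamSessionImpl {
    const EngineImpl * engine;
    MelState           mel_state;  // per-session, mirroring StreamSession::Impl

    int process_chunk(const float * pcm, int n) {
        return diarize_helper(*engine, mel_state, pcm, n);
    }
};
```

Because the engine is reached only through a const reference, the compiler rejects any attempt by the helper to write to the engine's own scratch.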

int n_mels,
EncoderOutputs & out,
int max_layers = -1,
bool capture_intermediates = true);

Default capture_intermediates = true is a footgun for new external callers.

Every internal caller passes false (engine.cpp transcribe, transcribe_samples_stream, diarize, prewarm; main.cpp run_once; live-mic / live-mic-attributed via Engine). The 5+ MB device→host roundtrip the comment above warns about only ships to new callers — exactly the population this header surface targets — who didn't read the doc.

The per-stage parity harnesses (test-encoder, test-tdt-encoder-parity, test-sortformer-parity, test-encoder-capture-parity) all pass capture_intermediates=true explicitly today, so flipping the default is safe.

Two equivalent fixes, pick one:

int run_encoder(ParakeetCtcModel & model,
                const float * mel, int n_mel_frames, int n_mels,
                EncoderOutputs & out,
                int max_layers = -1,
                bool capture_intermediates = false);   // flip default

or split into two functions so the choice is at the call site:

int run_encoder(...);            // production: encoder_out + CTC logits only
int run_encoder_capture(...);    // parity: + per-stage host copies

The split shape also makes it easier to grep for parity-only call sites in tests.

# define PARAKEET_API __attribute__((visibility("default")))
# endif
#else
# define PARAKEET_API

Static-build symbols are emitted with hidden visibility.

Under the default static build (PARAKEET_SHARED undefined), PARAKEET_API expands to nothing. The parakeet target is compiled with CXX_VISIBILITY_PRESET hidden + VISIBILITY_INLINES_HIDDEN ON (CMakeLists.txt lines 302-303), so symbols compiled into libparakeet.a carry STV_HIDDEN.

That's fine for static lib → executable. It breaks our QVAC use case: a Node addon (or any host shared library) that links libparakeet.a and wants to expose the API surface from its own .so / .dylib. Calls from the addon's own __attribute__((visibility("default"))) wrappers into libparakeet.a still resolve at static-link time. However, the inner symbols (e.g. parakeet::Engine::transcribe) are already marked hidden in the .o, so the wrapping shared object cannot export them. Anything that expects to resolve them dynamically (dlsym, or another module linking against the addon) then fails.

Two fixes, pick one:

  1. Make PARAKEET_API always visible on ELF/Mach-O even in static builds — change the #else branch to:

    #else
    #  if defined(__GNUC__) || defined(__clang__)
    #    define PARAKEET_API __attribute__((visibility("default")))
    #  else
    #    define PARAKEET_API
    #  endif
    #endif

    No effect when the static lib lands in an executable (visibility is irrelevant for the final link); makes the symbols re-exportable from a wrapping shared object.

  2. Document in this header (and in README.md §1) that consumers wrapping libparakeet.a in their own .so / .dylib must compile parakeet with -DPARAKEET_SHARED -DPARAKEET_BUILD.

Option 1 is the right answer for the QVAC speech stack — every addon that consumes this should not need to know the internal visibility convention.

// happen to land at 80 ms (16 kHz x hop=160 x sub=8) but new GGUFs may
// differ -- e.g. a 24 kHz checkpoint or a 4x subsampling variant.
inline double encoder_frame_stride_ms(const ParakeetCtcModel & model) {
const int hop = model.mel_cfg.hop_length;

Divide-by-zero waiting to happen when a future GGUF lacks hop_length.

sub and sr below are > 0 ? : default-guarded; hop is not. If a future GGUF is converted with parakeet.preproc.hop_length missing or 0, this returns 0.0, then downstream:

  • transcribe_samples_stream: frames_per_window = floor(chunk_ms / 0) → UB / inf
  • StreamSession::process_window: frame_samples = round(sr * 0 / 1000) = 0, then left_drop_frames = center_start_sample / 0 → SIGFPE

One-line fix matching the other two fields:

const int hop = model.mel_cfg.hop_length > 0 ? model.mel_cfg.hop_length : 160;

No current GGUF triggers this; the converter writes parakeet.preproc.hop_length=160 for every shipped checkpoint. Guarding it here means a future converter / model variant gets a sane default instead of crashing inside the streaming math. Rejecting hop_length <= 0 at load time with a clean error would be the stricter alternative.
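For reference, the shipped defaults give a stride of 160 × 8 / 16000 s = 0.08 s = 80 ms. The sketch below applies the suggested guard to all three inputs. MelCfg and EncoderCfg are hypothetical minimal structs standing in for model.mel_cfg and model.encoder_cfg.

```cpp
#include <cassert>

// Hypothetical minimal configs mirroring the fields read by encoder_frame_stride_ms.
struct MelCfg     { int hop_length; int sample_rate; };
struct EncoderCfg { int subsampling_factor; };

// Same arithmetic as the original helper, with every input falling back to
// the shipped default when missing (<= 0), so the result is never 0 ms.
inline double encoder_frame_stride_ms(const MelCfg & mel, const EncoderCfg & enc) {
    const int hop = mel.hop_length         > 0 ? mel.hop_length         : 160;
    const int sub = enc.subsampling_factor > 0 ? enc.subsampling_factor : 8;
    const int sr  = mel.sample_rate        > 0 ? mel.sample_rate        : 16000;
    return 1000.0 * static_cast<double>(hop * sub) / static_cast<double>(sr);
}
```

A 4x subsampling variant at 16 kHz would then report 40 ms, and a GGUF missing hop_length would fall back to the 80 ms default.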

@GustavoA1604 GustavoA1604 merged commit a6785de into tetherto:master May 5, 2026
58 of 66 checks passed
GustavoA1604 pushed a commit that referenced this pull request May 21, 2026
Android app packaging keeps native libraries compressed inside the APK
with no on-disk directory to scan (AGP's `useLegacyPackaging=false`
default since 3.6). The directory-iterator pass in
`ggml_backend_load_best` therefore finds nothing on Android and the
existing per-search_path `fs::exists` filename fallback also returns
false, leaving the loader to return nullptr and the consumer to fail
`init_cpu_backend()`.

For backends that ship as a single library (Vulkan / OpenCL / ...)
the bare `lib<prefix>ggml-<name>.so` filename is enough to resolve
via Android's in-APK linker lookup, but with
`GGML_CPU_ALL_VARIANTS=ON` (the qvac-registry-vcpkg whisper-cpp port
default for Android per QVAC-18993) the CPU backend ships only as
per-arch variants -- there is no plain `libggml-cpu.so` for the
fallback to compose, so the CPU backend silently never registers.

Enumerate the known per-arch Android variants as additional candidate
names for the "cpu" backend and run each through the standard
`ggml_backend_score` selection so the device's HWCAP picks the right
tier (armv8.0 baseline through armv9.2_2; matches the variants list
emitted by `ggml_add_cpu_backend_variant()` in ggml/src/CMakeLists.txt
around lines 410-416).

Fast-path for the size-1 candidate case (every backend on every
non-Android platform, plus Vulkan / OpenCL / Metal / ... on Android):
single load_backend call, identical cost to the previous code path.
The score-then-reload loop only runs when there's an actual choice
to make.
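The selection described in this commit can be sketched as a pick-the-highest-score loop with a size-1 fast path. The function pick_backend_candidate is hypothetical. Its score callback stands in for loading each candidate and querying its ggml_backend_score (0 means it cannot run on this CPU). The candidate names used when calling it are placeholders, not the actual Android variant filenames.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical: `score` stands in for dlopen-ing a candidate and calling its
// ggml_backend_score export (0 = cannot run on this CPU, higher = better).
// Returns the index of the candidate to load, or -1 if none is usable.
int pick_backend_candidate(const std::vector<std::string> & candidates,
                           const std::function<int(const std::string &)> & score) {
    if (candidates.size() == 1) {
        return 0;  // fast path: a single library, so no scoring pass is needed
    }
    int best       = -1;
    int best_score = 0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int s = score(candidates[i]);
        if (s > best_score) {
            best_score = s;
            best       = static_cast<int>(i);
        }
    }
    return best;
}
```

The fast path keeps the common case (one library per backend) at a single load. Only multi-variant backends such as the per-arch CPU builds pay for the extra scoring loads.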

Mirrors qvac-ext-ggml@speech commit 9562ed04 ("ggml-backend: android
per-arch CPU variant dlopen fallback", @GustavoA1604, PR #11). Carried
here as a separate commit on top of the v1.8.4.3 upstream-sync branch
so the whisper-cpp vcpkg port can ship Android dynamic-backend mode
without a port-level patch (`patches/0002-...`).

Validated by an NDK r29 cross-compile of bundled ggml + whisper.cpp
with -DGGML_BACKEND_DL=ON -DBUILD_SHARED_LIBS=OFF
-DGGML_CPU_ALL_VARIANTS=ON -DGGML_CPU_REPACK=ON:
  - all 7 per-arch libggml-cpu-android_armv*_*.so libraries built cleanly;
  - `strings ggml-backend-reg.cpp.o | grep cpu-android_armv`
    confirms the __ANDROID__ block compiles into the dispatcher
    object.

Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor pushed a commit that referenced this pull request May 28, 2026
…review)

Address @gianni-cor review on PR #11: switch the bundled ggml filename
prefix from `libparakeet-ggml-*` to `libspeech-ggml-*` so the QVAC speech
stack (whisper, parakeet, chatterbox, supertonic, ...) can co-vendor a
single ggml file set instead of each library shipping its own copy.

  - parakeet-cpp/CMakeLists.txt: OUTPUT_NAME prefix `parakeet-` -> `speech-`,
    GGML_BACKEND_DL_PROJECT_PREFIX macro `"parakeet-"` -> `"speech-"`,
    option blurb + status message updated.
  - parakeet-cpp/README.md, patches/README.md, scripts/setup-ggml.sh,
    patches/ggml-backend-reg-filename-prefix.patch: doc / comment / example
    updated to reference the new `speech-` prefix.
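The following sketch shows how a compile-time prefix can be folded into the backend filenames. It assumes `GGML_BACKEND_DL_PROJECT_PREFIX` reaches the compiler as a string-literal definition from CMake. The fallback value and `prefixed_backend_filename` are illustrative, not the contents of `patches/ggml-backend-reg-filename-prefix.patch`. The sketch uses the Android/Linux `.so` suffix, while the macOS build in the verification note emits `.dylib`.

```cpp
#include <string>

// Fallback so this sketch compiles standalone; a real build is assumed to
// receive the value as a compile definition set by parakeet-cpp/CMakeLists.txt.
#ifndef GGML_BACKEND_DL_PROJECT_PREFIX
#define GGML_BACKEND_DL_PROJECT_PREFIX "speech-"
#endif

// Hypothetical helper: lib<prefix>ggml-<name>.so, with the prefix spliced in
// by adjacent string-literal concatenation at compile time.
std::string prefixed_backend_filename(const std::string & name) {
    return std::string("lib" GGML_BACKEND_DL_PROJECT_PREFIX "ggml-") + name + ".so";
}
```

Because every speech library built with the same prefix resolves the same filenames, they can share one set of `libspeech-ggml-*` binaries.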

Verified: setup-ggml.sh re-applies all patches cleanly; CMake configure
prints `bundled ggml libraries will be emitted as libspeech-ggml-*`;
build emits libspeech-ggml{,-base,-cpu,-blas,-metal}.{0,0.9.11}.dylib; the
parakeet binary's `otool -L` output now references `libspeech-ggml*` exclusively.

Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor pushed a commit that referenced this pull request May 28, 2026
Add parakeet-cpp: NVIDIA Parakeet ASR + Sortformer diarization in pure C++/ggml
gianni-cor pushed a commit that referenced this pull request May 28, 2026