parakeet-cpp: address PR #22 AOSC v2.1 review comments by pratiknarola-t · Pull Request #24 · tetherto/qvac-ext-lib-whisper.cpp

pratiknarola-t · 2026-05-19T05:19:36Z

Resolves the review comments on the merged AOSC v2.1 PR (#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description.

src/parakeet_sortformer.cpp -- compress_speaker_cache

Early-return when spkcache_len_per_spk <= 0 (num_spks * A_sil >= spkcache_len). The downstream boost/top-K stages are mostly defended (boost_topk_scores already returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail.
Renamed streaming_update's chunk_pre_encode_lc parameter to committed_chunk_pre_encode. The call site already advances past the left context (chunk_pre_committed = ... + lc * D), so the old _lc suffix was misleading. int lc stays -- it's used inside the function to index into preds_full, which still contains the left-context preds.
Replaced the magic -1.0e30f / +1.0e30f sentinels (4 sites) with named constants k_score_neg_inf / k_score_pos_inf backed by std::numeric_limits<float>::{lowest,max}(). Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels.

src/parakeet_engine.cpp

On the AOSC path, skip the for (cur_full) remap_id(...) loop and the prev_chunk_full_segments = std::move(cur_full) store: compute_slot_remap_ is never consulted when cache_active is true (AOSC anchors slot identity through the speaker cache), so the work was dead.
Switched v2.1 detection from pure-shape to "prefer the converter's parakeet.model_variant GGUF tag; fall back to (n_layers == 17, n_mels == 128) for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC.

include/parakeet/diarization.h

Moved the v1-vs-v2.1 detection rationale comment out of parakeet_engine.cpp and into the SortformerStreamingOptions:: spkcache_enable block, with a paragraph on the tag-first / shape-fallback policy.

src/parakeet_ctc.{h,cpp}

Added std::string ParakeetCtcModel::model_variant (optional GGUF metadata mirror; empty on legacy GGUFs).
Loader reads parakeet.model_variant next to the existing parakeet.model.type read; absent key -> empty string -> detection falls back to shape.

scripts/convert-nemo-to-gguf.py

New detect_sortformer_variant(ckpt: Path) derives a stable variant tag from the source .nemo filename (sortformer-v1 / sortformer-streaming-v2 / sortformer-streaming-v2.1-aosc); empty string for unknown checkpoints.
Sortformer branch of write_gguf writes parakeet.model_variant when the tag is non-empty.
write_gguf signature extended with ckpt: Path; only the one internal call site adjusted.

scripts/download-all-models.sh

Added the diar_streaming_sortformer_4spk-v2.1 fetch block (the AOSC fine-tune that this PR's tests target); bumped the budget comment from "~14 GiB" to "~14.5 GiB" and listed v2.1 in the contents line.

CMakeLists.txt + test/test_sortformer_streaming.cpp

Streaming ctest now consumes ${_qvp_sfsv21_q8_gguf} (was ${_qvp_sfs_q8_gguf}, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box".

test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp

Extracted the two 40-line load_wav_pcm16le_mono / file_exists duplicates into a shared inline header in the parakeet_test namespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests #include "test_utils.h" and use using parakeet_test::....

Build + ctest verification

cmake --build build -j clean (no new warnings).
ctest -R 'test-sortformer-(streaming |aosc-speakers)':
test-sortformer-streaming ........ Passed 8.23 s
test-sortformer-aosc-speakers-abcba . Passed 33.80 s
test-sortformer-aosc-speakers-abcdba Passed 36.91 s
The locally-symlinked v2.1 GGUF predates the parakeet.model_variant
key, so the AOSC tests passing here also verifies the shape-fallback
path. Re-running the converter on the v2.1 .nemo will populate
the new key for the strict-tag path.

Reviewer comments deferred / skipped (rationale):

Encoder graph cache thrashing during FIFO ramp-up (number 4): perf only; proper fix wants pre-build-at-diarize_start + silence padding or a mask argument, not minimal. Tracked for a follow-up perf PR.
ring.erase O(n) under AOSC's aggressive trim (number 10): pre-existing on the v1 path; wants a std::deque refactor, out of scope.
encoder_ms attribution surprising (number 12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".

Resolves the review comments on the merged AOSC v2.1 PR (tetherto#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description. src/parakeet_sortformer.cpp -- `compress_speaker_cache` - Early-return when `spkcache_len_per_spk <= 0` (`num_spks * A_sil >= spkcache_len`). The downstream boost/top-K stages are mostly defended (`boost_topk_scores` already returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail. - Renamed `streaming_update`'s `chunk_pre_encode_lc` parameter to `committed_chunk_pre_encode`. The call site already advances past the left context (`chunk_pre_committed = ... + lc * D`), so the old `_lc` suffix was misleading. `int lc` stays -- it's used inside the function to index into `preds_full`, which still contains the left-context preds. - Replaced the magic `-1.0e30f` / `+1.0e30f` sentinels (4 sites) with named constants `k_score_neg_inf` / `k_score_pos_inf` backed by `std::numeric_limits<float>::{lowest,max}()`. Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels. src/parakeet_engine.cpp - On the AOSC path, skip the `for (cur_full) remap_id(...)` loop and the `prev_chunk_full_segments = std::move(cur_full)` store: `compute_slot_remap_` is never consulted when `cache_active` is true (AOSC anchors slot identity through the speaker cache), so the work was dead. - Switched v2.1 detection from pure-shape to "prefer the converter's `parakeet.model_variant` GGUF tag; fall back to `(n_layers == 17, n_mels == 128)` for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC. include/parakeet/diarization.h - Moved the v1-vs-v2.1 detection rationale comment out of parakeet_engine.cpp and into the `SortformerStreamingOptions:: spkcache_enable` block, with a paragraph on the tag-first / shape-fallback policy. src/parakeet_ctc.{h,cpp} - Added `std::string ParakeetCtcModel::model_variant` (optional GGUF metadata mirror; empty on legacy GGUFs). - Loader reads `parakeet.model_variant` next to the existing `parakeet.model.type` read; absent key -> empty string -> detection falls back to shape. scripts/convert-nemo-to-gguf.py - New `detect_sortformer_variant(ckpt: Path)` derives a stable variant tag from the source .nemo filename (`sortformer-v1` / `sortformer-streaming-v2` / `sortformer-streaming-v2.1-aosc`); empty string for unknown checkpoints. - Sortformer branch of `write_gguf` writes `parakeet.model_variant` when the tag is non-empty. - `write_gguf` signature extended with `ckpt: Path`; only the one internal call site adjusted. scripts/download-all-models.sh - Added the diar_streaming_sortformer_4spk-v2.1 fetch block (the AOSC fine-tune that this PR's tests target); bumped the budget comment from "~14 GiB" to "~14.5 GiB" and listed v2.1 in the contents line. CMakeLists.txt + test/test_sortformer_streaming.cpp - Streaming ctest now consumes `${_qvp_sfsv21_q8_gguf}` (was `${_qvp_sfs_q8_gguf}`, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box". test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp - Extracted the two 40-line `load_wav_pcm16le_mono` / `file_exists` duplicates into a shared inline header in the `parakeet_test` namespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests `#include "test_utils.h"` and use `using parakeet_test::...`. Build + ctest verification - `cmake --build build -j` clean (no new warnings). - `ctest -R 'test-sortformer-(streaming |aosc-speakers)'`: test-sortformer-streaming ........ Passed 8.23 s test-sortformer-aosc-speakers-abcba . Passed 33.80 s test-sortformer-aosc-speakers-abcdba Passed 36.91 s The locally-symlinked v2.1 GGUF predates the `parakeet.model_variant` key, so the AOSC tests passing here also verifies the shape-fallback path. Re-running the converter on the v2.1 .nemo will populate the new key for the strict-tag path. Reviewer comments deferred / skipped (rationale): - Encoder graph cache thrashing during FIFO ramp-up (tetherto#4): perf only; proper fix wants pre-build-at-diarize_start + silence padding or a mask argument, not minimal. Tracked for a follow-up perf PR. - WAV fixtures committed as ~11 MB binaries (tetherto#8): project-wide Git LFS adoption decision, not a code change. - `ring.erase` O(n) under AOSC's aggressive trim (tetherto#10): pre-existing on the v1 path; wants a std::deque refactor, out of scope. - `encoder_ms` attribution surprising (tetherto#12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".

pratiknarola-t · 2026-05-19T06:13:52Z

The Encoder graph caching with Padding logic is causing performance regression as the extra computation overtakes the existing compute cost.

Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (08df2e70b8b71f8225af6ae35d3576eccea5ae7f), which folds in two PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 4f9b873. Co-authored-by: Cursor <cursoragent@cursor.com>

Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). * tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows (compile-only follow-up to PR #23; needed for the Windows desktop dev path that exercises the new init_gpu_backend tier policy). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 2961794. Co-authored-by: Cursor <cursoragent@cursor.com>

…ew-comments parakeet-cpp: address PR #22 AOSC v2.1 review comments

pratiknarola-t requested review from a team as code owners May 19, 2026 05:19

GustavoA1604 approved these changes May 19, 2026

View reviewed changes

GustavoA1604 merged commit 08df2e7 into tetherto:master May 19, 2026

gianni-cor pushed a commit that referenced this pull request May 28, 2026

Merge pull request #24 from pratiknarola-t/fix-parakeet-cpp-aosc-revi…

15cd2a7

…ew-comments parakeet-cpp: address PR #22 AOSC v2.1 review comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parakeet-cpp: address PR #22 AOSC v2.1 review comments#24

parakeet-cpp: address PR #22 AOSC v2.1 review comments#24
GustavoA1604 merged 1 commit into
tetherto:masterfrom
pratiknarola-t:fix-parakeet-cpp-aosc-review-comments

pratiknarola-t commented May 19, 2026 •

edited

Loading

Uh oh!

pratiknarola-t commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pratiknarola-t commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pratiknarola-t commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pratiknarola-t commented May 19, 2026 •

edited

Loading