parakeet-cpp: address PR #22 AOSC v2.1 review comments#24
Merged
GustavoA1604 merged 1 commit intoMay 19, 2026
Merged
Conversation
Resolves the review comments on the merged AOSC v2.1 PR (tetherto#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description. src/parakeet_sortformer.cpp -- `compress_speaker_cache` - Early-return when `spkcache_len_per_spk <= 0` (`num_spks * A_sil >= spkcache_len`). The downstream boost/top-K stages are mostly defended (`boost_topk_scores` already returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail. - Renamed `streaming_update`'s `chunk_pre_encode_lc` parameter to `committed_chunk_pre_encode`. The call site already advances past the left context (`chunk_pre_committed = ... + lc * D`), so the old `_lc` suffix was misleading. `int lc` stays -- it's used inside the function to index into `preds_full`, which still contains the left-context preds. - Replaced the magic `-1.0e30f` / `+1.0e30f` sentinels (4 sites) with named constants `k_score_neg_inf` / `k_score_pos_inf` backed by `std::numeric_limits<float>::{lowest,max}()`. Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels. src/parakeet_engine.cpp - On the AOSC path, skip the `for (cur_full) remap_id(...)` loop and the `prev_chunk_full_segments = std::move(cur_full)` store: `compute_slot_remap_` is never consulted when `cache_active` is true (AOSC anchors slot identity through the speaker cache), so the work was dead. - Switched v2.1 detection from pure-shape to "prefer the converter's `parakeet.model_variant` GGUF tag; fall back to `(n_layers == 17, n_mels == 128)` for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC. include/parakeet/diarization.h - Moved the v1-vs-v2.1 detection rationale comment out of parakeet_engine.cpp and into the `SortformerStreamingOptions:: spkcache_enable` block, with a paragraph on the tag-first / shape-fallback policy. src/parakeet_ctc.{h,cpp} - Added `std::string ParakeetCtcModel::model_variant` (optional GGUF metadata mirror; empty on legacy GGUFs). - Loader reads `parakeet.model_variant` next to the existing `parakeet.model.type` read; absent key -> empty string -> detection falls back to shape. scripts/convert-nemo-to-gguf.py - New `detect_sortformer_variant(ckpt: Path)` derives a stable variant tag from the source .nemo filename (`sortformer-v1` / `sortformer-streaming-v2` / `sortformer-streaming-v2.1-aosc`); empty string for unknown checkpoints. - Sortformer branch of `write_gguf` writes `parakeet.model_variant` when the tag is non-empty. - `write_gguf` signature extended with `ckpt: Path`; only the one internal call site adjusted. scripts/download-all-models.sh - Added the diar_streaming_sortformer_4spk-v2.1 fetch block (the AOSC fine-tune that this PR's tests target); bumped the budget comment from "~14 GiB" to "~14.5 GiB" and listed v2.1 in the contents line. CMakeLists.txt + test/test_sortformer_streaming.cpp - Streaming ctest now consumes `${_qvp_sfsv21_q8_gguf}` (was `${_qvp_sfs_q8_gguf}`, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box". test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp - Extracted the two 40-line `load_wav_pcm16le_mono` / `file_exists` duplicates into a shared inline header in the `parakeet_test` namespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests `#include "test_utils.h"` and use `using parakeet_test::...`. Build + ctest verification - `cmake --build build -j` clean (no new warnings). - `ctest -R 'test-sortformer-(streaming |aosc-speakers)'`: test-sortformer-streaming ........ Passed 8.23 s test-sortformer-aosc-speakers-abcba . Passed 33.80 s test-sortformer-aosc-speakers-abcdba Passed 36.91 s The locally-symlinked v2.1 GGUF predates the `parakeet.model_variant` key, so the AOSC tests passing here also verifies the shape-fallback path. Re-running the converter on the v2.1 .nemo will populate the new key for the strict-tag path. Reviewer comments deferred / skipped (rationale): - Encoder graph cache thrashing during FIFO ramp-up (tetherto#4): perf only; proper fix wants pre-build-at-diarize_start + silence padding or a mask argument, not minimal. Tracked for a follow-up perf PR. - WAV fixtures committed as ~11 MB binaries (tetherto#8): project-wide Git LFS adoption decision, not a code change. - `ring.erase` O(n) under AOSC's aggressive trim (tetherto#10): pre-existing on the v1 path; wants a std::deque refactor, out of scope. - `encoder_ms` attribution surprising (tetherto#12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".
Author
GustavoA1604
approved these changes
May 19, 2026
GustavoA1604
added a commit
to GustavoA1604/qvac-registry-vcpkg
that referenced
this pull request
May 19, 2026
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (08df2e70b8b71f8225af6ae35d3576eccea5ae7f), which folds in two PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 4f9b873. Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604
added a commit
to GustavoA1604/qvac-registry-vcpkg
that referenced
this pull request
May 19, 2026
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). * tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows (compile-only follow-up to PR #23; needed for the Windows desktop dev path that exercises the new init_gpu_backend tier policy). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 2961794. Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604
added a commit
to GustavoA1604/qvac-registry-vcpkg
that referenced
this pull request
May 19, 2026
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). * tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows (compile-only follow-up to PR #23; needed for the Windows desktop dev path that exercises the new init_gpu_backend tier policy). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 2961794. Co-authored-by: Cursor <cursoragent@cursor.com>
GustavoA1604
added a commit
to GustavoA1604/qvac-registry-vcpkg
that referenced
this pull request
May 19, 2026
Repoints the port at the latest tetherto/qvac-ext-lib-whisper.cpp@master tip (ef0f2ae637dc3be8bcd52b17374f9bb804beb06b), which folds in three PRs: * tetherto/qvac-ext-lib-whisper.cpp#23 -- parakeet-cpp: android dynamic backend loading + Adreno-tier GPU policy. The parakeet-cpp subtree now defaults Android builds to GGML_BACKEND_DL=ON + GGML_CPU_ALL_VARIANTS=ON + GGML_CPU_REPACK=ON + GGML_VULKAN=ON + GGML_OPENCL=ON, matching the qvac llm-llamacpp Android port. Vulkan and OpenCL ship as separately-loadable MODULE .so files; per-arch CPU variants ship as `libqvac-speech-ggml-cpu-android_armv*_*.so`. Backend selection is centralised in `init_gpu_backend()`: Adreno 700+ -> OpenCL, every other GPU -> Vulkan (or Metal / CUDA on matching platforms). No static GPU backend entry points are linked anywhere in libparakeet; the ggml-backend registry walk handles every case in both GGML_BACKEND_DL=ON and GGML_BACKEND_DL=OFF modes. Also adds public `set_backends_directory()` / `set_opencl_cache_dir()` entry points plus the matching `EngineOptions::backends_dir` / `opencl_cache_dir` fields and the `--backends-dir` CLI flag so embedded host apps can pin the backends scan directory and the ggml-opencl program-binary cache per-process. * tetherto/qvac-ext-lib-whisper.cpp#24 -- parakeet-cpp: address PR #22 AOSC v2.1 review comments (Sortformer streaming fixes that landed shortly after PR #23 merged; safe to fold in). * tetherto/qvac-ext-lib-whisper.cpp#25 -- Fix missing include for windows (compile-only follow-up to PR #23; needed for the Windows desktop dev path that exercises the new init_gpu_backend tier policy). Date-stamped rather than port-versioned because the upstream commits land Android-specific backend-loading machinery that previous pv1 builds genuinely lacked (not just a bugfix on the same source set). Consumers pinning to `2026-05-05#1` keep the StreamingSegment .starts_word baseline; consumers tracking the date-stamped baseline move forward to the dynamic-backend Android shape. Dependency floor on ggml-speech tightened from `2026-04-09#1` to `2026-04-09#2` -- the new Android CPU_ALL_VARIANTS path requires the per-arch CPU variant dlopen fallback that landed in ggml-speech pv2 (previous commit). Without that floor a downstream registry override could silently pull pv1 and fail to register any CPU backend at runtime under AGP's `useLegacyPackaging=false` (the universal Android default since 3.6). No behaviour change on macOS / iOS (Metal still statically linked into libggml-*) or desktop Linux / Windows (Vulkan / CUDA likewise static). The Android-defaults block in parakeet-cpp's CMakeLists.txt is gated on `CMAKE_SYSTEM_NAME STREQUAL "Android"` and only flips the dynamic-loading switches there. Verified by host build: `nm libparakeet.dylib | grep ggml_backend_(vulkan|opencl|metal|cuda|blas)_init` returns empty. git-tree for ports/parakeet-cpp: 2961794. Co-authored-by: Cursor <cursoragent@cursor.com>
gianni-cor
pushed a commit
that referenced
this pull request
May 28, 2026
…ew-comments parakeet-cpp: address PR #22 AOSC v2.1 review comments
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Resolves the review comments on the merged AOSC v2.1 PR (#22, merge commit e6ba38c). All eight changes are minimal and behaviour-preserving except the v2.1 detection upgrade (now strict-tag with shape fallback) and the degenerate-config guard (silence-only fallback instead of UB-adjacent boost arithmetic). Reviewer comments classified as "perf only / out of scope / would only add a TODO" are intentionally not addressed in this commit -- see the plan file referenced in the PR description.
src/parakeet_sortformer.cpp --
compress_speaker_cachespkcache_len_per_spk <= 0(num_spks * A_sil >= spkcache_len). The downstream boost/top-K stages are mostly defended (boost_topk_scoresalready returns early on non-positive k), but the function was otherwise running a no-op pass that produced an all-silence cache via the slow path. Fall back to an explicit silence-only profile and bail.streaming_update'schunk_pre_encode_lcparameter tocommitted_chunk_pre_encode. The call site already advances past the left context (chunk_pre_committed = ... + lc * D), so the old_lcsuffix was misleading.int lcstays -- it's used inside the function to index intopreds_full, which still contains the left-context preds.-1.0e30f/+1.0e30fsentinels (4 sites) with named constantsk_score_neg_inf/k_score_pos_infbacked bystd::numeric_limits<float>::{lowest,max}(). Dropped the inline "-inf is UB with current FP flags" comments: IEEE 754 +/-inf is well-defined; the original concern (avoiding NaN-on-arithmetic) still holds because we only store and compare the sentinels.src/parakeet_engine.cpp
for (cur_full) remap_id(...)loop and theprev_chunk_full_segments = std::move(cur_full)store:compute_slot_remap_is never consulted whencache_activeis true (AOSC anchors slot identity through the speaker cache), so the work was dead.parakeet.model_variantGGUF tag; fall back to(n_layers == 17, n_mels == 128)for legacy GGUFs". This prevents a future v2.2/v3 variant that happens to share v2.1's encoder shape from silently opting into AOSC.include/parakeet/diarization.h
SortformerStreamingOptions:: spkcache_enableblock, with a paragraph on the tag-first / shape-fallback policy.src/parakeet_ctc.{h,cpp}
std::string ParakeetCtcModel::model_variant(optional GGUF metadata mirror; empty on legacy GGUFs).parakeet.model_variantnext to the existingparakeet.model.typeread; absent key -> empty string -> detection falls back to shape.scripts/convert-nemo-to-gguf.py
detect_sortformer_variant(ckpt: Path)derives a stable variant tag from the source .nemo filename (sortformer-v1/sortformer-streaming-v2/sortformer-streaming-v2.1-aosc); empty string for unknown checkpoints.write_ggufwritesparakeet.model_variantwhen the tag is non-empty.write_ggufsignature extended withckpt: Path; only the one internal call site adjusted.scripts/download-all-models.sh
CMakeLists.txt + test/test_sortformer_streaming.cpp
${_qvp_sfsv21_q8_gguf}(was${_qvp_sfs_q8_gguf}, the v2 model). The in-binary default GGUF path is the matching v2.1 q8_0. Aligns the test with the line-299 comment that says the binary "reflects the production v2.1 AOSC config out of the box".test/test_utils.h (new) + test/test_sortformer_{streaming,aosc_speakers}.cpp
load_wav_pcm16le_mono/file_existsduplicates into a shared inline header in theparakeet_testnamespace. The duplicate copies and the "duplicated here on purpose" comment block in test_sortformer_aosc_speakers.cpp are gone; both tests#include "test_utils.h"and useusing parakeet_test::....Build + ctest verification
cmake --build build -jclean (no new warnings).ctest -R 'test-sortformer-(streaming |aosc-speakers)':test-sortformer-streaming ........ Passed 8.23 s
test-sortformer-aosc-speakers-abcba . Passed 33.80 s
test-sortformer-aosc-speakers-abcdba Passed 36.91 s
The locally-symlinked v2.1 GGUF predates the
parakeet.model_variantkey, so the AOSC tests passing here also verifies the shape-fallback
path. Re-running the converter on the v2.1 .nemo will populate
the new key for the strict-tag path.
Reviewer comments deferred / skipped (rationale):
ring.eraseO(n) under AOSC's aggressive trim (number 10): pre-existing on the v1 path; wants a std::deque refactor, out of scope.encoder_msattribution surprising (number 12): code is correct and matches sibling paths; the user explicitly opted against comment-only "clarifications".