testing qvac-lib-error-base-workflow#24
Closed
Proletter wants to merge 1 commit into
Contributor
Requesting review from: @ignaciolarranaga [auto_pr_review_request]
pratiknarola-t
pushed a commit
that referenced
this pull request
May 20, 2026
Consumes parakeet-cpp 2026-05-20 (the upstream bump in qvac-registry-vcpkg PR #156, which lands PR #22 + PR #24 from qvac-ext-lib-whisper.cpp): v2.1 streaming Sortformer GGUF with Audio-Online Speaker Cache (AOSC) -- a NeMo-port speaker cache that anchors speaker-slot identity across silence and re-entry, fixing the per-chunk drift v1 exhibits in continuous live capture.

Defaults: v2.1 is the new streaming Sortformer; v1 stays the offline default. Both GGUFs remain registered.

What changed:

- vcpkg dep: bump parakeet-cpp version>= to 2026-05-20 across all three platform branches in packages/transcription-parakeet/vcpkg.json.
- C++ addon (transcription-parakeet/addon/src/):
  - ParakeetConfig.hpp: add 6 streamingSpkCache* / streamingFifo* / streamingChunk{Left,Right}ContextMs / streamingSpkCacheUpdatePeriod fields with NeMo-port defaults (188/188/80/560/144, spkCacheEnable=true). Comments document the v2.1-only applicability and the auto-detection via the GGUF metadata tag parakeet.model_variant.
  - ParakeetModel.{hpp,cpp}: 6 read-only accessors; forward the fields into parakeet::SortformerStreamingOptions inside the SORTFORMER branch of the in-process streaming session (Mode 3).
  - ParakeetStreamingProcessor.{hpp,cpp}: mirror the 6 fields on the duplex Config struct; forward into SortformerStreamingOptions for the runStreaming() session.
  - AddonJs.hpp::startStreaming: source defaults from the model's getters; accept per-call overrides (spkCacheEnable, spkCacheLen, fifoLen, chunkLeftContextMs, chunkRightContextMs, spkCacheUpdatePeriod) on the runStreaming() config object.
  - JSAdapter.cpp::loadFromJSObject: read the 6 new keys at createInstance time so constructor-supplied parakeetConfig overrides reach C++.
- JS/TS API (transcription-parakeet/):
  - index.d.ts: declare the 6 fields on ParakeetConfig and matching overrides on StreamingRunConfig.
  - parakeet.js: extend the JSDoc on the constructor and startStreaming() with the new params (config is passed opaquely to native; no logic change here).
  - index.js: forward the 6 fields through _buildConfigurationParams() so they actually reach createInstance. (Without this, the JSDoc + native plumbing exist but the values never leave the JS layer -- surfaced during local verification when streamingSpkCacheEnable=false initially didn't disable AOSC on the v2.1 GGUF.)
- SDK (packages/sdk/):
  - schemas/transcription-config.ts: extend parakeetRuntimeConfigSchema with all 7 streaming knobs (streaming, streamingChunkMs, streamingHistoryMs, streamingEmitPartials, streamingEnergyVad, streamingLeftContextMs, streamingRightLookaheadMs) plus the 6 AOSC knobs. The streaming knobs were never exposed before -- AOSC is unreachable without them.
  - server/bare/plugins/parakeet-transcription/plugin.ts: createParakeetModel forwards all 13 new fields into addonConfig.parakeetConfig.
  - examples/transcription/parakeet-sortformer-streaming.ts (new): high-level SDK example for v2.1 + AOSC streaming.
- Model registry:
  - packages/registry-server/data/models.prod.json: new "Parakeet Streaming Sortformer 4SPK v2.1 GGUF (AOSC)" entry (placeholder s3 path; replace with real upload date once the GGUF lands). Updated v1 entry's notes to clarify its offline-default role.
  - (sdk/models/registry/models.ts is auto-generated and will pick up the new entry via models/update-models after the GGUF is uploaded; not touched here.)
- Tests (transcription-parakeet/test/integration/):
  - helpers.js: add sortformerStreaming MODEL_CONFIGS entry pointing at diar_streaming_sortformer_4spk-v2.1.q8_0.gguf.
  - sortformer-aosc-streaming.test.js (new): covers default-AOSC streaming + streamingSpkCacheEnable=false fallback to the v1 sliding-window path. The full AOSC slot-stability contract is verified at C++ level in parakeet-cpp/test/test_sortformer_aosc_speakers.cpp; this JS-level test focuses on wiring correctness.
- Examples (transcription-parakeet/examples/):
  - live-mic-diarized-aosc.js (new): v2.1-focused dual-stream live mic example with full CLI control of the AOSC knobs.
  - live-mic-diarized.js / diarized-transcribe.js: header notes recommending v2.1 for streaming, v1 for offline.
- Docs:
  - README.md: extended Model Variants table with v1 (offline-default) and v2.1 (streaming-default) rows; new streamingSpkCache* rows in the ParakeetConfig table; dedicated "Sortformer Streaming Diarization (v2.1 + AOSC)" paragraph; updated example commands to point at the v2.1 GGUF.

Verification done locally against a vcpkg overlay pointing at the PR #156 branch: addon compiles with the new parakeet-cpp; full integration suite passes 37/37 (72/72 asserts) with all q8_0 GGUFs staged, including both new AOSC test cases.

Depends on:

- qvac-registry-vcpkg PR #156 (parakeet-cpp 2026-05-20 bump). CI will not be able to resolve the new version>= constraint until that PR merges.
- Upload of diar_streaming_sortformer_4spk-v2.1.q8_0.gguf to S3 (replace the placeholder source path in models.prod.json once the upload date is known, then re-run models/update-models to sync sdk/models/registry/models.ts).
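The per-call override behaviour described for `AddonJs.hpp::startStreaming` (defaults sourced from the model, overridden per call) can be sketched in TypeScript as follows. The field names and default values come from the commit message; the `resolveAoscOptions` helper and both interfaces are hypothetical and only illustrate the merge rule, not the addon's actual C++ code. Using `??` rather than `||` matters here, because an explicit `spkCacheEnable: false` must survive the merge.

```typescript
// Hypothetical illustration: merge per-call overrides over model defaults.
// Field names mirror the commit message; the helper itself is an assumption.
interface AoscConfig {
  streamingSpkCacheEnable: boolean;
  streamingSpkCacheLen: number;
  streamingFifoLen: number;
  streamingChunkLeftContextMs: number;
  streamingChunkRightContextMs: number;
  streamingSpkCacheUpdatePeriod: number;
}

interface AoscOverrides {
  spkCacheEnable?: boolean;
  spkCacheLen?: number;
  fifoLen?: number;
  chunkLeftContextMs?: number;
  chunkRightContextMs?: number;
  spkCacheUpdatePeriod?: number;
}

// NeMo-port defaults quoted in the commit message (188/188/80/560/144, enabled).
const NEMO_PORT_DEFAULTS: AoscConfig = {
  streamingSpkCacheEnable: true,
  streamingSpkCacheLen: 188,
  streamingFifoLen: 188,
  streamingChunkLeftContextMs: 80,
  streamingChunkRightContextMs: 560,
  streamingSpkCacheUpdatePeriod: 144,
};

function resolveAoscOptions(
  modelDefaults: AoscConfig,
  overrides: AoscOverrides = {},
): AoscConfig {
  // `??` keeps explicit `false` and `0` overrides; only undefined/null fall back.
  return {
    streamingSpkCacheEnable:
      overrides.spkCacheEnable ?? modelDefaults.streamingSpkCacheEnable,
    streamingSpkCacheLen:
      overrides.spkCacheLen ?? modelDefaults.streamingSpkCacheLen,
    streamingFifoLen: overrides.fifoLen ?? modelDefaults.streamingFifoLen,
    streamingChunkLeftContextMs:
      overrides.chunkLeftContextMs ?? modelDefaults.streamingChunkLeftContextMs,
    streamingChunkRightContextMs:
      overrides.chunkRightContextMs ?? modelDefaults.streamingChunkRightContextMs,
    streamingSpkCacheUpdatePeriod:
      overrides.spkCacheUpdatePeriod ?? modelDefaults.streamingSpkCacheUpdatePeriod,
  };
}
```

Calling `resolveAoscOptions(NEMO_PORT_DEFAULTS, { spkCacheEnable: false })` keeps every numeric default and only turns the cache off, which is the fallback case the new integration test exercises.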
pratiknarola-t
pushed a commit
that referenced
this pull request
May 20, 2026
## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once a speaker goes silent long enough to roll out of the window, their slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto `sortformer_0` once two speakers have been seen, breaking live speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in parakeet-cpp, but until now there was no way to consume it from the JS / SDK layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs from JS through `ParakeetConfig` -> `ParakeetModel` / `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`, for both the in-process Mode-3 streaming path and the duplex `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Surface the 6 AOSC knobs plus the 7 pre-existing streaming knobs on the SDK schema + plugin so SDK consumers can configure AOSC without dropping to `@qvac/transcription-parakeet` directly. (Streaming was never exposed on the SDK schema before -- AOSC is unreachable without `streaming: true`.)
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch; addon compiled cleanly with all 6 new AOSC field references through `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`, and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s** (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to the v1 sliding-window code path. Confirmed via engine logs that the override actually disables the cache (`Sortformer AOSC enabled` line only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides on `StreamingRunConfig`, and as flat fields on the SDK `parakeetRuntimeConfigSchema`. All default to parakeet-cpp's NeMo-port tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on non-Sortformer engines.

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,      // default
      streamingSpkCacheLen: 188,          // long-term cache rows
      streamingFifoLen: 188,              // warmup FIFO rows
      streamingChunkLeftContextMs: 80,    // ~1 encoder frame
      streamingChunkRightContextMs: 560,  // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144, // FIFO-overflow pop count
    },
  },
});
```

Same fields available via the SDK plugin:

```typescript
import { loadModel } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: "<v2.1-sortformer-src>",
  modelType: "parakeet",
  modelConfig: {
    modelType: "sortformer",
    parakeetSortformerSrc: "<v2.1-sortformer-src>",
    streaming: true,
    streamingChunkMs: 2000,
    streamingSpkCacheEnable: true,
    streamingChunkRightContextMs: 560,
  },
});
```

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF model entry in `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).
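To illustrate how a flat SDK model config could be narrowed to the 13 streaming and AOSC knobs before being handed to the addon as `parakeetConfig`, here is a minimal sketch. The knob names are the ones listed in the related commit message; the `pickParakeetConfig` helper is hypothetical and is not the plugin's actual `createParakeetModel` code.

```typescript
// Hypothetical sketch: copy only the known streaming / AOSC knobs out of a
// flat SDK model config. Key names come from the commit message; the helper
// name and shape are assumptions made for illustration.
const STREAMING_KEYS = [
  "streaming",
  "streamingChunkMs",
  "streamingHistoryMs",
  "streamingEmitPartials",
  "streamingEnergyVad",
  "streamingLeftContextMs",
  "streamingRightLookaheadMs",
] as const;

const AOSC_KEYS = [
  "streamingSpkCacheEnable",
  "streamingSpkCacheLen",
  "streamingFifoLen",
  "streamingChunkLeftContextMs",
  "streamingChunkRightContextMs",
  "streamingSpkCacheUpdatePeriod",
] as const;

type ForwardedKey =
  | (typeof STREAMING_KEYS)[number]
  | (typeof AOSC_KEYS)[number];

function pickParakeetConfig(
  modelConfig: Record<string, unknown>,
): Partial<Record<ForwardedKey, unknown>> {
  const out: Partial<Record<ForwardedKey, unknown>> = {};
  for (const key of [...STREAMING_KEYS, ...AOSC_KEYS]) {
    // Forward only keys the caller actually set, so native defaults still apply.
    if (modelConfig[key] !== undefined) {
      out[key] = modelConfig[key];
    }
  }
  return out;
}
```

An explicit allow-list like this makes a missing knob visible in review: a field that is declared in the schema but absent from the list never reaches the native layer, which is the class of bug the commit message describes for `_buildConfigurationParams()`.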
pratiknarola-t
pushed a commit
that referenced
this pull request
May 20, 2026
Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive API only; no breaking changes).

## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once a speaker goes silent long enough to roll out of the window, their slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto `sortformer_0` once two speakers have been seen, breaking live speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in parakeet-cpp, but until now there was no way to consume it from the JS layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs (`streamingSpkCacheEnable`, `streamingSpkCacheLen`, `streamingFifoLen`, `streamingChunkLeftContextMs`, `streamingChunkRightContextMs`, `streamingSpkCacheUpdatePeriod`) from JS through `ParakeetConfig` -> `ParakeetModel` / `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`, for both the in-process Mode-3 streaming path and the duplex `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch; addon compiled cleanly with all 6 new AOSC field references through `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`, and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s** (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to the v1 sliding-window code path. Confirmed via engine logs that the override actually disables the cache (`Sortformer AOSC enabled` line only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides on `StreamingRunConfig`. All default to parakeet-cpp's NeMo-port tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on non-Sortformer engines (no-op forwarding is safe).

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,      // default
      streamingSpkCacheLen: 188,          // long-term cache rows
      streamingFifoLen: 188,              // warmup FIFO rows
      streamingChunkLeftContextMs: 80,    // ~1 encoder frame
      streamingChunkRightContextMs: 560,  // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144, // FIFO-overflow pop count
    },
  },
});
```

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF entry in `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).

## Follow-up (separate PR, not in scope here)

SDK adoption (`@qvac/sdk` schema + plugin + example) lands in a separate PR after this addon is published and the v2.1 GGUF entry has synced into `sdk/models/registry/models.ts`. The SDK needs both pieces in place before its schema can meaningfully forward AOSC knobs.
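The comments in the API example relate the context windows to encoder frames (80 ms is about 1 frame, 560 ms is about 7 frames). A small sketch of that arithmetic follows, assuming a frame duration of 80 ms, which is inferred from those two comments rather than stated anywhere in the commit message. The helper name is hypothetical.

```typescript
// Assumption: one encoder frame covers ~80 ms, inferred from the comments
// "80 // ~1 encoder frame" and "560 // ~7 encoder frames" in the API example.
const ASSUMED_ENCODER_FRAME_MS = 80;

// Hypothetical helper: convert a context window in milliseconds to a whole
// number of encoder frames, rounding to the nearest frame.
function contextMsToFrames(
  ms: number,
  frameMs: number = ASSUMED_ENCODER_FRAME_MS,
): number {
  if (!Number.isFinite(ms) || ms < 0) {
    throw new RangeError(`context must be a non-negative number, got ${ms}`);
  }
  if (!Number.isFinite(frameMs) || frameMs <= 0) {
    throw new RangeError(`frame duration must be positive, got ${frameMs}`);
  }
  return Math.round(ms / frameMs);
}
```

Under that assumption the default left context maps to 1 frame and the default right context to 7 frames, which matches the comments in the example.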
pratiknarola-t
pushed a commit
that referenced
this pull request
May 20, 2026
…ased]
Two reviewer follow-ups on the v2.1 + AOSC PR:
1. `npm run setup-models` now fetches + converts v2.1 sortformer.
- download-models.sh: new `sortformer-streaming-v2.1` type pulling
from
https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
- convert-nemo.sh: matching type maps .nemo ->
`diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
- `--type all` (default) now includes the new type, so
`npm run setup-models` stages v2.1 alongside the other models.
- convert-nemo-to-gguf.py: surgically picked up PR #24's variant
emission (the `detect_sortformer_variant(ckpt)` helper +
`writer.add_string("parakeet.model_variant", ...)` call) without
touching local qvac divergences (vendored attribution header,
descriptive docstrings, `--quant f16` default, and the
huggingface_hub import-error helper). The C++ engine's strict
v2.1 detection now matches on `parakeet.model_variant ==
"sortformer-streaming-v2.1-aosc"` instead of falling back to
the encoder-shape heuristic.
- Verified end-to-end locally: `bash scripts/convert-nemo.sh
--type sortformer-streaming-v2.1 --quant q8_0 --force` produces
models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and the
resulting GGUF carries `parakeet.model_variant =
"sortformer-streaming-v2.1-aosc"` (confirmed via gguf reader).
2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in
package.json + vcpkg.json reverted to 0.4.0. The release PR will
promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.
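The detection order described in item 1 (match the `parakeet.model_variant` tag when it is present, otherwise fall back to the encoder-shape heuristic) can be sketched as below. The real check lives in the C++ engine; this TypeScript version is only an illustration under that reading, and the `detectV21Aosc` function, its return shape, and the injected `shapeHeuristic` callback are hypothetical.

```typescript
// Tag key and value are quoted from the commit message.
const VARIANT_KEY = "parakeet.model_variant";
const V21_AOSC_TAG = "sortformer-streaming-v2.1-aosc";

type DetectionSource = "metadata-tag" | "shape-heuristic";

interface VariantDetection {
  isV21Aosc: boolean;
  source: DetectionSource;
}

// Hypothetical sketch: prefer the explicit GGUF metadata tag and only consult
// the (caller-supplied) heuristic when the tag is missing entirely.
function detectV21Aosc(
  metadata: ReadonlyMap<string, string>,
  shapeHeuristic: () => boolean,
): VariantDetection {
  const tag = metadata.get(VARIANT_KEY);
  if (tag !== undefined) {
    // Strict match: any other tag value means "not v2.1 AOSC".
    return { isV21Aosc: tag === V21_AOSC_TAG, source: "metadata-tag" };
  }
  return { isV21Aosc: shapeHeuristic(), source: "shape-heuristic" };
}
```

A GGUF converted with the updated `convert-nemo-to-gguf.py` would take the first branch, while one converted before the variant tag was emitted would take the second.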
GustavoA1604
added a commit
that referenced
this pull request
May 20, 2026
…2137)

* feat[api]: add Sortformer v2.1 + AOSC streaming diarization support

* chore[notask]: address review -- setup-models v2.1 + CHANGELOG [Unreleased]

* fix[notask]: pin parakeet-cpp to 2026-05-20#1 to avoid orphan tree

  The registry's parakeet-cpp.json lists both 2026-05-20#0 and 2026-05-20#1 (PR #156 introduced both port-versions in its two commits before squash-merging). vcpkg's minimum-version-selection picks #0 when the manifest says `version>=: 2026-05-20`, but the #0 git-tree is orphaned by the squash merge -- unreachable from main, so `git fetch HEAD` doesn't pull it in. CI fails with:

      fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
      note: while loading parakeet-cpp@2026-05-20

  Pinning `version>=: 2026-05-20#1` skips the orphan and resolves to the actual port content on main (tree 69619b43...). Matches the existing `qvac-lint-cpp >= 1.4.4#3` precedent in the same file. Local clean build (no overlay, no cached registry) succeeds.

* cpp lint format

* Bump version

---------

Co-authored-by: Pratik Narola <pratiknarola@Mac.bbrouter>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: GustavoA1604 <gustavogefa@hotmail.com>
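Why `version>=: 2026-05-20` lands on port-version #0 while `2026-05-20#1` skips it can be shown with a deliberately simplified model of minimum-version selection for a single port. This is a sketch, not vcpkg's resolver: real resolution also involves baselines and the whole dependency graph, and the function names here are hypothetical.

```typescript
// Simplified single-port model: pick the smallest listed version that
// satisfies a `version>=` constraint. A missing "#N" means port-version 0.
interface PortVersion {
  version: string; // date-style version such as "2026-05-20"
  portVersion: number;
}

function parseVersion(spec: string): PortVersion {
  const parts = spec.split("#");
  const version = parts[0] ?? "";
  const port = parts[1];
  return { version, portVersion: port === undefined ? 0 : Number(port) };
}

function compareVersions(a: PortVersion, b: PortVersion): number {
  // ISO-style dates order correctly under plain string comparison.
  if (a.version !== b.version) return a.version < b.version ? -1 : 1;
  return a.portVersion - b.portVersion;
}

function selectMinimum(
  available: readonly string[],
  constraint: string,
): string | undefined {
  const minimum = parseVersion(constraint);
  const candidates = available
    .map(parseVersion)
    .filter((v) => compareVersions(v, minimum) >= 0)
    .sort(compareVersions);
  const picked = candidates[0];
  return picked === undefined
    ? undefined
    : `${picked.version}#${picked.portVersion}`;
}
```

With both `2026-05-20#0` and `2026-05-20#1` listed, the constraint `2026-05-20` selects `#0` in this model, and raising the constraint to `2026-05-20#1` excludes it, which is the effect the pin relies on.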
Proletter
pushed a commit
that referenced
this pull request
May 24, 2026
…2137)

* feat[api]: add Sortformer v2.1 + AOSC streaming diarization support

* chore[notask]: address review -- setup-models v2.1 + CHANGELOG [Unreleased]

* fix[notask]: pin parakeet-cpp to 2026-05-20#1 to avoid orphan tree

* cpp lint format

* Bump version
No description provided.