feat[api]: add Sortformer v2.1 + AOSC streaming diarization support #2137
Merged
The SDK changes should only occur after the parakeet package is published.
Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive API only; no breaking changes).

## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once a speaker goes silent long enough to roll out of the window, their slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto `sortformer_0` once two speakers have been seen, breaking live speaker-tagged transcripts.
- v2.1 + AOSC (Arrival-Order Speaker Cache, NeMo-ported) fixes this in parakeet-cpp, but until now there was no way to consume it from the JS layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs (`streamingSpkCacheEnable`, `streamingSpkCacheLen`, `streamingFifoLen`, `streamingChunkLeftContextMs`, `streamingChunkRightContextMs`, `streamingSpkCacheUpdatePeriod`) from JS through `ParakeetConfig` -> `ParakeetModel` / `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`, for both the in-process Mode-3 streaming path and the duplex `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Defaults: v2.1 becomes the default streaming Sortformer; v1 stays the offline default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch; addon compiled cleanly with all 6 new AOSC field references through `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`, and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s** (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to the v1 sliding-window code path. Confirmed via engine logs that the override actually disables the cache (the `Sortformer AOSC enabled` line only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides on `StreamingRunConfig`. All default to parakeet-cpp's NeMo-port tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on non-Sortformer engines (no-op forwarding is safe).

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,       // default
      streamingSpkCacheLen: 188,           // long-term cache rows
      streamingFifoLen: 188,               // warmup FIFO rows
      streamingChunkLeftContextMs: 80,     // ~1 encoder frame
      streamingChunkRightContextMs: 560,   // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144,  // FIFO-overflow pop count
    },
  },
});
```

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF entry in `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).

## Follow-up (separate PR, not in scope here)

SDK adoption (`@qvac/sdk` schema + plugin + example) lands in a separate PR after this addon is published and the v2.1 GGUF entry has synced into `sdk/models/registry/models.ts`. The SDK needs both pieces in place before its schema can meaningfully forward AOSC knobs.
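The description says the six AOSC fields default to parakeet-cpp's tuning and can be overridden per call. A minimal sketch of that layering follows, assuming per-call values win over `ParakeetConfig` values and that unset fields fall through to the defaults. The names `AoscOptions`, `AOSC_DEFAULTS`, and `resolveAoscOptions` are hypothetical and are not exports of `@qvac/transcription-parakeet`; the addon may resolve options differently.

```typescript
// Hypothetical helper types: only the six field names and their default
// values come from the PR description.
interface AoscOptions {
  streamingSpkCacheEnable: boolean;
  streamingSpkCacheLen: number;
  streamingFifoLen: number;
  streamingChunkLeftContextMs: number;
  streamingChunkRightContextMs: number;
  streamingSpkCacheUpdatePeriod: number;
}

// Defaults as listed in the PR (188 / 188 / 80 / 560 / 144, enabled).
const AOSC_DEFAULTS: AoscOptions = {
  streamingSpkCacheEnable: true,
  streamingSpkCacheLen: 188,
  streamingFifoLen: 188,
  streamingChunkLeftContextMs: 80,
  streamingChunkRightContextMs: 560,
  streamingSpkCacheUpdatePeriod: 144,
};

// Later layers override earlier ones; undefined fields are skipped so they
// never mask a value set by an earlier layer or by the defaults.
function resolveAoscOptions(...layers: Partial<AoscOptions>[]): AoscOptions {
  const resolved: AoscOptions = { ...AOSC_DEFAULTS };
  const writable = resolved as Record<keyof AoscOptions, boolean | number>;
  for (const layer of layers) {
    for (const key of Object.keys(layer) as (keyof AoscOptions)[]) {
      const value = layer[key];
      if (value !== undefined) {
        writable[key] = value;
      }
    }
  }
  return resolved;
}

// Config-level layer first, per-call layer second.
const effective = resolveAoscOptions(
  { streamingSpkCacheLen: 256 },
  { streamingSpkCacheEnable: false },
);
console.log(effective);
```

Passing `streamingSpkCacheEnable: false` in the second layer corresponds to the fallback case exercised by the new integration test.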
GustavoA1604
requested changes
May 20, 2026
- Update setup-models flow to also fetch and convert sortformer 2.1 model (update all necessary scripts)
- Add new entry in CHANGELOG.md under [Unreleased] with what is being added in this PR
…ased]
Two reviewer follow-ups on the v2.1 + AOSC PR:

1. `npm run setup-models` now fetches + converts v2.1 sortformer.
   - download-models.sh: new `sortformer-streaming-v2.1` type pulling from
     https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
   - convert-nemo.sh: matching type maps .nemo ->
     `diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
   - `--type all` (default) now includes the new type, so
     `npm run setup-models` stages v2.1 alongside the other models.
   - convert-nemo-to-gguf.py: surgically picked up PR #24's variant
     emission (the `detect_sortformer_variant(ckpt)` helper +
     `writer.add_string("parakeet.model_variant", ...)` call) without
     touching local qvac divergences (vendored attribution header,
     descriptive docstrings, `--quant f16` default, and the
     huggingface_hub import-error helper). The C++ engine's strict
     v2.1 detection now matches on
     `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"`
     instead of falling back to the encoder-shape heuristic.
   - Verified end-to-end locally:
     `bash scripts/convert-nemo.sh --type sortformer-streaming-v2.1 --quant q8_0 --force`
     produces models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and
     the resulting GGUF carries
     `parakeet.model_variant = "sortformer-streaming-v2.1-aosc"`
     (confirmed via gguf reader).
2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in
   package.json + vcpkg.json reverted to 0.4.0. The release PR will
   promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.
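The detection rule in item 1 can be sketched as follows. This is an illustration only: the real check lives in the C++ engine, and `detectAosc`, `Detection`, and the `shapeHeuristic` callback are hypothetical names standing in for logic not shown in this PR. Only the metadata key and the variant string come from the commit message. The sketch assumes the heuristic is consulted only when the tag is absent.

```typescript
// Variant string and metadata key as quoted in the commit message.
const AOSC_VARIANT = "sortformer-streaming-v2.1-aosc";
const VARIANT_KEY = "parakeet.model_variant";

// Hypothetical result type: whether AOSC applies and which path decided it.
interface Detection {
  aosc: boolean;
  source: "metadata" | "heuristic";
}

// Strict match on the metadata tag when present; otherwise defer to a
// caller-supplied stand-in for the encoder-shape heuristic.
function detectAosc(
  metadata: ReadonlyMap<string, string>,
  shapeHeuristic: () => boolean,
): Detection {
  const variant = metadata.get(VARIANT_KEY);
  if (variant !== undefined) {
    return { aosc: variant === AOSC_VARIANT, source: "metadata" };
  }
  return { aosc: shapeHeuristic(), source: "heuristic" };
}

// A GGUF converted with the updated script carries the tag, so the
// heuristic is never consulted.
const tagged = new Map([[VARIANT_KEY, AOSC_VARIANT]]);
console.log(detectAosc(tagged, () => false));
```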
The registry's parakeet-cpp.json lists both 2026-05-20#0 and
2026-05-20#1 (PR #156 introduced both port-versions in its two commits
before squash-merging). vcpkg's minimum-version-selection picks #0 when
the manifest says `version>=: 2026-05-20`, but the #0 git-tree is
orphaned by the squash merge -- unreachable from main, so
`git fetch HEAD` doesn't pull it in. CI fails with:

    fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
    note: while loading parakeet-cpp@2026-05-20

Pinning `version>=: 2026-05-20#1` skips the orphan and resolves to the
actual port content on main (tree 69619b43...). Matches the existing
`qvac-lint-cpp >= 1.4.4#3` precedent in the same file. Local clean build
(no overlay, no cached registry) succeeds.
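The effect of the `#1` pin can be illustrated with a simplified model of minimum-version selection for a single `version>=` constraint. This is a sketch, not vcpkg's resolver: it ignores baselines and multiple constraints, and `PortVersion`, `parseConstraint`, and `selectMinimum` are hypothetical names. It assumes a constraint without a `#N` suffix means port-version 0.

```typescript
// One registry entry: an upstream version plus a port-version.
interface PortVersion {
  version: string;
  portVersion: number;
}

// "2026-05-20" -> port-version 0; "2026-05-20#1" -> port-version 1.
function parseConstraint(constraint: string): PortVersion {
  const parts = constraint.split("#");
  return {
    version: parts[0] ?? "",
    portVersion: parts.length > 1 ? Number(parts[1]) : 0,
  };
}

// ISO dates compare correctly as strings; ties fall back to port-version.
function compareVersions(a: PortVersion, b: PortVersion): number {
  if (a.version !== b.version) {
    return a.version < b.version ? -1 : 1;
  }
  return a.portVersion - b.portVersion;
}

// Pick the lowest registry entry that still satisfies the constraint.
function selectMinimum(
  available: PortVersion[],
  constraint: string,
): PortVersion | undefined {
  const floor = parseConstraint(constraint);
  return available
    .filter((v) => compareVersions(v, floor) >= 0)
    .sort(compareVersions)[0];
}

const registry: PortVersion[] = [
  { version: "2026-05-20", portVersion: 0 }, // orphaned tree in this PR's case
  { version: "2026-05-20", portVersion: 1 },
];

console.log(selectMinimum(registry, "2026-05-20"));   // selects port-version 0
console.log(selectMinimum(registry, "2026-05-20#1")); // selects port-version 1
```

Under this model the unpinned constraint lands on the orphaned #0 entry, which is why raising the floor to `#1` is enough to avoid it.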
GustavoA1604
approved these changes
May 20, 2026
Final on-pr run here
gusttav-lang
approved these changes
May 20, 2026