feat[api]: add Sortformer v2.1 + AOSC streaming diarization support #2137

Merged
GustavoA1604 merged 8 commits into main from feat-parakeet-sortformer-v2.1-aosc
May 20, 2026

pratiknarola-t (Contributor) commented May 20, 2026

Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive
API only; no breaking changes).

🎯 What problem does this PR solve?

  • v1 Sortformer streaming uses a fixed-size sliding-history window; once
    a speaker goes silent long enough to roll out of the window, their
    slot identity drifts onto a different physical voice when they return.
  • Continuous single-speaker stretches collapse all voices onto
    sortformer_0 once two speakers have been seen, breaking live
    speaker-tagged transcripts.
  • v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in
    parakeet-cpp, but until now there was no way to consume it from the
    JS layer.

📝 How does it solve it?

  • Bump parakeet-cpp to version>= 2026-05-20 (the qvac-registry-vcpkg
    bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
  • Plumb 6 AOSC knobs (streamingSpkCacheEnable, streamingSpkCacheLen,
    streamingFifoLen, streamingChunkLeftContextMs,
    streamingChunkRightContextMs, streamingSpkCacheUpdatePeriod) from
    JS through ParakeetConfig -> ParakeetModel /
    ParakeetStreamingProcessor -> parakeet::SortformerStreamingOptions,
    for both the in-process Mode-3 streaming path and the duplex
    runStreaming() processor.
  • v2.1 is auto-detected by the engine via the GGUF metadata tag
    parakeet.model_variant; AOSC defaults mirror parakeet-cpp's
    NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
  • Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline
    default. Both GGUFs remain registered.
  • New examples/live-mic-diarized-aosc.js exposes every AOSC knob as a
    CLI flag for A/B comparison against the v1 sliding-window path.

🧪 How was it tested?

  • Built locally against a vcpkg overlay pointing at the qvac-registry-vcpkg PR #156 branch;
    addon compiled cleanly with all 6 new AOSC field references through
    ParakeetStreamingProcessor.cpp, ParakeetModel.cpp, AddonJs.hpp,
    and JSAdapter.cpp.
  • Full integration suite: 37/37 tests pass, 72/72 assertions in 145s
    (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
  • New test/integration/sortformer-aosc-streaming.test.js covers
    default-AOSC streaming + streamingSpkCacheEnable=false fallback to
    the v1 sliding-window code path. Confirmed via engine logs that the
    override actually disables the cache (Sortformer AOSC enabled line
    only prints when AOSC is active).
  • v1 Sortformer desktop integration + GPU smoke tests still pass -- no
    regression to the existing diarization path.

🔌 API Changes

New optional fields on ParakeetConfig, mirrored as per-call overrides
on StreamingRunConfig. All default to parakeet-cpp's NeMo-port
tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on
non-Sortformer engines (no-op forwarding is safe).

import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,         // default
      streamingSpkCacheLen: 188,             // long-term cache rows
      streamingFifoLen: 188,                 // warmup FIFO rows
      streamingChunkLeftContextMs: 80,       // ~1 encoder frame
      streamingChunkRightContextMs: 560,     // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144,    // FIFO-overflow pop count
    },
  },
});
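
The layering described in this section (engine defaults, then ParakeetConfig, then per-call StreamingRunConfig overrides) can be pictured as a plain object merge. The following is a minimal sketch under that assumption, not the addon's actual code: the `AoscOptions` type, the `AOSC_DEFAULTS` constant, and the `resolveAoscOptions` helper are hypothetical names introduced for illustration. Only the six field names and their default values come from this PR.

```typescript
// Hypothetical type: the field names and defaults are copied from this PR;
// the type and the helper below are illustrative, not the addon's API.
type AoscOptions = {
  streamingSpkCacheEnable: boolean;
  streamingSpkCacheLen: number;
  streamingFifoLen: number;
  streamingChunkLeftContextMs: number;
  streamingChunkRightContextMs: number;
  streamingSpkCacheUpdatePeriod: number;
};

// Defaults as listed in the PR (188 / 188 / 80 / 560 / 144, enabled).
const AOSC_DEFAULTS: AoscOptions = {
  streamingSpkCacheEnable: true,
  streamingSpkCacheLen: 188,
  streamingFifoLen: 188,
  streamingChunkLeftContextMs: 80,
  streamingChunkRightContextMs: 560,
  streamingSpkCacheUpdatePeriod: 144,
};

// Later layers win: defaults < model-level config < per-call overrides.
// Omitted keys fall through to the layer below. Note that a key explicitly
// set to `undefined` would overwrite the lower layer in this simple merge.
function resolveAoscOptions(
  config: Partial<AoscOptions> = {},
  perCall: Partial<AoscOptions> = {},
): AoscOptions {
  return { ...AOSC_DEFAULTS, ...config, ...perCall };
}

// Model-level config enlarges the cache; one call then disables AOSC.
const resolved = resolveAoscOptions(
  { streamingSpkCacheLen: 256 },
  { streamingSpkCacheEnable: false },
);
console.log(resolved);
```

Keeping every field optional at both layers is what makes the change additive: callers that pass nothing get the NeMo-port tuning unchanged.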

Depends on

  • qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not
    resolve the new version>= constraint until that PR merges.
  • Separate registry-server PR for the v2.1 GGUF entry in
    models.prod.json (out of scope for this PR -- handled independently).
  • Upload of diar_streaming_sortformer_4spk-v2.1.q8_0.gguf to S3 (the
    GGUF the new test resolves via MODEL_CONFIGS.sortformerStreaming).

Follow-up (separate PR, not in scope here)

SDK adoption (@qvac/sdk schema + plugin + example) lands in a
separate PR after this addon is published and the v2.1 GGUF entry has
synced into sdk/models/registry/models.ts. The SDK needs both pieces
in place before its schema can meaningfully forward AOSC knobs.

pratiknarola-t requested review from a team as code owners May 20, 2026 10:34
pratiknarola-t force-pushed the feat-parakeet-sortformer-v2.1-aosc branch 2 times, most recently from 3e8ec73 to 36b3109 May 20, 2026 10:49
pratiknarola-t changed the title from "feat: add Sortformer v2.1 + AOSC streaming diarization support" to "feat[api]: add Sortformer v2.1 + AOSC streaming diarization support" May 20, 2026

ishanvohra2 (Contributor) commented

The SDK changes should only occur after the parakeet package is published.
You can split this into two PRs. The second PR can target SDK changes and the version update for parakeet.

github-actions Bot commented May 20, 2026

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

pratiknarola-t force-pushed the feat-parakeet-sortformer-v2.1-aosc branch from 36b3109 to 402422c May 20, 2026 11:11

GustavoA1604 (Contributor) left a comment

  1. Update the setup-models flow to also fetch and convert the Sortformer v2.1 model (update all necessary scripts).
  2. Add a new entry to CHANGELOG.md under [Unreleased] describing what this PR adds.

…ased]

Two reviewer follow-ups on the v2.1 + AOSC PR:

1. `npm run setup-models` now fetches + converts v2.1 sortformer.
   - download-models.sh: new `sortformer-streaming-v2.1` type pulling
     from
     https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
   - convert-nemo.sh: matching type maps .nemo ->
     `diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
   - `--type all` (default) now includes the new type, so
     `npm run setup-models` stages v2.1 alongside the other models.
   - convert-nemo-to-gguf.py: surgically picked up PR #24's variant
     emission (the `detect_sortformer_variant(ckpt)` helper +
     `writer.add_string("parakeet.model_variant", ...)` call) without
     touching local qvac divergences (vendored attribution header,
     descriptive docstrings, `--quant f16` default, and the
     huggingface_hub import-error helper). The C++ engine's strict
     v2.1 detection now matches on `parakeet.model_variant ==
     "sortformer-streaming-v2.1-aosc"` instead of falling back to
     the encoder-shape heuristic.
   - Verified end-to-end locally: `bash scripts/convert-nemo.sh
     --type sortformer-streaming-v2.1 --quant q8_0 --force` produces
     models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and the
     resulting GGUF carries `parakeet.model_variant =
     "sortformer-streaming-v2.1-aosc"` (confirmed via gguf reader).

2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in
   package.json + vcpkg.json reverted to 0.4.0. The release PR will
   promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.
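
The variant tag from item 1 decides which streaming path runs. The real detection lives in the C++ engine; the TypeScript below is only a sketch of the decision as this PR describes it, assuming the GGUF metadata is already available as a plain string map. `selectStreamingPath` and the `StreamingPath` type are hypothetical names. The metadata key, the variant string, and the `streamingSpkCacheEnable` flag come from this PR.

```typescript
// Metadata key and value as quoted in the commit message above.
const VARIANT_KEY = "parakeet.model_variant";
const AOSC_VARIANT = "sortformer-streaming-v2.1-aosc";

type StreamingPath = "aosc" | "sliding-window";

// Hypothetical helper: AOSC is used only when the GGUF is tagged as the
// v2.1 AOSC variant AND the cache has not been disabled by the caller.
// A missing or different tag (v1 / v2 models) keeps the sliding-window
// path, which is why forwarding the AOSC knobs to those models is a no-op.
function selectStreamingPath(
  ggufMetadata: Record<string, string>,
  streamingSpkCacheEnable: boolean = true,
): StreamingPath {
  const isAoscModel = ggufMetadata[VARIANT_KEY] === AOSC_VARIANT;
  return isAoscModel && streamingSpkCacheEnable ? "aosc" : "sliding-window";
}

const v21Metadata = { [VARIANT_KEY]: AOSC_VARIANT };
console.log(selectStreamingPath(v21Metadata));        // tagged v2.1, cache on
console.log(selectStreamingPath(v21Metadata, false)); // tagged v2.1, cache off
console.log(selectStreamingPath({}));                 // untagged GGUF
```

Matching on an exact string instead of an encoder-shape heuristic means a GGUF converted without the tag simply stays on the sliding-window path.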

GustavoA1604 added the tier1 and verified (Authorize secrets / label-gate in PR workflows) labels May 20, 2026

The registry's parakeet-cpp.json lists both 2026-05-20#0 and
2026-05-20#1 (PR #156 introduced both port-versions in its two
commits before squash-merging). vcpkg's minimum-version-selection
picks #0 when the manifest says `version>=: 2026-05-20`, but the
#0 git-tree is orphaned by the squash merge -- unreachable from
main, so `git fetch HEAD` doesn't pull it in. CI fails with:

  fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
  note: while loading parakeet-cpp@2026-05-20

Pinning `version>=: 2026-05-20#1` skips the orphan and resolves
to the actual port content on main (tree 69619b43...). Matches
the existing `qvac-lint-cpp >= 1.4.4#3` precedent in the same
file.

Local clean build (no overlay, no cached registry) succeeds.
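
The failure mode above can be reproduced with a toy model of version resolution. This is a deliberately simplified sketch, not vcpkg's implementation: it assumes the resolver takes the lowest registry entry that satisfies `version>=`, and it models the orphaned git-tree with a hypothetical `reachableFromMain` flag. The type and function names are invented for illustration. The tree hashes are the ones quoted in this commit message, with the second left truncated as in the source.

```typescript
// "<version>#<port-version>"; a missing "#N" suffix means port-version 0.
type PortVersion = { version: string; portVersion: number };
type RegistryEntry = PortVersion & { gitTree: string; reachableFromMain: boolean };

function parsePortVersion(text: string): PortVersion {
  const [version, port = "0"] = text.split("#");
  return { version, portVersion: Number(port) };
}

// ISO dates (YYYY-MM-DD) order correctly as plain strings.
function compareVersions(a: PortVersion, b: PortVersion): number {
  if (a.version !== b.version) return a.version < b.version ? -1 : 1;
  return a.portVersion - b.portVersion;
}

// Simplified minimum-version selection: lowest entry >= the constraint.
function resolveVersion(constraint: string, entries: RegistryEntry[]): RegistryEntry {
  const minimum = parsePortVersion(constraint);
  const candidates = entries
    .filter((entry) => compareVersions(entry, minimum) >= 0)
    .sort(compareVersions);
  if (candidates.length === 0) {
    throw new Error("no version satisfies >= " + constraint);
  }
  const chosen = candidates[0];
  if (!chosen.reachableFromMain) {
    throw new Error("failed to unpack tree object " + chosen.gitTree);
  }
  return chosen;
}

const registryEntries: RegistryEntry[] = [
  // #0 tree was orphaned by the squash merge (hash from the CI error above).
  { version: "2026-05-20", portVersion: 0, gitTree: "91a6fc169003b70dcc66b82ca8d1d23445343127", reachableFromMain: false },
  // #1 tree is the port content on main (hash truncated in the source).
  { version: "2026-05-20", portVersion: 1, gitTree: "69619b43...", reachableFromMain: true },
];

// Pinning the port-version skips the orphaned entry.
console.log(resolveVersion("2026-05-20#1", registryEntries).gitTree);
```

In this model, resolving the unpinned constraint `2026-05-20` throws with the same "failed to unpack tree object" message as the CI log, while the `#1` pin resolves cleanly.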

GustavoA1604 (Contributor) commented

Final on-pr run here
