testing qvac-lib-error-base-workflow#24

Closed
Proletter wants to merge 1 commit into main from qvac-lib-error-base-integration-test1

Conversation

@Proletter

Collaborator

No description provided.

@github-actions

Contributor

Requesting review from: @ignaciolarranaga [auto_pr_review_request]

3 similar comments

Proletter closed this Jan 21, 2026
Proletter deleted the qvac-lib-error-base-integration-test1 branch January 21, 2026 12:02
pratiknarola-t pushed a commit that referenced this pull request May 20, 2026
Consumes parakeet-cpp 2026-05-20 (the upstream bump in
qvac-registry-vcpkg PR #156, which lands PR #22 + PR #24 from
qvac-ext-lib-whisper.cpp): v2.1 streaming Sortformer GGUF with
Audio-Online Speaker Cache (AOSC) -- a NeMo-port speaker cache
that anchors speaker-slot identity across silence and re-entry,
fixing the per-chunk drift v1 exhibits in continuous live capture.

Defaults: v2.1 is the new streaming Sortformer; v1 stays the
offline default. Both GGUFs remain registered.

What changed:

- vcpkg dep: bump parakeet-cpp version>= to 2026-05-20 across
  all three platform branches in packages/transcription-parakeet/vcpkg.json.

- C++ addon (transcription-parakeet/addon/src/):
  - ParakeetConfig.hpp: add 6 streamingSpkCache* / streamingFifo* /
    streamingChunk{Left,Right}ContextMs / streamingSpkCacheUpdatePeriod
    fields with NeMo-port defaults (188/188/80/560/144,
    spkCacheEnable=true). Comments document the v2.1-only
    applicability and the auto-detection via the GGUF metadata tag
    parakeet.model_variant.
  - ParakeetModel.{hpp,cpp}: 6 read-only accessors; forward the
    fields into parakeet::SortformerStreamingOptions inside the
    SORTFORMER branch of the in-process streaming session (Mode 3).
  - ParakeetStreamingProcessor.{hpp,cpp}: mirror the 6 fields on the
    duplex Config struct; forward into SortformerStreamingOptions
    for the runStreaming() session.
  - AddonJs.hpp::startStreaming: source defaults from the model's
    getters; accept per-call overrides (spkCacheEnable, spkCacheLen,
    fifoLen, chunkLeftContextMs, chunkRightContextMs,
    spkCacheUpdatePeriod) on the runStreaming() config object.
  - JSAdapter.cpp::loadFromJSObject: read the 6 new keys at
    createInstance time so constructor-supplied parakeetConfig
    overrides reach C++.

- JS/TS API (transcription-parakeet/):
  - index.d.ts: declare the 6 fields on ParakeetConfig and matching
    overrides on StreamingRunConfig.
  - parakeet.js: extend the JSDoc on the constructor and
    startStreaming() with the new params (config is passed opaquely
    to native; no logic change here).
  - index.js: forward the 6 fields through _buildConfigurationParams()
    so they actually reach createInstance. (Without this, the JSDoc
    + native plumbing exist but the values never leave the JS layer --
    surfaced during local verification when streamingSpkCacheEnable=false
    initially didn't disable AOSC on the v2.1 GGUF.)

- SDK (packages/sdk/):
  - schemas/transcription-config.ts: extend parakeetRuntimeConfigSchema
    with all 7 streaming knobs (streaming, streamingChunkMs,
    streamingHistoryMs, streamingEmitPartials, streamingEnergyVad,
    streamingLeftContextMs, streamingRightLookaheadMs) plus the 6
    AOSC knobs. The streaming knobs were never exposed before -- AOSC
    is unreachable without them.
  - server/bare/plugins/parakeet-transcription/plugin.ts:
    createParakeetModel forwards all 13 new fields into
    addonConfig.parakeetConfig.
  - examples/transcription/parakeet-sortformer-streaming.ts (new):
    high-level SDK example for v2.1 + AOSC streaming.

- Model registry:
  - packages/registry-server/data/models.prod.json: new
    "Parakeet Streaming Sortformer 4SPK v2.1 GGUF (AOSC)" entry
    (placeholder S3 path; replace with the real upload date once the
    GGUF lands). Updated v1 entry's notes to clarify its offline-
    default role.
  - (sdk/models/registry/models.ts is auto-generated and will pick
    up the new entry via models/update-models after the GGUF is
    uploaded; not touched here.)

- Tests (transcription-parakeet/test/integration/):
  - helpers.js: add sortformerStreaming MODEL_CONFIGS entry pointing
    at diar_streaming_sortformer_4spk-v2.1.q8_0.gguf.
  - sortformer-aosc-streaming.test.js (new): covers default-AOSC
    streaming + streamingSpkCacheEnable=false fallback to the v1
    sliding-window path. The full AOSC slot-stability contract is
    verified at C++ level in parakeet-cpp/test/test_sortformer_aosc_speakers.cpp;
    this JS-level test focuses on wiring correctness.

- Examples (transcription-parakeet/examples/):
  - live-mic-diarized-aosc.js (new): v2.1-focused dual-stream live
    mic example with full CLI control of the AOSC knobs.
  - live-mic-diarized.js / diarized-transcribe.js: header notes
    recommending v2.1 for streaming, v1 for offline.

- Docs:
  - README.md: extended Model Variants table with v1 (offline-default)
    and v2.1 (streaming-default) rows; new streamingSpkCache* rows in
    the ParakeetConfig table; dedicated "Sortformer Streaming
    Diarization (v2.1 + AOSC)" paragraph; updated example commands
    to point at the v2.1 GGUF.
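The `index.js` regression described above (values documented and plumbed natively, but never leaving the JS layer) is the typical failure mode of an allow-list forwarder. A minimal sketch of such a forwarder, assuming a hypothetical `buildAoscParams` helper rather than the real `_buildConfigurationParams()`; only the six key names come from this change.

```typescript
// Hypothetical sketch, not the real _buildConfigurationParams() from index.js.
// Any key missing from this allow-list is silently dropped before createInstance.
const AOSC_KEYS = [
  "streamingSpkCacheEnable",
  "streamingSpkCacheLen",
  "streamingFifoLen",
  "streamingChunkLeftContextMs",
  "streamingChunkRightContextMs",
  "streamingSpkCacheUpdatePeriod",
] as const;

type AoscKey = (typeof AOSC_KEYS)[number];
type AoscParams = Partial<Record<AoscKey, number | boolean>>;

function buildAoscParams(userConfig: Readonly<Record<string, unknown>>): AoscParams {
  const out: AoscParams = {};
  for (const key of AOSC_KEYS) {
    const value = userConfig[key];
    // Test the type, not truthiness, so an explicit `false` is still forwarded.
    // Per-key type validation is left out of this sketch.
    if (typeof value === "number" || typeof value === "boolean") {
      out[key] = value;
    }
  }
  return out;
}
```

Checking the value's type rather than its truthiness matters here: a truthiness test would drop `streamingSpkCacheEnable: false`, which is exactly the override that must reach native code to disable AOSC.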

Verification done locally against a vcpkg overlay pointing at the
PR #156 branch: addon compiles with the new parakeet-cpp; full
integration suite passes 37/37 (72/72 asserts) with all q8_0 GGUFs
staged, including both new AOSC test cases.

Depends on:
  - qvac-registry-vcpkg PR #156 (parakeet-cpp 2026-05-20 bump). CI
    will not be able to resolve the new version>= constraint until
    that PR merges.
  - Upload of diar_streaming_sortformer_4spk-v2.1.q8_0.gguf to S3
    (replace the placeholder source path in models.prod.json once
    the upload date is known, then re-run models/update-models to
    sync sdk/models/registry/models.ts).
pratiknarola-t pushed a commit that referenced this pull request May 20, 2026
## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once
  a speaker goes silent long enough to roll out of the window, their
  slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto
  `sortformer_0` once two speakers have been seen, breaking live
  speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in
  parakeet-cpp, but until now there was no way to consume it from the
  JS / SDK layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg
  bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs from JS through `ParakeetConfig` ->
  `ParakeetModel` / `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`,
  for both the in-process Mode-3 streaming path and the duplex
  `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag
  `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's
  NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Surface the 6 AOSC knobs plus the 7 pre-existing streaming knobs on
  the SDK schema + plugin so SDK consumers can configure AOSC without
  dropping to `@qvac/transcription-parakeet` directly. (Streaming was
  never exposed on the SDK schema before -- AOSC is unreachable without
  `streaming: true`.)
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline
  default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a
  CLI flag for A/B comparison against the v1 sliding-window path.
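On the duplex `runStreaming()` path, defaults are sourced from the model's getters and individual AOSC knobs can be overridden per call. A sketch of that precedence as a layered merge, assuming three layers (NeMo-port defaults, then the constructor's `parakeetConfig`, then the per-call config); `AoscOptions`, `NEMO_PORT_DEFAULTS`, and `resolveAoscOptions` are illustrative names, and the short field names follow the per-call override names listed for `AddonJs.hpp::startStreaming`.

```typescript
// Illustrative shape; short names follow the per-call override names in this change.
interface AoscOptions {
  spkCacheEnable: boolean;
  spkCacheLen: number;
  fifoLen: number;
  chunkLeftContextMs: number;
  chunkRightContextMs: number;
  spkCacheUpdatePeriod: number;
}

// NeMo-port defaults quoted in this PR (188 / 188 / 80 / 560 / 144, enabled).
const NEMO_PORT_DEFAULTS: Readonly<AoscOptions> = {
  spkCacheEnable: true,
  spkCacheLen: 188,
  fifoLen: 188,
  chunkLeftContextMs: 80,
  chunkRightContextMs: 560,
  spkCacheUpdatePeriod: 144,
};

// Later layers win; undefined values are skipped so they never erase an earlier layer.
function resolveAoscOptions(
  modelConfig: Partial<AoscOptions>,
  perCall: Partial<AoscOptions> = {},
): AoscOptions {
  const resolved: AoscOptions = { ...NEMO_PORT_DEFAULTS };
  const writable = resolved as Record<keyof AoscOptions, number | boolean>;
  for (const layer of [modelConfig, perCall]) {
    for (const key of Object.keys(layer) as (keyof AoscOptions)[]) {
      const value = layer[key];
      if (value !== undefined) {
        writable[key] = value;
      }
    }
  }
  return resolved;
}
```

Skipping `undefined` values (instead of a plain object spread) keeps an explicitly undefined per-call key from wiping out a model-level setting.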

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch;
  addon compiled cleanly with all 6 new AOSC field references through
  `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`,
  and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s**
  (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers
  default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to
  the v1 sliding-window code path. Confirmed via engine logs that the
  override actually disables the cache (`Sortformer AOSC enabled` line
  only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no
  regression to the existing diarization path.
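The disable check hinges on the engine printing `Sortformer AOSC enabled` only when the cache is active. A hedged sketch of that assertion over already-captured log lines; how the logs are captured is out of scope, and `aoscWasEnabled` / `checkAoscLogs` are hypothetical helpers, not functions from `sortformer-aosc-streaming.test.js`.

```typescript
// Marker text quoted in this PR; everything else here is illustrative.
const AOSC_MARKER = "Sortformer AOSC enabled";

// True if any captured engine log line reports that AOSC is active.
function aoscWasEnabled(logLines: readonly string[]): boolean {
  return logLines.some((line) => line.includes(AOSC_MARKER));
}

// Throws unless the logs agree with the requested streamingSpkCacheEnable value.
function checkAoscLogs(spkCacheEnable: boolean, logLines: readonly string[]): void {
  const enabled = aoscWasEnabled(logLines);
  if (enabled !== spkCacheEnable) {
    throw new Error(
      `expected AOSC ${spkCacheEnable ? "on" : "off"}, logs say ${enabled ? "on" : "off"}`,
    );
  }
}
```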

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides
on `StreamingRunConfig`, and as flat fields on the SDK
`parakeetRuntimeConfigSchema`. All default to parakeet-cpp's
NeMo-port tuning; specifying them is opt-in. Ignored on v1 / v2
Sortformer and on non-Sortformer engines.

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,         // default
      streamingSpkCacheLen: 188,             // long-term cache rows
      streamingFifoLen: 188,                 // warmup FIFO rows
      streamingChunkLeftContextMs: 80,       // ~1 encoder frame
      streamingChunkRightContextMs: 560,     // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144,    // FIFO-overflow pop count
    },
  },
});
```
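The inline comments imply a fixed 80 ms per encoder frame (80 ms ≈ 1 frame, 560 ms ≈ 7 frames). Assuming that ratio holds, converting a context setting to frames is a single division. `ENCODER_FRAME_MS` and `msToEncoderFrames` are illustrative; whether the engine rounds, floors, or ceils non-multiples is not stated here, so this sketch simply rounds.

```typescript
// Frame duration implied by the comments above; an assumption, not read from the engine.
const ENCODER_FRAME_MS = 80;

function msToEncoderFrames(ms: number): number {
  if (!Number.isFinite(ms) || ms < 0) {
    throw new RangeError(`context must be a non-negative number of ms, got ${ms}`);
  }
  // Round to the nearest whole frame; non-multiples of 80 ms are approximate.
  return Math.round(ms / ENCODER_FRAME_MS);
}
```

With the defaults, `msToEncoderFrames(80)` is 1 and `msToEncoderFrames(560)` is 7, matching the comments; choosing multiples of 80 ms avoids depending on the rounding rule.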

Same fields available via the SDK plugin:

```typescript
import { loadModel } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: "<v2.1-sortformer-src>",
  modelType: "parakeet",
  modelConfig: {
    modelType: "sortformer",
    parakeetSortformerSrc: "<v2.1-sortformer-src>",
    streaming: true,
    streamingChunkMs: 2000,
    streamingSpkCacheEnable: true,
    streamingChunkRightContextMs: 560,
  },
});
```
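Because AOSC is unreachable without `streaming: true`, a caller may want to catch AOSC knobs set on a non-streaming config before loading the model. A sketch of such a consumer-side check; `ParakeetSdkStreamingConfig` and `unreachableAoscFields` are hypothetical, and this is not behavior of `@qvac/sdk` itself.

```typescript
// Hypothetical config shape covering only the fields this check cares about.
interface ParakeetSdkStreamingConfig {
  streaming?: boolean;
  streamingChunkMs?: number;
  streamingSpkCacheEnable?: boolean;
  streamingSpkCacheLen?: number;
  streamingFifoLen?: number;
  streamingChunkLeftContextMs?: number;
  streamingChunkRightContextMs?: number;
  streamingSpkCacheUpdatePeriod?: number;
}

// The six AOSC knobs added by this change.
const AOSC_FIELDS = [
  "streamingSpkCacheEnable",
  "streamingSpkCacheLen",
  "streamingFifoLen",
  "streamingChunkLeftContextMs",
  "streamingChunkRightContextMs",
  "streamingSpkCacheUpdatePeriod",
] as const;

// Returns AOSC fields that were set but cannot take effect because streaming is off.
function unreachableAoscFields(cfg: ParakeetSdkStreamingConfig): string[] {
  if (cfg.streaming === true) return [];
  return AOSC_FIELDS.filter((field) => cfg[field] !== undefined);
}
```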

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not
  resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF model entry in
  `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the
  GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).
pratiknarola-t pushed a commit that referenced this pull request May 20, 2026
Bumps @qvac/transcription-parakeet 0.4.0 -> 0.5.0 (MINOR -- additive
API only; no breaking changes).

## 🎯 What problem does this PR solve?

- v1 Sortformer streaming uses a fixed-size sliding-history window; once
  a speaker goes silent long enough to roll out of the window, their
  slot identity drifts onto a different physical voice when they return.
- Continuous single-speaker stretches collapse all voices onto
  `sortformer_0` once two speakers have been seen, breaking live
  speaker-tagged transcripts.
- v2.1 + AOSC (Audio-Online Speaker Cache, NeMo-ported) fixes this in
  parakeet-cpp, but until now there was no way to consume it from the
  JS layer.

## 📝 How does it solve it?

- Bump `parakeet-cpp` to `version>= 2026-05-20` (the qvac-registry-vcpkg
  bump in PR #156 pulls in PRs #22 / #24 of qvac-ext-lib-whisper.cpp).
- Plumb 6 AOSC knobs (`streamingSpkCacheEnable`, `streamingSpkCacheLen`,
  `streamingFifoLen`, `streamingChunkLeftContextMs`,
  `streamingChunkRightContextMs`, `streamingSpkCacheUpdatePeriod`) from
  JS through `ParakeetConfig` -> `ParakeetModel` /
  `ParakeetStreamingProcessor` -> `parakeet::SortformerStreamingOptions`,
  for both the in-process Mode-3 streaming path and the duplex
  `runStreaming()` processor.
- v2.1 is auto-detected by the engine via the GGUF metadata tag
  `parakeet.model_variant`; AOSC defaults mirror parakeet-cpp's
  NeMo-port tuning (188 / 188 / 80 / 560 / 144, enabled).
- Defaults: v2.1 becomes the streaming Sortformer; v1 stays the offline
  default. Both GGUFs remain registered.
- New `examples/live-mic-diarized-aosc.js` exposes every AOSC knob as a
  CLI flag for A/B comparison against the v1 sliding-window path.

## 🧪 How was it tested?

- Built locally against a vcpkg overlay pointing at the PR #156 branch;
  addon compiled cleanly with all 6 new AOSC field references through
  `ParakeetStreamingProcessor.cpp`, `ParakeetModel.cpp`, `AddonJs.hpp`,
  and `JSAdapter.cpp`.
- Full integration suite: **37/37 tests pass, 72/72 assertions in 145s**
  (macOS arm64, all q8_0 GGUFs staged including v2.1 Sortformer).
- New `test/integration/sortformer-aosc-streaming.test.js` covers
  default-AOSC streaming + `streamingSpkCacheEnable=false` fallback to
  the v1 sliding-window code path. Confirmed via engine logs that the
  override actually disables the cache (`Sortformer AOSC enabled` line
  only prints when AOSC is active).
- v1 Sortformer desktop integration + GPU smoke tests still pass -- no
  regression to the existing diarization path.

## 🔌 API Changes

New optional fields on `ParakeetConfig`, mirrored as per-call overrides
on `StreamingRunConfig`. All default to parakeet-cpp's NeMo-port
tuning; specifying them is opt-in. Ignored on v1 / v2 Sortformer and on
non-Sortformer engines (no-op forwarding is safe).

```typescript
import { TranscriptionParakeet } from "@qvac/transcription-parakeet";

const model = new TranscriptionParakeet({
  files: { model: "diar_streaming_sortformer_4spk-v2.1.q8_0.gguf" },
  config: {
    parakeetConfig: {
      streaming: true,
      streamingChunkMs: 2000,
      // AOSC (v2.1+ only; auto-detected via GGUF metadata)
      streamingSpkCacheEnable: true,         // default
      streamingSpkCacheLen: 188,             // long-term cache rows
      streamingFifoLen: 188,                 // warmup FIFO rows
      streamingChunkLeftContextMs: 80,       // ~1 encoder frame
      streamingChunkRightContextMs: 560,     // ~7 encoder frames
      streamingSpkCacheUpdatePeriod: 144,    // FIFO-overflow pop count
    },
  },
});
```

## Depends on

- qvac-registry-vcpkg #156 (parakeet-cpp 2026-05-20 bump). CI will not
  resolve the new `version>=` constraint until that PR merges.
- Separate registry-server PR for the v2.1 GGUF entry in
  `models.prod.json` (out of scope for this PR -- handled independently).
- Upload of `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf` to S3 (the
  GGUF the new test resolves via `MODEL_CONFIGS.sortformerStreaming`).

## Follow-up (separate PR, not in scope here)

SDK adoption (`@qvac/sdk` schema + plugin + example) lands in a
separate PR after this addon is published and the v2.1 GGUF entry has
synced into `sdk/models/registry/models.ts`. The SDK needs both pieces
in place before its schema can meaningfully forward AOSC knobs.
pratiknarola-t pushed a commit that referenced this pull request May 20, 2026
…ased]

Two reviewer follow-ups on the v2.1 + AOSC PR:

1. `npm run setup-models` now fetches + converts the v2.1 Sortformer.
   - download-models.sh: new `sortformer-streaming-v2.1` type pulling
     from
     https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo
   - convert-nemo.sh: matching type maps .nemo ->
     `diar_streaming_sortformer_4spk-v2.1.${q}.gguf`.
   - `--type all` (default) now includes the new type, so
     `npm run setup-models` stages v2.1 alongside the other models.
   - convert-nemo-to-gguf.py: surgically picked up PR #24's variant
     emission (the `detect_sortformer_variant(ckpt)` helper +
     `writer.add_string("parakeet.model_variant", ...)` call) without
     touching local qvac divergences (vendored attribution header,
     descriptive docstrings, `--quant f16` default, and the
     huggingface_hub import-error helper). The C++ engine's strict
     v2.1 detection now matches on `parakeet.model_variant ==
     "sortformer-streaming-v2.1-aosc"` instead of falling back to
     the encoder-shape heuristic.
   - Verified end-to-end locally: `bash scripts/convert-nemo.sh
     --type sortformer-streaming-v2.1 --quant q8_0 --force` produces
     models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf and the
     resulting GGUF carries `parakeet.model_variant =
     "sortformer-streaming-v2.1-aosc"` (confirmed via gguf reader).

2. CHANGELOG entry moved under `## [Unreleased]`; version bumps in
   package.json + vcpkg.json reverted to 0.4.0. The release PR will
   promote `[Unreleased]` -> `[0.5.0]` and bump the versions then.
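Reading the detection note above as: trust `parakeet.model_variant` when the GGUF carries it, and fall back to the encoder-shape heuristic only when it does not. A TypeScript sketch of that decision order, not the C++ engine's code: the heuristic is a caller-supplied placeholder, and treating any other tag value as "not v2.1" is an assumption.

```typescript
type SortformerVariant = "v2.1-aosc" | "other";

interface VariantDecision {
  variant: SortformerVariant;
  source: "metadata" | "heuristic";
}

const VARIANT_KEY = "parakeet.model_variant";
const V21_AOSC_TAG = "sortformer-streaming-v2.1-aosc";

// metadata: GGUF string key/values; shapeLooksLikeV21: placeholder for the encoder-shape heuristic.
function classifySortformerGguf(
  metadata: ReadonlyMap<string, string>,
  shapeLooksLikeV21: () => boolean,
): VariantDecision {
  const tag = metadata.get(VARIANT_KEY);
  if (tag !== undefined) {
    // Strict match: the tag decides and the heuristic is never consulted.
    return { variant: tag === V21_AOSC_TAG ? "v2.1-aosc" : "other", source: "metadata" };
  }
  // Untagged GGUFs (assumed here to be older conversions) fall back to the heuristic.
  return { variant: shapeLooksLikeV21() ? "v2.1-aosc" : "other", source: "heuristic" };
}
```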
GustavoA1604 added a commit that referenced this pull request May 20, 2026
…2137)

* feat[api]: add Sortformer v2.1 + AOSC streaming diarization support

* chore[notask]: address review — setup-models v2.1 + CHANGELOG [Unreleased]

* fix[notask]: pin parakeet-cpp to 2026-05-20#1 to avoid orphan tree

The registry's parakeet-cpp.json lists both 2026-05-20#0 and
2026-05-20#1 (PR #156 introduced both port-versions in its two
commits before squash-merging). vcpkg's minimum-version-selection
picks #0 when the manifest says `version>=: 2026-05-20`, but the
#0 git-tree is orphaned by the squash merge -- unreachable from
main, so `git fetch HEAD` doesn't pull it in. CI fails with:

  fatal: failed to unpack tree object 91a6fc169003b70dcc66b82ca8d1d23445343127
  note: while loading parakeet-cpp@2026-05-20

Pinning `version>=: 2026-05-20#1` skips the orphan and resolves
to the actual port content on main (tree 69619b43...). Matches
the existing `qvac-lint-cpp >= 1.4.4#3` precedent in the same
file.

Local clean build (no overlay, no cached registry) succeeds.
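The failure comes down to how a bare `version>=` floor interacts with port-versions: `2026-05-20` without `#` means port-version 0, so minimum-version selection lands on `#0` and its orphaned tree. A toy model of that selection for a single port, assuming date versions and ignoring baselines and overrides; it does not reflect vcpkg's actual implementation, and `selectMinimumVersion` is a hypothetical helper.

```typescript
interface PortVersion {
  version: string; // date version, YYYY-MM-DD
  portVersion: number;
}

// A spec without "#" is treated as port-version 0.
function parseVersionSpec(spec: string): PortVersion {
  const [version = "", port] = spec.split("#");
  return { version, portVersion: port === undefined ? 0 : Number(port) };
}

function compareVersions(a: PortVersion, b: PortVersion): number {
  if (a.version !== b.version) {
    // ISO dates compare correctly as plain strings.
    return a.version < b.version ? -1 : 1;
  }
  return a.portVersion - b.portVersion;
}

// Toy selection: the lowest published version at or above the floor.
function selectMinimumVersion(available: readonly string[], floorSpec: string): string | undefined {
  const floor = parseVersionSpec(floorSpec);
  const candidates = available
    .map((spec) => parseVersionSpec(spec))
    .filter((v) => compareVersions(v, floor) >= 0)
    .sort(compareVersions);
  if (candidates.length === 0) return undefined;
  return `${candidates[0].version}#${candidates[0].portVersion}`;
}
```

Raising the floor to `2026-05-20#1` excludes `#0` entirely, which is why the pin sidesteps the unreachable tree.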

* cpp lint format

* Bump version

---------

Co-authored-by: Pratik Narola <pratiknarola@Mac.bbrouter>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
Co-authored-by: GustavoA1604 <gustavogefa@hotmail.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
…2137)
