diff --git a/packages/transcription-parakeet/CHANGELOG.md b/packages/transcription-parakeet/CHANGELOG.md
index 05b48234e9..c7b333537f 100644
--- a/packages/transcription-parakeet/CHANGELOG.md
+++ b/packages/transcription-parakeet/CHANGELOG.md
@@ -5,18 +5,35 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## [0.6.0]
 
-Update Android prebuild to ship Vulkan and OpenCL as separately-loadable MODULE `.so` files (qvac-ext-ggml@speech's `GGML_BACKEND_DL=ON`) discovered at runtime via `ggml_backend_load_all_from_path()`, as well as per-arch CPU variants (`libqvac-speech-ggml-cpu-android_armv{8.0,8.2,8.6,9.0,9.2}_*.so`).
+In this release we reestablish the GGML implementation from `0.4.0` and build on it. The two main changes are exposing the v2.1 streaming Sortformer model with NeMo-port AOSC (Audio-Online Speaker Cache) through the addon's public API and overhauling the Android prebuild to ship the ggml backends as separately-loadable MODULE `.so` files. v2.1 becomes the recommended streaming Sortformer model; v1 stays the offline-batch default. On the Android side, Vulkan and OpenCL ship as runtime-discovered `.so` files (qvac-ext-ggml@speech's `GGML_BACKEND_DL=ON`), alongside per-arch CPU variants (`libqvac-speech-ggml-cpu-android_armv{8.0,8.2,8.6,9.0,9.2}_*.so`); inference still runs on CPU there pending Vulkan/Mali + OpenCL/Adreno driver fixes (`useGPU` is overridden at the engine boundary), but the GPU `.so` files are in place for when the override is lifted.
 
 ### Added
-
+- **AOSC config knobs.** `ParakeetConfig` gains six optional fields — `streamingSpkCacheEnable` (default `true`), `streamingSpkCacheLen` (188), `streamingFifoLen` (188), `streamingChunkLeftContextMs` (80), `streamingChunkRightContextMs` (560), `streamingSpkCacheUpdatePeriod` (144) — forwarded into `parakeet::SortformerStreamingOptions` for both the in-process Mode-3 streaming path (`ParakeetModel::runStreamingProcess_`) and the duplex `runStreaming()` processor (`ParakeetStreamingProcessor`). Mirrored as per-call overrides on `StreamingRunConfig` (`spkCacheEnable`, `spkCacheLen`, `fifoLen`, `chunkLeftContextMs`, `chunkRightContextMs`, `spkCacheUpdatePeriod`). parakeet-cpp ignores these on v1 / v2 Sortformer GGUFs and on non-Sortformer engines, so always-forward is safe.
+- **v2.1 Sortformer auto-detection.** When a `diar_streaming_sortformer_4spk-v2.1.*` GGUF is loaded, parakeet-cpp's engine recognises it from the GGUF metadata tag `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"` and enables AOSC by default. Setting `streamingSpkCacheEnable: false` forces the v1 sliding-window code path on a v2.1 model (A/B comparison).
+- **`examples/live-mic-diarized-aosc.js`** — v2.1-focused dual-stream live mic example mirroring `live-mic-diarized.js`'s ASR + Sortformer pattern, with CLI flags for every AOSC knob (`--spk-cache-enable`, `--spk-cache-len`, `--fifo-len`, `--chunk-left-context-ms`, `--chunk-right-context-ms`, `--spk-cache-update-period`).
+- **`test/integration/sortformer-aosc-streaming.test.js`** — covers default-AOSC streaming and `streamingSpkCacheEnable=false` fallback. The full AOSC slot-stability contract (same physical speaker → same `Speaker N` tag across non-contiguous re-entries) is verified at C++ level in `parakeet-cpp/test/test_sortformer_aosc_speakers.cpp`; this JS-level test focuses on wiring correctness — that the override actually reaches the engine and the engine emits well-formed segments in both modes.
+- **`MODEL_CONFIGS.sortformerStreaming`** entry in `test/integration/helpers.js` pointing at `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf`. Tests skip cleanly when the GGUF isn't staged via `npm run setup-models` / `QVAC_TEST_GGUF_*`.
 - **`backendsDir` ParakeetConfig field.** Directory the native addon scans for dynamically-loaded ggml backend libraries (`libqvac-speech-ggml-vulkan.so`, `libqvac-speech-ggml-opencl.so`, per-arch `libqvac-speech-ggml-cpu-android_armv*_*.so`).
-- **`openclCacheDir` ParakeetConfig field.** Persistent directory for ggml-opencl's `clCreateProgramWithBinary` cache. 
+- **`openclCacheDir` ParakeetConfig field.** Persistent directory for ggml-opencl's `clCreateProgramWithBinary` cache.
 - **CMake install plumbing for dynamic ggml backends.** Two complementary install paths cover the full backend set that the `ggml-speech` vcpkg port emits on Android.
 - **`BACKENDS_SUBDIR` compile define** on the addon target. Derived from cmake-bare's `bare_target()` + `bare_module_target()` so the addon can join `<bare-target>/<module-name>` onto the host-provided `backendsDir` root without the host needing to know the per-target shape.
 - **Mobile dynamic-backend coverage.** `test/mobile/integration-runtime.cjs` now sets `NO_GPU=false` so Device Farm runs `gpu-smoke` and `mobile-perf-*-gpu` tests that exercise backend dlopen / discovery (Vulkan, OpenCL, and per-arch CPU `.so` loading). On Android, inference still runs on CPU (`useGPU` is overridden at the engine boundary and gpu-smoke passes early); iOS may engage Metal when `useGPU: true`.
 
+### Changed
+- **parakeet-cpp dep bumped** to `version>= 2026-05-20#2` (was `2026-05-05#1`) across all three platform branches in `vcpkg.json`. The new port (qvac-registry-vcpkg PR #156 + the `ggml-speech#3` follow-up) pulls in PRs #22 + #24 of `qvac-ext-lib-whisper.cpp`, which introduce v2.1 Sortformer support, the AOSC engine implementation, strict variant detection via the `parakeet.model_variant` GGUF tag, and review-fixup cleanups (magic-number elimination, dead-code removal, test utility consolidation, Windows `<algorithm>` include). The new port also tightens the `ggml-speech` constraint to the per-arch Android CPU build (`GGML_CPU_ALL_VARIANTS=ON`).
+- **`index.js::_buildConfigurationParams()`** now forwards the 6 new AOSC fields (and explicit defaults for unset values) into `createInstance` / `reload`. Without this, JSDoc + native plumbing would exist but JS-layer overrides would never reach C++.
+- **`examples/live-mic-diarized.js`** header: recommends the v2.1 GGUF as `--diar-model` and notes that `streamingHistoryMs` is superseded by AOSC on v2.1 models (kept for v1 back-compat). Points to the new `live-mic-diarized-aosc.js` for explicit knob control.
+- **`examples/diarized-transcribe.js`** header: notes v1 remains the recommended OFFLINE diarization model — AOSC's slot-stability benefit only applies to continuous streaming and is wasted in batch mode.
+- **`README.md`** — extended Model Variants table with v1 (offline default) and v2.1 + AOSC (streaming default) rows; new `streamingSpkCache*` rows in the ParakeetConfig table; dedicated "Sortformer Streaming Diarization (v2.1 + AOSC)" section explaining the v1-drift problem AOSC solves, the model-variant auto-detection, and when to leave the defaults alone.
+
+## [0.5.0]
+
+- Temporarily reverted to the ONNX implementation from `0.3.3` to ensure stability in SDK `0.11.*`.
+- Bumped `inference-addon-cpp` dependency version to `1.1.7#1`.
+- Bumped `onnx` dependency version to `0.15.0`.
+
 ## [0.4.0]
 
 In this release, we have replaced the onnxruntime backend with a pure C++/ggml engine, added a duplex-streaming entry point that bypasses the framework's batch-then-process lifecycle for live use cases, and surfaced two new per-segment signals (`isEndOfTurn`, `startsWord`) so consumers can build cleaner live transcripts. The release also exposes per-engine backend stats (`backendDevice`, `backendId`) so callers can verify the GPU path actually engaged, and consolidates the examples / docs / mock fixtures into a single duplex-aware surface.
diff --git a/packages/transcription-parakeet/README.md b/packages/transcription-parakeet/README.md
index bf2b2a2e7c..5bcb7995e9 100644
--- a/packages/transcription-parakeet/README.md
+++ b/packages/transcription-parakeet/README.md
@@ -214,11 +214,48 @@ Most users interact with the package through `index.js`. From that entrypoint we
 | | `streamingEnergyVad` | CTC/TDT energy-VAD events (default: `false`) |
 | | `streamingLeftContextMs` | ASR encoder left-context window in ms; `-1` keeps parakeet-cpp's default of 10000. ASR sessions only (Sortformer ignores it). |
 | | `streamingRightLookaheadMs` | ASR encoder right-lookahead window in ms; `-1` keeps parakeet-cpp's default of 2000. Adds directly to the per-segment latency floor (`chunk_ms + right_lookahead_ms`). ASR sessions only. |
+| | `streamingSpkCacheEnable` | AOSC: enable v2.1 Sortformer's speaker-cache streaming (default: `true`). Ignored on v1/v2 Sortformer GGUFs and on non-Sortformer models. Set `false` to force a v2.1 GGUF onto the v1 sliding-window path (A/B comparison). |
+| | `streamingSpkCacheLen` | AOSC: long-term speaker-cache rows (~15 s of encoder frames). Default: 188. |
+| | `streamingFifoLen` | AOSC: FIFO warmup buffer rows. Default: 188. |
+| | `streamingChunkLeftContextMs` | AOSC: encoder left-context window (ms; ~1 encoder frame). Default: 80. |
+| | `streamingChunkRightContextMs` | AOSC: encoder right-context window (ms; ~7 encoder frames). Default: 560. |
+| | `streamingSpkCacheUpdatePeriod` | AOSC: FIFO-overflow pop-out count. Default: 144. |
 | | `backendsDir` | Root directory for dynamically-loaded ggml backend `.so` files (Vulkan, OpenCL, per-arch CPU variants on Android). Defaults to the package's `prebuilds/` folder; the native addon appends `<bare-target>/<module-name>` before scanning. Pass an explicit path when prebuilds live elsewhere — e.g. Android `ApplicationInfo.nativeLibraryDir` when backend libs ship inside the APK. No-op on Apple (statically linked). |
 | | `openclCacheDir` | Persistent directory for ggml-opencl's compiled program-binary cache (`$GGML_OPENCL_CACHE_DIR`). Android-only; pass the host app's cache directory (e.g. `Context.getCacheDir()`) to skip cold `clBuildProgram` on every process start. Ignored on other platforms. |
 
 The model type (CTC / TDT / EOU / Sortformer) is **auto-detected from the GGUF metadata**, so callers don't need to pass `modelType`. Other knobs (`captionEnabled`, `timestampsEnabled`, `seed`, `sampleRate`, `channels`) keep sensible defaults.
 
+**Sortformer Streaming Diarization (v2.1 + AOSC).** parakeet-cpp ships
+two streaming-diarization paths picked automatically by the GGUF:
+
+- **v1** uses a fixed-size sliding-history window inside the engine.
+  Once two voices have been seen, the per-chunk decisions are
+  permutation-invariant; if a speaker goes silent long enough to roll
+  out of the window, the slot can drift onto a different physical voice
+  when they return. Fine for short, stable clips; ships as
+  `sortformer-4spk-v1.q8_0.gguf`.
+- **v2.1** replaces the sliding window with AOSC (Audio-Online Speaker
+  Cache, ported from NVIDIA NeMo) which anchors each slot to its
+  accumulated embedding. The same physical speaker comes back to the same
+  `Speaker N` tag across silences. Default for live capture; ships as
+  `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf`. The engine detects
+  v2.1 via the GGUF metadata tag
+  `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"`; you
+  don't need to opt in via config.
+
+The defaults in the `streamingSpkCache*` / `streamingFifo*` /
+`streamingChunk{Left,Right}ContextMs` table rows above are the NeMo-port
+tuning parakeet-cpp ships -- you almost always want to keep them. The
+knobs are exposed for A/B comparison (e.g. `--spk-cache-enable false`
+in `examples/live-mic-diarized-aosc.js` to force a v2.1 GGUF onto the
+v1 path) and for tuning unusual audio (longer cache, larger
+right-context window for higher latency tolerance, etc.).
+
+For offline diarization (single batch over a finite clip) v1 remains
+the recommended GGUF -- AOSC's slot-stability benefit only applies to
+continuous streaming and offers no measurable improvement when the
+entire clip is available at once.
+
 #### Configuration Example
 
 ```javascript
@@ -410,10 +447,16 @@ bare examples/diarized-transcribe.js \
 # Live mic transcription
 bare examples/live-mic.js --model models/parakeet-eou-120m-v1.q8_0.gguf --accumulate
 
-# Live mic + speaker tagging
+# Live mic + speaker tagging (recommended: v2.1 diar GGUF, AOSC auto-on)
 bare examples/live-mic-diarized.js \
      --asr-model  models/parakeet-tdt-0.6b-v3.q8_0.gguf \
-     --diar-model models/sortformer-4spk-v1.q8_0.gguf --accumulate
+     --diar-model models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf --accumulate
+
+# Same as above, with explicit AOSC tuning knobs exposed as CLI flags
+bare examples/live-mic-diarized-aosc.js \
+     --asr-model  models/parakeet-tdt-0.6b-v3.q8_0.gguf \
+     --diar-model models/diar_streaming_sortformer_4spk-v2.1.q8_0.gguf \
+     --spk-cache-len 256 --chunk-right-context-ms 480 --accumulate
 ```
 
 > If you use `npm run example:* -- ...` instead of `bare`, remember the `--` separator -- without it npm interprets `--model` as one of its own config flags.
@@ -427,14 +470,16 @@ The live-mic examples capture the default input device via `sox -d` (install: `b
 | **CTC** | English | argmax CTC | ~ 700 MiB | Fast, no PnC. |
 | **TDT** | ~25 | RNN-T greedy + duration | ~ 715 MiB | Recommended default; PnC + auto-detect. |
 | **EOU** | English | RNN-T greedy + `<EOU>` | ~ 132 MiB | Streaming-trained; native end-of-turn token. |
-| **Sortformer** | n/a | Diarization head | ~ 141 MiB | 4-speaker. |
+| **Sortformer v1** | n/a | Diarization head (sliding history) | ~ 141 MiB | 4-speaker. **Default for offline diarization.** |
+| **Sortformer v2.1 + AOSC** | n/a | Diarization head + speaker cache | ~ 141 MiB | 4-speaker. **Default for streaming diarization.** AOSC anchors speaker slots across silence/re-entry; auto-detected via GGUF metadata tag `parakeet.model_variant`. |
 
 ## Other examples
 
 - [`examples/transcribe.js`](examples/transcribe.js) -- universal single-file transcribe / diarize (any GGUF, all model types).
 - [`examples/diarized-transcribe.js`](examples/diarized-transcribe.js) -- combined Sortformer + ASR pipeline ("who said what").
 - [`examples/live-mic.js`](examples/live-mic.js) -- live microphone transcription via `sox` and the streaming session.
-- [`examples/live-mic-diarized.js`](examples/live-mic-diarized.js) -- live mic with parallel Sortformer + ASR for speaker-tagged transcripts.
+- [`examples/live-mic-diarized.js`](examples/live-mic-diarized.js) -- live mic with parallel Sortformer + ASR for speaker-tagged transcripts. Pass a v2.1 Sortformer GGUF to get AOSC speaker-cache streaming automatically.
+- [`examples/live-mic-diarized-aosc.js`](examples/live-mic-diarized-aosc.js) -- same as above but with CLI flags for the AOSC tuning knobs (`--spk-cache-len`, `--fifo-len`, `--chunk-right-context-ms`, `--spk-cache-enable`, etc.). Useful for A/B comparing AOSC vs the v1 sliding-window code path on the same v2.1 GGUF.
 - [`examples/decode-audio.js`](examples/decode-audio.js) -- decode + transcribe in one step. Same flag surface as `transcribe.js` but pipes the input through `@qvac/decoder-audio` (FFmpeg) first, so any container / codec FFmpeg supports (mp3, m4a, ogg, flac, mp4, ...) works -- not just 16 kHz mono `.wav` / raw s16le PCM.
 - [`examples/utils.js`](examples/utils.js) -- shared helpers used by the examples (`loadWeights` streaming, `Output`/`JobEnded` race resolution).
 
diff --git a/packages/transcription-parakeet/addon/src/addon/AddonJs.hpp b/packages/transcription-parakeet/addon/src/addon/AddonJs.hpp
index 12cfea1766..02176d0311 100644
--- a/packages/transcription-parakeet/addon/src/addon/AddonJs.hpp
+++ b/packages/transcription-parakeet/addon/src/addon/AddonJs.hpp
@@ -163,6 +163,14 @@ startStreaming(js_env_t* env, js_callback_info_t* info) try {
       parakeetModel.getDiarMinDurationOn() * 1000.0F);
   config.leftContextMs      = parakeetModel.getStreamingLeftContextMs();
   config.rightLookaheadMs   = parakeetModel.getStreamingRightLookaheadMs();
+  // AOSC defaults sourced from the model's load-time ParakeetConfig.
+  config.spkCacheEnable = parakeetModel.getStreamingSpkCacheEnable();
+  config.spkCacheLen = parakeetModel.getStreamingSpkCacheLen();
+  config.fifoLen = parakeetModel.getStreamingFifoLen();
+  config.chunkLeftContextMs = parakeetModel.getStreamingChunkLeftContextMs();
+  config.chunkRightContextMs = parakeetModel.getStreamingChunkRightContextMs();
+  config.spkCacheUpdatePeriod =
+      parakeetModel.getStreamingSpkCacheUpdatePeriod();
 
   if (auto chunkMs =
           configObj.getOptionalProperty<js::Number>(env, "chunkMs");
@@ -198,6 +206,48 @@ startStreaming(js_env_t* env, js_callback_info_t* info) try {
       emitEnergyVad.has_value()) {
     config.emitEnergyVad = emitEnergyVad.value().as<bool>(env);
   }
+  // AOSC per-call overrides (v2.1+ Sortformer only).
+  if (auto spkCacheEnable =
+          configObj.getOptionalProperty<js::Boolean>(env, "spkCacheEnable");
+      spkCacheEnable.has_value()) {
+    config.spkCacheEnable = spkCacheEnable.value().as<bool>(env);
+  }
+  if (auto spkCacheLen =
+          configObj.getOptionalProperty<js::Number>(env, "spkCacheLen");
+      spkCacheLen.has_value()) {
+    const auto v = static_cast<int>(spkCacheLen.value().as<double>(env));
+    if (v > 0)
+      config.spkCacheLen = v;
+  }
+  if (auto fifoLen = configObj.getOptionalProperty<js::Number>(env, "fifoLen");
+      fifoLen.has_value()) {
+    const auto v = static_cast<int>(fifoLen.value().as<double>(env));
+    if (v > 0)
+      config.fifoLen = v;
+  }
+  if (auto chunkLeftContextMs =
+          configObj.getOptionalProperty<js::Number>(env, "chunkLeftContextMs");
+      chunkLeftContextMs.has_value()) {
+    const auto v = static_cast<int>(chunkLeftContextMs.value().as<double>(env));
+    if (v >= 0)
+      config.chunkLeftContextMs = v;
+  }
+  if (auto chunkRightContextMs =
+          configObj.getOptionalProperty<js::Number>(env, "chunkRightContextMs");
+      chunkRightContextMs.has_value()) {
+    const auto v =
+        static_cast<int>(chunkRightContextMs.value().as<double>(env));
+    if (v >= 0)
+      config.chunkRightContextMs = v;
+  }
+  if (auto spkCacheUpdatePeriod = configObj.getOptionalProperty<js::Number>(
+          env, "spkCacheUpdatePeriod");
+      spkCacheUpdatePeriod.has_value()) {
+    const auto v =
+        static_cast<int>(spkCacheUpdatePeriod.value().as<double>(env));
+    if (v > 0)
+      config.spkCacheUpdatePeriod = v;
+  }
 
   {
     std::lock_guard<std::mutex> lock(g_streamingMtx);
diff --git a/packages/transcription-parakeet/addon/src/js-interface/JSAdapter.cpp b/packages/transcription-parakeet/addon/src/js-interface/JSAdapter.cpp
index bedbf481fe..3d20e448ee 100644
--- a/packages/transcription-parakeet/addon/src/js-interface/JSAdapter.cpp
+++ b/packages/transcription-parakeet/addon/src/js-interface/JSAdapter.cpp
@@ -107,6 +107,54 @@ auto JSAdapter::loadFromJSObject(js::Object jsObject, js_env_t* env)
         streamingRightLookaheadMsOpt.value().as<int32_t>(env);
   }
 
+  // AOSC (v2.1+ Sortformer only). All optional; unspecified values keep
+  // ParakeetConfig's defaults. Forwarded into
+  // parakeet::SortformerStreamingOptions by ParakeetModel /
+  // ParakeetStreamingProcessor; ignored for v1/v2/non-Sortformer.
+  auto streamingSpkCacheEnableOpt =
+      jsObject.getOptionalProperty<js::Boolean>(env, "streamingSpkCacheEnable");
+  if (streamingSpkCacheEnableOpt.has_value()) {
+    config.streamingSpkCacheEnable =
+        streamingSpkCacheEnableOpt.value().as<bool>(env);
+  }
+
+  auto streamingSpkCacheLenOpt =
+      jsObject.getOptionalProperty<js::Number>(env, "streamingSpkCacheLen");
+  if (streamingSpkCacheLenOpt.has_value()) {
+    config.streamingSpkCacheLen =
+        streamingSpkCacheLenOpt.value().as<int32_t>(env);
+  }
+
+  auto streamingFifoLenOpt =
+      jsObject.getOptionalProperty<js::Number>(env, "streamingFifoLen");
+  if (streamingFifoLenOpt.has_value()) {
+    config.streamingFifoLen = streamingFifoLenOpt.value().as<int32_t>(env);
+  }
+
+  auto streamingChunkLeftContextMsOpt =
+      jsObject.getOptionalProperty<js::Number>(
+          env, "streamingChunkLeftContextMs");
+  if (streamingChunkLeftContextMsOpt.has_value()) {
+    config.streamingChunkLeftContextMs =
+        streamingChunkLeftContextMsOpt.value().as<int32_t>(env);
+  }
+
+  auto streamingChunkRightContextMsOpt =
+      jsObject.getOptionalProperty<js::Number>(
+          env, "streamingChunkRightContextMs");
+  if (streamingChunkRightContextMsOpt.has_value()) {
+    config.streamingChunkRightContextMs =
+        streamingChunkRightContextMsOpt.value().as<int32_t>(env);
+  }
+
+  auto streamingSpkCacheUpdatePeriodOpt =
+      jsObject.getOptionalProperty<js::Number>(
+          env, "streamingSpkCacheUpdatePeriod");
+  if (streamingSpkCacheUpdatePeriodOpt.has_value()) {
+    config.streamingSpkCacheUpdatePeriod =
+        streamingSpkCacheUpdatePeriodOpt.value().as<int32_t>(env);
+  }
+
   // Dynamic-backend loading knobs. Both forwarded to
   // parakeet::EngineOptions and consumed once per-process on the
   // first Engine construction (the ggml-backend registry + the
diff --git a/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.cpp b/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.cpp
index 8161375bb5..2d53a4ba3e 100644
--- a/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.cpp
+++ b/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.cpp
@@ -46,6 +46,15 @@ ParakeetStreamingProcessor::ParakeetStreamingProcessor(
     opts.threshold      = config_.diarOnsetThreshold;
     opts.min_segment_ms = config_.diarMinSegmentMs;
     opts.emit_partials  = config_.emitPartials;
+    // AOSC (v2.1+ Sortformer only). parakeet-cpp ignores these fields for
+    // v1/v2 GGUFs (variant detected from `parakeet.model_variant` metadata
+    // or the encoder shape heuristic), so always-forward is safe.
+    opts.spkcache_enable = config_.spkCacheEnable;
+    opts.spkcache_len = config_.spkCacheLen;
+    opts.fifo_len = config_.fifoLen;
+    opts.chunk_left_context_ms = config_.chunkLeftContextMs;
+    opts.chunk_right_context_ms = config_.chunkRightContextMs;
+    opts.spkcache_update_period = config_.spkCacheUpdatePeriod;
 
     diar_session_ = model_.createDuplexDiarizationSession(
         opts,
diff --git a/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.hpp b/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.hpp
index f611172eb6..559e9e9b04 100644
--- a/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.hpp
+++ b/packages/transcription-parakeet/addon/src/model-interface/ParakeetStreamingProcessor.hpp
@@ -54,6 +54,18 @@ class ParakeetStreamingProcessor {
     // parakeet engine default in place" (10000 / 2000 ms respectively).
     int  leftContextMs      = -1;
     int  rightLookaheadMs   = -1;
+    // === AOSC (v2.1+ Sortformer only) ====================================
+    // Forwarded into parakeet::SortformerStreamingOptions when the loaded
+    // model is a v2.1 Sortformer GGUF (auto-detected from the GGUF's
+    // `parakeet.model_variant` metadata tag). parakeet-cpp ignores these
+    // fields on v1/v2 GGUFs and on non-Sortformer engines, so they are
+    // always safe to forward.
+    bool spkCacheEnable = true;
+    int spkCacheLen = 188;
+    int fifoLen = 188;
+    int chunkLeftContextMs = 80;
+    int chunkRightContextMs = 560;
+    int spkCacheUpdatePeriod = 144;
   };
 
   ParakeetStreamingProcessor(
diff --git a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetConfig.hpp b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetConfig.hpp
index 744490632c..b05d3e4672 100644
--- a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetConfig.hpp
+++ b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetConfig.hpp
@@ -57,6 +57,27 @@ struct ParakeetConfig {
   int  streamingLeftContextMs    = -1;
   int  streamingRightLookaheadMs = -1;
 
+  // ── AOSC (Audio-Online Speaker Cache; v2.1+ Sortformer only) ───────────
+  // Forwarded to parakeet::SortformerStreamingOptions.spkcache_* /
+  // fifo_len / chunk_{left,right}_context_ms / spkcache_update_period.
+  // Ignored on non-Sortformer models and on v1/v2 Sortformer GGUFs;
+  // parakeet-cpp auto-enables AOSC for v2.1 via the GGUF metadata tag
+  // `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"`.
+  //
+  // The cache anchors speaker-slot identity across silence and re-entry,
+  // fixing the per-chunk permutation-invariance drift that v1's sliding
+  // window suffers from. Defaults mirror parakeet-cpp's own (NeMo-port
+  // tuning); override only when A/B comparing or for specialised audio.
+  //
+  // Setting streamingSpkCacheEnable = false on a v2.1 model forces the
+  // v1 sliding-window code path (useful for regression comparison).
+  bool streamingSpkCacheEnable = true;
+  int streamingSpkCacheLen = 188;          // long-term speaker rows (~15s)
+  int streamingFifoLen = 188;              // FIFO warmup buffer rows
+  int streamingChunkLeftContextMs = 80;    // encoder left context  (~1 frame)
+  int streamingChunkRightContextMs = 560;  // encoder right context (~7 frames)
+  int streamingSpkCacheUpdatePeriod = 144; // FIFO-overflow pop-out count
+
   // ── Dynamic-backend loading ────────────────────────────────────────────
   // Forwarded to parakeet::EngineOptions::backends_dir /
   // opencl_cache_dir. On Android (and any other GGML_BACKEND_DL=ON
@@ -91,6 +112,13 @@ struct ParakeetConfig {
            streamingEnergyVad == other.streamingEnergyVad &&
            streamingLeftContextMs == other.streamingLeftContextMs &&
            streamingRightLookaheadMs == other.streamingRightLookaheadMs &&
+           streamingSpkCacheEnable == other.streamingSpkCacheEnable &&
+           streamingSpkCacheLen == other.streamingSpkCacheLen &&
+           streamingFifoLen == other.streamingFifoLen &&
+           streamingChunkLeftContextMs == other.streamingChunkLeftContextMs &&
+           streamingChunkRightContextMs == other.streamingChunkRightContextMs &&
+           streamingSpkCacheUpdatePeriod ==
+               other.streamingSpkCacheUpdatePeriod &&
            backendsDir == other.backendsDir &&
            openclCacheDir == other.openclCacheDir;
   }
diff --git a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.cpp b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.cpp
index e1b4e4acfb..adf4a736a6 100644
--- a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.cpp
+++ b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.cpp
@@ -720,6 +720,15 @@ void ParakeetModel::openStreamingSession_() {
     opts.threshold      = diarConfig_.onset;
     opts.min_segment_ms = static_cast<int>(diarConfig_.minDurationOn * 1000.0f);
     opts.emit_partials  = cfg_.streamingEmitPartials;
+    // AOSC (v2.1+ Sortformer only; ignored for v1/v2 GGUFs). The engine
+    // detects v2.1 via the GGUF metadata tag `parakeet.model_variant` and
+    // only consults these fields then -- safe to forward unconditionally.
+    opts.spkcache_enable = cfg_.streamingSpkCacheEnable;
+    opts.spkcache_len = cfg_.streamingSpkCacheLen;
+    opts.fifo_len = cfg_.streamingFifoLen;
+    opts.chunk_left_context_ms = cfg_.streamingChunkLeftContextMs;
+    opts.chunk_right_context_ms = cfg_.streamingChunkRightContextMs;
+    opts.spkcache_update_period = cfg_.streamingSpkCacheUpdatePeriod;
 
     auto session = engine->diarize_start(
         opts, [this](const parakeet::StreamingDiarizationSegment& seg) {
diff --git a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.hpp b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.hpp
index 2cd7c5f993..5e94cb2b5c 100644
--- a/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.hpp
+++ b/packages/transcription-parakeet/addon/src/model-interface/parakeet/ParakeetModel.hpp
@@ -139,6 +139,23 @@ class ParakeetModel : public qvac_lib_inference_addon_cpp::model::IModel,
   bool                getStreamingEnergyVad() const {
     return cfg_.streamingEnergyVad;
   }
+  // AOSC accessors (v2.1+ Sortformer only). Forwarded verbatim from
+  // ParakeetConfig; parakeet-cpp ignores them for non-Sortformer engines
+  // and for v1/v2 Sortformer GGUFs.
+  bool getStreamingSpkCacheEnable() const {
+    return cfg_.streamingSpkCacheEnable;
+  }
+  int getStreamingSpkCacheLen() const { return cfg_.streamingSpkCacheLen; }
+  int getStreamingFifoLen() const { return cfg_.streamingFifoLen; }
+  int getStreamingChunkLeftContextMs() const {
+    return cfg_.streamingChunkLeftContextMs;
+  }
+  int getStreamingChunkRightContextMs() const {
+    return cfg_.streamingChunkRightContextMs;
+  }
+  int getStreamingSpkCacheUpdatePeriod() const {
+    return cfg_.streamingSpkCacheUpdatePeriod;
+  }
   bool                isSortformer() const {
     return cfg_.modelType == ModelType::SORTFORMER;
   }
diff --git a/packages/transcription-parakeet/examples/diarized-transcribe.js b/packages/transcription-parakeet/examples/diarized-transcribe.js
index 424092d470..8f25c8f131 100644
--- a/packages/transcription-parakeet/examples/diarized-transcribe.js
+++ b/packages/transcription-parakeet/examples/diarized-transcribe.js
@@ -1,13 +1,21 @@
 'use strict'
 
 /**
- * Combined ASR + diarization example.
+ * Combined ASR + diarization example (offline).
  *
  * Runs Sortformer to find speaker time-segments, then transcribes
  * each speaker's audio slice with the ASR model. Output is a
  * "Speaker N: ..." per-segment transcript. Both engines run
  * through the public `TranscriptionParakeet` class.
  *
+ * Recommended `--diar-model`: the v1 Sortformer GGUF
+ * (`sortformer-4spk-v1.q8_0.gguf`). v2.1 also works but the AOSC
+ * speaker cache it brings is a *streaming* optimisation -- in batch /
+ * offline mode the entire clip is available at once, so AOSC's slot
+ * stability across silence/re-entry provides no additional benefit
+ * over v1. For live capture, use `examples/live-mic-diarized.js`
+ * (or `examples/live-mic-diarized-aosc.js`) with the v2.1 GGUF.
+ *
  * Usage:
  *   bare examples/diarized-transcribe.js \
  *        --asr-model <gguf> --diar-model <gguf> --audio <file>
diff --git a/packages/transcription-parakeet/examples/live-mic-diarized-aosc.js b/packages/transcription-parakeet/examples/live-mic-diarized-aosc.js
new file mode 100644
index 0000000000..41970a3026
--- /dev/null
+++ b/packages/transcription-parakeet/examples/live-mic-diarized-aosc.js
@@ -0,0 +1,369 @@
+'use strict'
+
+/**
+ * Live-mic transcription + diarization example with full AOSC control.
+ *
+ * This is the v2.1-focused counterpart of `examples/live-mic-diarized.js`.
+ * Both files share the same duplex pattern (two `runStreaming()`
+ * sessions fanned from a single sox capture, with the ASR transcript
+ * tagged by the latest Sortformer speaker_id). What this file adds is
+ * explicit CLI control of the AOSC (Audio-Online Speaker Cache) knobs
+ * parakeet-cpp exposes for v2.1 Sortformer streaming:
+ *
+ *   --spk-cache-enable {true|false}     Toggle AOSC. Defaults to true.
+ *                                       Set false to force a v2.1 GGUF
+ *                                       onto the v1 sliding-window
+ *                                       path (A/B comparison).
+ *   --spk-cache-len <rows>              Long-term speaker-cache rows
+ *                                       (default 188 ≈ 15 s).
+ *   --fifo-len <rows>                   FIFO warmup buffer rows
+ *                                       (default 188).
+ *   --chunk-left-context-ms <ms>        Encoder left context, ~1 frame
+ *                                       (default 80).
+ *   --chunk-right-context-ms <ms>       Encoder right context, ~7 frames
+ *                                       (default 560). Adds directly to
+ *                                       per-chunk emission latency.
+ *   --spk-cache-update-period <count>   FIFO-overflow pop-out count
+ *                                       (default 144). How many frames
+ *                                       get promoted into the long-term
+ *                                       cache each time the FIFO fills.
+ *
+ * Background -- what AOSC fixes:
+ * v1 / v2 Sortformer streams use a fixed-size sliding-history window
+ * inside the engine. Once two voices have been seen, the model's
+ * per-chunk decisions are permutation-invariant; if one speaker goes
+ * silent long enough to roll out of the window, its slot identity can
+ * silently drift onto a different physical voice when it returns. v2.1
+ * replaces the sliding window with a NeMo-port speaker cache that
+ * anchors each slot to its accumulated embedding, so the same physical
+ * speaker comes back to the same `Speaker N` tag across silences.
+ *
+ * For the upstream API + algorithm details, see
+ * `parakeet-cpp/include/parakeet/diarization.h` and the upstream PRs
+ * that introduced this feature in qvac-ext-lib-whisper.cpp (PR #22
+ * commit e6ba38c, PR #24 commit 08df2e7).
+ *
+ * Usage:
+ *   bare examples/live-mic-diarized-aosc.js \
+ *        --asr-model <ctc-or-tdt-gguf> \
+ *        --diar-model <v2.1-sortformer-gguf> \
+ *        [--accumulate] [--chunk-ms <ms>] [--capture "<sox cmd>"] \
+ *        [--spk-cache-enable {true|false}] [--spk-cache-len <rows>] \
+ *        [--fifo-len <rows>] [--chunk-left-context-ms <ms>] \
+ *        [--chunk-right-context-ms <ms>] [--spk-cache-update-period <count>]
+ *
+ * Notes:
+ *  - The AOSC knobs are silently ignored on v1/v2 GGUFs and on
+ *    non-Sortformer models. The engine detects v2.1 via the GGUF
+ *    metadata tag `parakeet.model_variant`.
+ *  - On Windows, if sox exits without producing audio, override capture:
+ *      --capture "sox -t waveaudio default -t raw -r 16000 -b 16 -c 1 -e signed-integer -L -"
+ */
+
+/* global Bare */
+const path = require('bare-path')
+const process = require('bare-process')
+const subprocess = require('bare-subprocess')
+const TranscriptionParakeet = require('../index.js')
+const addonLogging = require('../addonLogging.js')
+const { setupLogger, validatePaths, pushableStream } = require('./utils.js')
+
+const CAPTURE_CMD = 'sox -d -t raw -r 16000 -b 16 -c 1 -e signed-integer -L -'
+
+const SILENCE_SENTINELS = new Set([
+  '[No speech detected]',
+  '[Audio too short]',
+  '[Model not ready]',
+  '[No speakers detected]'
+])
+
+function isSilenceText (text) {
+  return text.length === 0 || SILENCE_SENTINELS.has(text)
+}
+
+function buildSegmentText (items) {
+  let text = ''
+  let firstStartsWord = true
+  let isFirst = true
+  for (const s of items) {
+    if (!s || !s.text || !s.toAppend) continue
+    const sw = s.startsWord !== false
+    if (isFirst) {
+      firstStartsWord = sw
+      text = s.text
+      isFirst = false
+    } else {
+      text += (sw ? ' ' : '') + s.text
+    }
+  }
+  return { text: text.replace(/\s+/g, ' '), firstStartsWord }
+}
+
+function parseSortformerSpeakerId (text) {
+  const m = typeof text === 'string'
+    ? text.match(/Speaker\s+(\d+)/)
+    : null
+  return m ? parseInt(m[1], 10) : -1
+}
+
+function parseBoolFlag (value) {
+  if (value === undefined || value === null) return undefined
+  const normalised = String(value).toLowerCase()
+  if (normalised === 'true' || normalised === '1' || normalised === 'yes') return true
+  if (normalised === 'false' || normalised === '0' || normalised === 'no') return false
+  return undefined
+}
+
+function parsePositiveInt (value) {
+  const n = parseInt(value, 10)
+  return Number.isFinite(n) && n > 0 ? n : null
+}
+
+function parseNonNegativeInt (value) {
+  const n = parseInt(value, 10)
+  return Number.isFinite(n) && n >= 0 ? n : null
+}
+
+function parseArgs () {
+  const args = {
+    asrModel: null,
+    diarModel: null,
+    accumulate: false,
+    capture: null,
+    chunkMs: null,
+    spkCacheEnable: undefined,
+    spkCacheLen: null,
+    fifoLen: null,
+    chunkLeftContextMs: null,
+    chunkRightContextMs: null,
+    spkCacheUpdatePeriod: null
+  }
+  const argv = Bare.argv.slice(2)
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i]
+    if (a === '--asr-model' || a === '-m') args.asrModel = argv[++i]
+    else if (a === '--diar-model' || a === '-d') args.diarModel = argv[++i]
+    else if (a === '--accumulate') args.accumulate = true
+    else if (a === '--capture' || a === '-c') args.capture = argv[++i]
+    else if (a === '--chunk-ms') {
+      const v = parsePositiveInt(argv[++i])
+      if (v !== null && v >= 200) args.chunkMs = v
+    } else if (a === '--spk-cache-enable') {
+      const v = parseBoolFlag(argv[++i])
+      if (v !== undefined) args.spkCacheEnable = v
+    } else if (a === '--spk-cache-len') args.spkCacheLen = parsePositiveInt(argv[++i])
+    else if (a === '--fifo-len') args.fifoLen = parsePositiveInt(argv[++i])
+    else if (a === '--chunk-left-context-ms') args.chunkLeftContextMs = parseNonNegativeInt(argv[++i])
+    else if (a === '--chunk-right-context-ms') args.chunkRightContextMs = parseNonNegativeInt(argv[++i])
+    else if (a === '--spk-cache-update-period') args.spkCacheUpdatePeriod = parsePositiveInt(argv[++i])
+  }
+  return args
+}
+
+function buildDiarConfig (args) {
+  const config = {
+    streaming: true,
+    streamingChunkMs: args.chunkMs ?? 2000,
+    useGPU: true
+  }
+  if (args.spkCacheEnable !== undefined) config.streamingSpkCacheEnable = args.spkCacheEnable
+  if (args.spkCacheLen !== null) config.streamingSpkCacheLen = args.spkCacheLen
+  if (args.fifoLen !== null) config.streamingFifoLen = args.fifoLen
+  if (args.chunkLeftContextMs !== null) config.streamingChunkLeftContextMs = args.chunkLeftContextMs
+  if (args.chunkRightContextMs !== null) config.streamingChunkRightContextMs = args.chunkRightContextMs
+  if (args.spkCacheUpdatePeriod !== null) config.streamingSpkCacheUpdatePeriod = args.spkCacheUpdatePeriod
+  return config
+}
+
+function describeAoscConfig (config) {
+  const parts = []
+  if ('streamingSpkCacheEnable' in config) parts.push(`spkCacheEnable=${config.streamingSpkCacheEnable}`)
+  if ('streamingSpkCacheLen' in config) parts.push(`spkCacheLen=${config.streamingSpkCacheLen}`)
+  if ('streamingFifoLen' in config) parts.push(`fifoLen=${config.streamingFifoLen}`)
+  if ('streamingChunkLeftContextMs' in config) parts.push(`chunkLeftContextMs=${config.streamingChunkLeftContextMs}`)
+  if ('streamingChunkRightContextMs' in config) parts.push(`chunkRightContextMs=${config.streamingChunkRightContextMs}`)
+  if ('streamingSpkCacheUpdatePeriod' in config) parts.push(`spkCacheUpdatePeriod=${config.streamingSpkCacheUpdatePeriod}`)
+  return parts.length === 0 ? '(all AOSC defaults)' : parts.join(' ')
+}
+
+async function main () {
+  const args = parseArgs()
+  if (!args.asrModel || !args.diarModel) {
+    console.error('Usage: bare examples/live-mic-diarized-aosc.js --asr-model <gguf> --diar-model <v2.1-gguf> [--accumulate] [--chunk-ms <ms>] [--capture "<sox cmd>"] [--spk-cache-enable {true|false}] [--spk-cache-len <rows>] [--fifo-len <rows>] [--chunk-left-context-ms <ms>] [--chunk-right-context-ms <ms>] [--spk-cache-update-period <count>]')
+    process.exit(1)
+  }
+
+  setupLogger(addonLogging)
+  let stopping = false
+
+  const asrPath = path.resolve(args.asrModel)
+  const diarPath = path.resolve(args.diarModel)
+  if (!validatePaths({ model: asrPath })) { addonLogging.releaseLogger(); process.exit(1) }
+  if (!validatePaths({ model: diarPath })) { addonLogging.releaseLogger(); process.exit(1) }
+
+  console.log(`Loading ASR: ${asrPath}`)
+  console.log(`Loading DIAR: ${diarPath}`)
+
+  const diarConfig = buildDiarConfig(args)
+  console.log(`AOSC config: ${describeAoscConfig(diarConfig)}`)
+
+  const asr = new TranscriptionParakeet({
+    files: { model: asrPath },
+    config: {
+      parakeetConfig: {
+        streaming: true,
+        streamingChunkMs: args.chunkMs ?? 2000,
+        useGPU: true
+      }
+    }
+  })
+  const diar = new TranscriptionParakeet({
+    files: { model: diarPath },
+    config: { parakeetConfig: diarConfig }
+  })
+
+  await asr.load()
+  await diar.load()
+  console.log('Listening (Ctrl-C to stop)...\n')
+
+  const captureCmd = args.capture && args.capture.length > 0 ? args.capture : CAPTURE_CMD
+  const [captureBin, ...captureArgs] = captureCmd.trim().split(/\s+/)
+  let child
+  try {
+    child = subprocess.spawn(captureBin, captureArgs,
+      { stdio: ['ignore', 'pipe', 'pipe'] })
+  } catch (err) {
+    if (err && err.code === 'ENOENT') {
+      console.error(`\n'${captureBin}' not found on PATH.`)
+      console.error('Install sox (brew install sox / apt install sox / choco install sox / winget install ChrisBagwell.SoX).')
+    } else {
+      console.error(`\nFailed to spawn capture command: ${err.message}`)
+    }
+    addonLogging.releaseLogger()
+    process.exit(1)
+  }
+  child.on('error', (err) => {
+    console.error(`\nCapture command failed: ${err.message}`)
+    process.exit(1)
+  })
+
+  let firstAudioSeen = false
+  let stderrBuf = ''
+  child.stderr.on('data', (chunk) => {
+    stderrBuf += chunk.toString('utf8')
+    if (stderrBuf.length > 8192) stderrBuf = stderrBuf.slice(-8192)
+  })
+
+  let lineOpen = false
+  let lineSpeaker = null
+  let lastSpeaker = -1
+
+  function flushLine () {
+    if (lineOpen) {
+      process.stdout.write('\n')
+      lineOpen = false
+      lineSpeaker = null
+    }
+  }
+  function emitTranscript (speaker, text, firstStartsWord) {
+    if (isSilenceText(text)) {
+      if (args.accumulate) flushLine()
+      return
+    }
+    const tag = speaker >= 0 ? `speaker_${speaker}` : 'speaker_?'
+    const ts = new Date().toISOString().slice(11, 19)
+    if (args.accumulate) {
+      if (lineOpen && lineSpeaker !== speaker) flushLine()
+      if (!lineOpen) {
+        process.stdout.write(`[${ts}] ${tag}: ${text}`)
+        lineOpen = true
+        lineSpeaker = speaker
+      } else {
+        process.stdout.write((firstStartsWord ? ' ' : '') + text)
+      }
+    } else {
+      console.log(`[${ts}] ${tag}: ${text}`)
+    }
+  }
+
+  const asrStream = pushableStream()
+  const diarStream = pushableStream()
+  child.stdout.on('data', (chunk) => {
+    if (!firstAudioSeen) firstAudioSeen = true
+    if (stopping) return
+    asrStream.push(chunk)
+    diarStream.push(chunk)
+  })
+
+  const streamingConfig = {}
+  if (args.chunkMs !== null) streamingConfig.chunkMs = args.chunkMs
+
+  const diarRunPromise = (async () => {
+    const response = await diar.runStreaming(diarStream, streamingConfig)
+    await response
+      .onUpdate(out => {
+        const items = Array.isArray(out) ? out : [out]
+        for (let i = items.length - 1; i >= 0; i--) {
+          const s = items[i]
+          if (!s || !s.text || isSilenceText(s.text)) continue
+          const id = parseSortformerSpeakerId(s.text)
+          if (id >= 0) {
+            lastSpeaker = id
+            break
+          }
+        }
+      })
+      .await()
+  })()
+
+  const asrRunPromise = (async () => {
+    const response = await asr.runStreaming(asrStream, streamingConfig)
+    await response
+      .onUpdate(out => {
+        const items = Array.isArray(out) ? out : [out]
+        const { text, firstStartsWord } = buildSegmentText(items)
+        emitTranscript(lastSpeaker, text.trim(), firstStartsWord)
+      })
+      .await()
+  })()
+
+  async function shutdown () {
+    if (stopping) return
+    stopping = true
+    console.log('\nStopping...')
+    try { child.kill('SIGTERM') } catch (e) { /* ignore */ }
+    asrStream.end()
+    diarStream.end()
+    try { await Promise.all([asrRunPromise, diarRunPromise]) } catch (e) { /* swallow */ }
+    flushLine()
+    try { await asr.unload() } catch (e) { /* ignore */ }
+    try { await diar.unload() } catch (e) { /* ignore */ }
+    addonLogging.releaseLogger()
+    process.exit(0)
+  }
+
+  process.once('SIGINT', shutdown)
+  process.once('SIGTERM', shutdown)
+  child.on('exit', (code, signal) => {
+    if (!firstAudioSeen && !stopping) {
+      console.error(`\nCapture command exited before producing audio (code=${code}, signal=${signal}).`)
+      const tail = stderrBuf.trim()
+      if (tail) {
+        console.error('--- sox stderr ---')
+        console.error(tail)
+        console.error('------------------')
+      }
+      console.error('Hints:')
+      console.error('  - On Windows, try: --capture "sox -t waveaudio default -t raw -r 16000 -b 16 -c 1 -e signed-integer -L -"')
+      console.error('  - Verify a default recording device exists (Settings -> System -> Sound -> Input).')
+      console.error('  - Confirm SoX can list audio devices: sox -V6 -d -t raw -r 16000 -c 1 -e signed-integer -b 16 -L - 2>&1 | head')
+    }
+    shutdown()
+  })
+}
+
+main().catch(err => {
+  console.error('Error:', err)
+  addonLogging.releaseLogger()
+  process.exit(1)
+})
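The row-count and millisecond defaults in the header comment above are tied together by the encoder frame duration. The following is a minimal sketch of that arithmetic, assuming one encoder frame covers 80 ms. That figure is inferred from the "~1 frame (default 80)" and "~7 frames (default 560)" comments, not read from the engine, and `FRAME_MS`, `rowsToSeconds`, and `msToFrames` are hypothetical names used only for illustration.

```javascript
'use strict'

// Assumption: one encoder frame covers 80 ms, inferred from the
// "~1 frame (default 80)" and "~7 frames (default 560)" comments.
const FRAME_MS = 80

// Convert a cache or FIFO length in rows (encoder frames) to seconds.
function rowsToSeconds (rows) {
  return (rows * FRAME_MS) / 1000
}

// Convert a context window in milliseconds to whole encoder frames.
function msToFrames (ms) {
  return Math.round(ms / FRAME_MS)
}

console.log(rowsToSeconds(188)) // 15.04
console.log(msToFrames(560)) // 7
```

Under that assumption, the default `--spk-cache-len 188` corresponds to roughly 15 s of history, which matches the "≈ 15 s" note in the flag description.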
diff --git a/packages/transcription-parakeet/examples/live-mic-diarized.js b/packages/transcription-parakeet/examples/live-mic-diarized.js
index 47808ea2f4..ada13e5b32 100644
--- a/packages/transcription-parakeet/examples/live-mic-diarized.js
+++ b/packages/transcription-parakeet/examples/live-mic-diarized.js
@@ -10,16 +10,27 @@
  * Sortformer segment; the ASR side tags each printed transcript with
  * `lastSpeaker`. Press Ctrl-C to flush and exit.
  *
- * Diarization tagging is best-effort. Sortformer's streaming session
- * is permutation-invariant per chunk and prone to occasional
- * speaker-ID drift on continuous single-speaker stretches once two
- * voices have been seen in the rolling-history window. parakeet-cpp
- * documents this behaviour in
+ * Recommended `--diar-model`: the v2.1 Sortformer GGUF
+ * (`diar_streaming_sortformer_4spk-v2.1.q8_0.gguf`). parakeet-cpp
+ * detects v2.1 from the GGUF metadata tag
+ * `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"` and
+ * enables AOSC (Audio-Online Speaker Cache) automatically, which
+ * anchors speaker slots across silence and re-entry and largely
+ * removes the drift caveat described below.
+ *
+ * For an AOSC-aware variant that also exposes the speaker-cache
+ * tuning knobs from the CLI, see `examples/live-mic-diarized-aosc.js`.
+ *
+ * v1 caveat (kept for users running the older v1 GGUF): Sortformer's
+ * streaming session is permutation-invariant per chunk and prone to
+ * occasional speaker-ID drift on continuous single-speaker stretches
+ * once two voices have been seen in the rolling-history window.
+ * parakeet-cpp documents this behaviour in
  * `parakeet-cpp/include/parakeet/diarization.h:80-82`. Fixing it
- * properly requires per-segment voice embeddings (currently not
- * exposed by the engine) -- this example therefore renders the raw
- * Sortformer ID and accepts the occasional mis-tag rather than try
- * to second-guess the model in JS.
+ * properly required per-segment voice embeddings, a gap that v2.1's
+ * AOSC now closes. On a v1 GGUF this example therefore renders the raw
+ * Sortformer ID and accepts the occasional mis-tag rather than trying
+ * to second-guess the model in JS.
  *
  * Usage:
  *   bare examples/live-mic-diarized.js \
@@ -98,9 +109,14 @@ function parseArgs () {
 }
 
 // Pin the Sortformer rolling-history window at parakeet-cpp's default
-// (30 s). Pushing past it puts the input outside the window the
-// underlying model was trained on, which empirically causes the engine
-// to collapse all voices onto sortformer_0.
+// (30 s). Pushing past it on a v1 GGUF puts the input outside the
+// window the underlying model was trained on, which empirically causes
+// the engine to collapse all voices onto sortformer_0.
+//
+// On a v2.1 GGUF, AOSC is auto-enabled and supersedes this rolling
+// window with a NeMo-port speaker cache. parakeet-cpp ignores
+// `history_ms` for v2.1 sessions, so this constant is harmless either
+// way and is kept for backwards compatibility with v1 GGUFs.
 const STREAMING_HISTORY_MS = 30000
 
 // Pull the Sortformer speaker_id out of the addon's segment text
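Both live-mic examples tag transcripts with the most recent speaker id found in the Sortformer segment text. The sketch below shows that newest-first scan in isolation. The regular expression matches the one in `live-mic-diarized-aosc.js`; `latestSpeaker` is a hypothetical helper name, and the sample timestamps are made up to fit the "Speaker N: HH:MM:SS.fff - HH:MM:SS.fff" pattern.

```javascript
'use strict'

// Same pattern the AOSC example uses: "Speaker <N>" anywhere in the text.
function parseSpeakerId (text) {
  const m = typeof text === 'string' ? text.match(/Speaker\s+(\d+)/) : null
  return m ? parseInt(m[1], 10) : -1
}

// Walk the update batch newest-first and return the first usable id,
// or `fallback` when no segment carries one.
function latestSpeaker (segments, fallback = -1) {
  for (let i = segments.length - 1; i >= 0; i--) {
    const id = parseSpeakerId(segments[i] && segments[i].text)
    if (id >= 0) return id
  }
  return fallback
}

// Illustrative data only; the timestamps are invented.
const batch = [
  { text: 'Speaker 0: 00:00:00.000 - 00:00:01.920' },
  { text: 'Speaker 2: 00:00:01.920 - 00:00:03.840' },
  { text: '[No speakers detected]' }
]
console.log(latestSpeaker(batch)) // 2
```

Passing the previous speaker id as `fallback` keeps the last known tag when an update contains only silence sentinels.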
diff --git a/packages/transcription-parakeet/index.d.ts b/packages/transcription-parakeet/index.d.ts
index c99897efdc..cf8d38f5ee 100644
--- a/packages/transcription-parakeet/index.d.ts
+++ b/packages/transcription-parakeet/index.d.ts
@@ -71,6 +71,30 @@ declare interface ParakeetConfig {
    * (2000 ms). ASR sessions only.
    */
   streamingRightLookaheadMs?: number
+
+  /**
+   * AOSC (Audio-Online Speaker Cache): enable v2.1 Sortformer's
+   * speaker-cache streaming. Ignored on v1/v2 Sortformer GGUFs and on
+   * non-Sortformer models. Set false to force a v2.1 model onto the
+   * v1 sliding-window path (e.g. for A/B comparison). Default: true.
+   *
+   * The cache anchors each speaker to a stable slot across silence and
+   * re-entry, fixing the per-chunk permutation-invariance drift that v1
+   * suffers from when two voices have been seen in the rolling window.
+   * v2.1 is auto-detected from the GGUF metadata tag
+   * `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"`.
+   */
+  streamingSpkCacheEnable?: boolean
+  /** AOSC: long-term speaker-cache rows (~15 s of encoder frames). Default: 188. */
+  streamingSpkCacheLen?: number
+  /** AOSC: FIFO warmup buffer rows. Default: 188. */
+  streamingFifoLen?: number
+  /** AOSC: encoder left-context window (ms; ~1 encoder frame). Default: 80. */
+  streamingChunkLeftContextMs?: number
+  /** AOSC: encoder right-context window (ms; ~7 encoder frames). Default: 560. */
+  streamingChunkRightContextMs?: number
+  /** AOSC: FIFO-overflow pop-out count. Default: 144. */
+  streamingSpkCacheUpdatePeriod?: number
   /**
    * Directory the native addon scans for dynamically-loaded ggml
    * backend libraries (`libqvac-speech-ggml-vulkan.so`,
@@ -196,6 +220,18 @@ declare interface StreamingRunConfig {
   emitPartials?: boolean
   /** CTC/TDT-only energy-VAD events. */
   emitEnergyVad?: boolean
+  /** AOSC: enable/disable v2.1 speaker cache (overrides `streamingSpkCacheEnable`). */
+  spkCacheEnable?: boolean
+  /** AOSC: long-term speaker-cache rows (overrides `streamingSpkCacheLen`). */
+  spkCacheLen?: number
+  /** AOSC: FIFO warmup buffer rows (overrides `streamingFifoLen`). */
+  fifoLen?: number
+  /** AOSC: encoder left-context window in ms (overrides `streamingChunkLeftContextMs`). */
+  chunkLeftContextMs?: number
+  /** AOSC: encoder right-context window in ms (overrides `streamingChunkRightContextMs`). */
+  chunkRightContextMs?: number
+  /** AOSC: FIFO-overflow pop-out count (overrides `streamingSpkCacheUpdatePeriod`). */
+  spkCacheUpdatePeriod?: number
 }
 
 /**
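The `StreamingRunConfig` fields are documented as per-call overrides of the matching `ParakeetConfig` fields. The sketch below illustrates that precedence only. Where the merge actually happens in the addon is not shown in this diff, and `OVERRIDE_MAP` and `effectiveAosc` are hypothetical names.

```javascript
'use strict'

// Per-call StreamingRunConfig key -> load-time ParakeetConfig key,
// as listed in the index.d.ts doc comments.
const OVERRIDE_MAP = {
  spkCacheEnable: 'streamingSpkCacheEnable',
  spkCacheLen: 'streamingSpkCacheLen',
  fifoLen: 'streamingFifoLen',
  chunkLeftContextMs: 'streamingChunkLeftContextMs',
  chunkRightContextMs: 'streamingChunkRightContextMs',
  spkCacheUpdatePeriod: 'streamingSpkCacheUpdatePeriod'
}

// Hypothetical helper: effective AOSC settings for one streaming session.
// A per-call value wins; otherwise the load-time value is used.
function effectiveAosc (parakeetConfig = {}, runConfig = {}) {
  const out = {}
  for (const [runKey, cfgKey] of Object.entries(OVERRIDE_MAP)) {
    // `??` treats only null/undefined as "not set", so an explicit
    // `false` or `0` in runConfig still overrides.
    const value = runConfig[runKey] ?? parakeetConfig[cfgKey]
    if (value !== undefined) out[runKey] = value
  }
  return out
}

console.log(effectiveAosc(
  { streamingSpkCacheEnable: true, streamingSpkCacheLen: 188 },
  { spkCacheEnable: false }
))
```

In this sketch a session can switch AOSC off for an A/B run without reloading the model, while the cache length still comes from the load-time config.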
diff --git a/packages/transcription-parakeet/index.js b/packages/transcription-parakeet/index.js
index 9b65e806ab..f335647367 100644
--- a/packages/transcription-parakeet/index.js
+++ b/packages/transcription-parakeet/index.js
@@ -126,6 +126,18 @@ class TranscriptionParakeet {
       streamingEnergyVad: this.params.streamingEnergyVad === true,
       streamingLeftContextMs: this.params.streamingLeftContextMs ?? -1,
       streamingRightLookaheadMs: this.params.streamingRightLookaheadMs ?? -1,
+      // AOSC (v2.1+ Sortformer only). parakeet-cpp ignores these on
+      // non-Sortformer engines and on v1/v2 GGUFs. Defaults mirror the
+      // C++ ParakeetConfig defaults; passing the field explicitly (vs
+      // letting C++ pick its own default) ensures user overrides at
+      // the JS layer reach the native engine instead of being silently
+      // discarded by _buildConfigurationParams.
+      streamingSpkCacheEnable: this.params.streamingSpkCacheEnable !== false,
+      streamingSpkCacheLen: this.params.streamingSpkCacheLen ?? 188,
+      streamingFifoLen: this.params.streamingFifoLen ?? 188,
+      streamingChunkLeftContextMs: this.params.streamingChunkLeftContextMs ?? 80,
+      streamingChunkRightContextMs: this.params.streamingChunkRightContextMs ?? 560,
+      streamingSpkCacheUpdatePeriod: this.params.streamingSpkCacheUpdatePeriod ?? 144,
       // Forwarded as-is; ParakeetInterface fills in a per-package
       // default for `backendsDir` (`path.join(__dirname, 'prebuilds')`)
       // when the host doesn't pass one, so explicit `undefined`
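The default-filling expressions in this hunk use two different idioms, and the difference matters for edge values. The sketch below copies those expressions into a standalone function (`fillAoscDefaults` is a hypothetical name) to show how each idiom treats explicit `0`, `null`, and `false`.

```javascript
'use strict'

// Mirrors the default-filling expressions from the index.js hunk.
function fillAoscDefaults (params = {}) {
  return {
    // Only a literal `false` disables the cache; undefined/null enable it.
    streamingSpkCacheEnable: params.streamingSpkCacheEnable !== false,
    // `??` falls back only on null/undefined, so an explicit 0 survives.
    streamingSpkCacheLen: params.streamingSpkCacheLen ?? 188,
    streamingFifoLen: params.streamingFifoLen ?? 188,
    streamingChunkLeftContextMs: params.streamingChunkLeftContextMs ?? 80,
    streamingChunkRightContextMs: params.streamingChunkRightContextMs ?? 560,
    streamingSpkCacheUpdatePeriod: params.streamingSpkCacheUpdatePeriod ?? 144
  }
}

const zeroLeft = fillAoscDefaults({ streamingChunkLeftContextMs: 0 })
console.log(zeroLeft.streamingChunkLeftContextMs) // 0

const nullEnable = fillAoscDefaults({ streamingSpkCacheEnable: null })
console.log(nullEnable.streamingSpkCacheEnable) // true
```

Using `||` instead of `??` would have replaced an explicit `0` left context with the default of 80, which is why the nullish operator is the safer choice for the numeric fields.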
diff --git a/packages/transcription-parakeet/package.json b/packages/transcription-parakeet/package.json
index 59fb9fc9d1..04f8c3d728 100644
--- a/packages/transcription-parakeet/package.json
+++ b/packages/transcription-parakeet/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@qvac/transcription-parakeet",
-  "version": "0.4.0",
+  "version": "0.6.0",
   "description": "High-performance speech-to-text inference addon using NVIDIA Parakeet models for Bare runtime",
   "addon": true,
   "engines": {
diff --git a/packages/transcription-parakeet/parakeet.js b/packages/transcription-parakeet/parakeet.js
index bcd2dcb04d..541d5f9055 100644
--- a/packages/transcription-parakeet/parakeet.js
+++ b/packages/transcription-parakeet/parakeet.js
@@ -59,6 +59,20 @@ class ParakeetInterface {
    *   left context (parakeet default 10000 ms; -1 keeps the engine default).
    * @param {number} [configurationParams.streamingRightLookaheadMs] - ASR encoder
    *   right lookahead (parakeet default 2000 ms; -1 keeps the engine default).
+   * @param {boolean} [configurationParams.streamingSpkCacheEnable=true] - AOSC:
+   *   enable v2.1 Sortformer speaker-cache streaming. Ignored on v1/v2 GGUFs
+   *   and on non-Sortformer models. Set false to force the v1 sliding-window
+   *   path on a v2.1 model (A/B comparison).
+   * @param {number} [configurationParams.streamingSpkCacheLen=188] - AOSC:
+   *   long-term speaker-cache rows (~15 s of encoder frames).
+   * @param {number} [configurationParams.streamingFifoLen=188] - AOSC: FIFO
+   *   warmup buffer rows.
+   * @param {number} [configurationParams.streamingChunkLeftContextMs=80] -
+   *   AOSC: encoder left-context window (ms; ~1 encoder frame).
+   * @param {number} [configurationParams.streamingChunkRightContextMs=560] -
+   *   AOSC: encoder right-context window (ms; ~7 encoder frames).
+   * @param {number} [configurationParams.streamingSpkCacheUpdatePeriod=144] -
+   *   AOSC: FIFO-overflow pop-out count.
    * @param {string} [configurationParams.backendsDir] - root directory
    *   for dynamically-loaded ggml backends. JS defaults to
    *   `<package_dir>/prebuilds`; the native addon appends
@@ -494,6 +508,12 @@ class ParakeetInterface {
    * @param {number} [config.rightLookaheadMs] - ASR encoder right lookahead (overrides cfg.streamingRightLookaheadMs)
    * @param {boolean} [config.emitPartials] - emit partial segments on chunk boundaries
    * @param {boolean} [config.emitEnergyVad] - surface energy-VAD events for CTC/TDT
+   * @param {boolean} [config.spkCacheEnable] - AOSC: enable/disable v2.1 speaker cache (overrides cfg.streamingSpkCacheEnable)
+   * @param {number} [config.spkCacheLen] - AOSC: long-term speaker-cache rows (overrides cfg.streamingSpkCacheLen)
+   * @param {number} [config.fifoLen] - AOSC: FIFO warmup buffer rows (overrides cfg.streamingFifoLen)
+   * @param {number} [config.chunkLeftContextMs] - AOSC: encoder left-context window in ms (overrides cfg.streamingChunkLeftContextMs)
+   * @param {number} [config.chunkRightContextMs] - AOSC: encoder right-context window in ms (overrides cfg.streamingChunkRightContextMs)
+   * @param {number} [config.spkCacheUpdatePeriod] - AOSC: FIFO-overflow pop-out count (overrides cfg.streamingSpkCacheUpdatePeriod)
    * @returns {Promise<number>} jobId assigned to the streaming session
    */
   async startStreaming (config = {}) {
diff --git a/packages/transcription-parakeet/scripts/convert-nemo-to-gguf.py b/packages/transcription-parakeet/scripts/convert-nemo-to-gguf.py
index c707ee1788..e18d9f8467 100644
--- a/packages/transcription-parakeet/scripts/convert-nemo-to-gguf.py
+++ b/packages/transcription-parakeet/scripts/convert-nemo-to-gguf.py
@@ -217,7 +217,24 @@ def fuse_bn(weight, bias, running_mean, running_var, eps=1e-5):
     return scale.astype(np.float32), shift.astype(np.float32)
 
 
-def write_gguf(out: Path, cfg: dict, sd: dict, tok_bytes: bytes, quant: str):
+def detect_sortformer_variant(ckpt: Path) -> str:
+    """
+    Map a NeMo Sortformer .nemo filename to a stable variant tag the C++
+    loader can match against. The tag is the only thing in the GGUF that
+    distinguishes cache-aware v2.1 from the architecturally identical
+    v1 / v2 (encoder shape alone is ambiguous against future variants).
+    """
+    stem = ckpt.stem
+    if "streaming_sortformer" in stem and "-v2.1" in stem:
+        return "sortformer-streaming-v2.1-aosc"
+    if "streaming_sortformer" in stem and "-v2" in stem:
+        return "sortformer-streaming-v2"
+    if "diar_sortformer" in stem and "-v1" in stem:
+        return "sortformer-v1"
+    return ""
+
+
+def write_gguf(out: Path, ckpt: Path, cfg: dict, sd: dict, tok_bytes: bytes, quant: str):
     model_type = detect_model_type(cfg)
 
     enc = cfg["encoder"]
@@ -349,6 +366,12 @@ def write_gguf(out: Path, cfg: dict, sd: dict, tok_bytes: bytes, quant: str):
         writer.add_uint32("parakeet.sortformer.tf_n_heads",      int(tfe["num_attention_heads"]))
         writer.add_bool  ("parakeet.sortformer.tf_pre_ln",       bool(tfe.get("pre_ln", False)))
         writer.add_string("parakeet.sortformer.tf_hidden_act",   str(tfe.get("hidden_act", "relu")))
+        # Variant tag (preferred over shape-based detection on the C++ side).
+        # Empty result = unknown checkpoint: the key is omitted and the loader
+        # falls back to encoder shape, so older GGUFs continue to load.
+        variant = detect_sortformer_variant(ckpt)
+        if variant:
+            writer.add_string("parakeet.model_variant", variant)
     else:
         pred_hidden      = int(dec["prednet"]["pred_hidden"])
         pred_rnn_layers  = int(dec["prednet"]["pred_rnn_layers"])
@@ -628,7 +651,7 @@ def main():
     ckpt = ensure_ckpt(args.ckpt, args.hf_repo)
     cfg, sd, tok_bytes = load_nemo(ckpt)
     args.out.parent.mkdir(parents=True, exist_ok=True)
-    write_gguf(args.out, cfg, sd, tok_bytes, args.quant)
+    write_gguf(args.out, ckpt, cfg, sd, tok_bytes, args.quant)
 
 
 if __name__ == "__main__":
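The order of the checks in `detect_sortformer_variant` matters, because `-v2` is a substring of `-v2.1`. The sketch below is an illustrative JavaScript port of the Python function, not part of the converter. It takes the filename stem directly, and the plain `-v2` filename in the usage lines is a made-up example.

```javascript
'use strict'

// Illustrative port of detect_sortformer_variant(); `stem` is the
// checkpoint filename without its final ".nemo" extension.
function detectSortformerVariant (stem) {
  const streaming = stem.includes('streaming_sortformer')
  // "-v2.1" must be tested before "-v2": the shorter tag also matches
  // every v2.1 filename.
  if (streaming && stem.includes('-v2.1')) return 'sortformer-streaming-v2.1-aosc'
  if (streaming && stem.includes('-v2')) return 'sortformer-streaming-v2'
  if (stem.includes('diar_sortformer') && stem.includes('-v1')) return 'sortformer-v1'
  return ''
}

console.log(detectSortformerVariant('diar_streaming_sortformer_4spk-v2.1'))
// sortformer-streaming-v2.1-aosc
console.log(detectSortformerVariant('diar_sortformer_4spk-v1'))
// sortformer-v1
```

Swapping the first two checks would tag every v2.1 checkpoint as `sortformer-streaming-v2`, and the loader would then not enable AOSC from the variant tag.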
diff --git a/packages/transcription-parakeet/scripts/convert-nemo.sh b/packages/transcription-parakeet/scripts/convert-nemo.sh
index cd7be608bd..33de47fb53 100644
--- a/packages/transcription-parakeet/scripts/convert-nemo.sh
+++ b/packages/transcription-parakeet/scripts/convert-nemo.sh
@@ -17,7 +17,8 @@
 #   ./scripts/convert-nemo.sh [flags]
 #
 # Flags:
-#   --type, -t <ctc|tdt|eou|sortformer|all>     Which model(s) (default: all)
+#   --type, -t <ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all>
+#                                               Which model(s) (default: all)
 #   --quant, -q <f16|q8_0|q5_0|q4_0|f32>        Quant tier (default: q8_0)
 #   --python <bin>                              Python interpreter (default:
 #                                                $PYTHON, then ./venv/bin/python,
@@ -62,8 +63,8 @@ while [[ $# -gt 0 ]]; do
 done
 
 case "$TYPE" in
-  ctc|tdt|eou|sortformer|all) ;;
-  *) echo "Error: --type must be ctc|tdt|eou|sortformer|all" >&2; exit 2;;
+  ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all) ;;
+  *) echo "Error: --type must be ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all" >&2; exit 2;;
 esac
 case "$QUANT" in
   f32|f16|q8_0|q5_0|q4_0) ;;
@@ -128,6 +129,7 @@ nemo_filename() {
     tdt)        echo "parakeet-tdt-0.6b-v3.nemo";;
     eou)        echo "parakeet_realtime_eou_120m-v1.nemo";;
     sortformer) echo "diar_sortformer_4spk-v1.nemo";;
+    sortformer-streaming-v2.1) echo "diar_streaming_sortformer_4spk-v2.1.nemo";;
   esac
 }
 gguf_filename() {
@@ -137,6 +139,7 @@ gguf_filename() {
     tdt)        echo "parakeet-tdt-0.6b-v3.${q}.gguf";;
     eou)        echo "parakeet-eou-120m-v1.${q}.gguf";;
     sortformer) echo "sortformer-4spk-v1.${q}.gguf";;
+    sortformer-streaming-v2.1) echo "diar_streaming_sortformer_4spk-v2.1.${q}.gguf";;
   esac
 }
 
@@ -196,7 +199,7 @@ echo
 
 failures=0
 if [[ "$TYPE" == "all" ]]; then
-  for t in ctc tdt eou sortformer; do
+  for t in ctc tdt eou sortformer sortformer-streaming-v2.1; do
     convert_one "$t" || failures=$((failures + 1))
   done
 else
diff --git a/packages/transcription-parakeet/scripts/download-models.sh b/packages/transcription-parakeet/scripts/download-models.sh
index 5b2404117f..d9eeb00c05 100755
--- a/packages/transcription-parakeet/scripts/download-models.sh
+++ b/packages/transcription-parakeet/scripts/download-models.sh
@@ -12,7 +12,8 @@
 #   ./scripts/download-models.sh [flags]
 #
 # Flags:
-#   --type, -t <ctc|tdt|eou|sortformer|all>   Which model(s) (default: all)
+#   --type, -t <ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all>
+#                                             Which model(s) (default: all)
 #   --output, -o <path>                       Destination dir (default: ./models/nemo)
 #   --force, -f                               Re-download even if present
 #   --help, -h                                Show this help
@@ -43,8 +44,8 @@ while [[ $# -gt 0 ]]; do
 done
 
 case "$TYPE" in
-  ctc|tdt|eou|sortformer|all) ;;
-  *) echo "Error: --type must be ctc|tdt|eou|sortformer|all" >&2; exit 2;;
+  ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all) ;;
+  *) echo "Error: --type must be ctc|tdt|eou|sortformer|sortformer-streaming-v2.1|all" >&2; exit 2;;
 esac
 
 # Map model type -> { hf_repo, nemo_filename }
@@ -54,6 +55,7 @@ nemo_url() {
     tdt)        echo "https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3/resolve/main/parakeet-tdt-0.6b-v3.nemo";;
     eou)        echo "https://huggingface.co/nvidia/parakeet_realtime_eou_120m-v1/resolve/main/parakeet_realtime_eou_120m-v1.nemo";;
     sortformer) echo "https://huggingface.co/nvidia/diar_sortformer_4spk-v1/resolve/main/diar_sortformer_4spk-v1.nemo";;
+    sortformer-streaming-v2.1) echo "https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1/resolve/main/diar_streaming_sortformer_4spk-v2.1.nemo";;
   esac
 }
 nemo_filename() {
@@ -95,7 +97,7 @@ echo "Output: ${OUTPUT_DIR}"
 echo
 
 if [[ "$TYPE" == "all" ]]; then
-  for t in ctc tdt eou sortformer; do
+  for t in ctc tdt eou sortformer sortformer-streaming-v2.1; do
     fetch_nemo "$t"
   done
 else
diff --git a/packages/transcription-parakeet/test/integration/helpers.js b/packages/transcription-parakeet/test/integration/helpers.js
index 27960763bf..0d34f03245 100644
--- a/packages/transcription-parakeet/test/integration/helpers.js
+++ b/packages/transcription-parakeet/test/integration/helpers.js
@@ -802,6 +802,19 @@ const MODEL_CONFIGS = {
     mobileFile: 'sortformer-4spk-v1.q4_0.gguf',
     minSize: 50 * 1024 * 1024,
     url: null
+  },
+  // Streaming-default Sortformer (v2.1 + NeMo-port AOSC). The AOSC
+  // speaker cache anchors slot identity across silence and re-entry,
+  // fixing the per-chunk drift v1 shows when two voices have been seen
+  // in the rolling-history window. Auto-enabled by parakeet-cpp when the
+  // GGUF carries `parakeet.model_variant == "sortformer-streaming-v2.1-aosc"`.
+  // The GGUF needs to be staged (npm run setup-models / QVAC_TEST_GGUF_DIR)
+  // before sortformer-streaming tests can run; otherwise they skip.
+  sortformerStreaming: {
+    file: 'diar_streaming_sortformer_4spk-v2.1.q8_0.gguf',
+    mobileFile: 'diar_streaming_sortformer_4spk-v2.1.q4_0.gguf',
+    minSize: 50 * 1024 * 1024,
+    url: null
   }
 }
 
diff --git a/packages/transcription-parakeet/test/integration/sortformer-aosc-streaming.test.js b/packages/transcription-parakeet/test/integration/sortformer-aosc-streaming.test.js
new file mode 100644
index 0000000000..f3749349a9
--- /dev/null
+++ b/packages/transcription-parakeet/test/integration/sortformer-aosc-streaming.test.js
@@ -0,0 +1,222 @@
+'use strict'
+
+/**
+ * Sortformer v2.1 + AOSC streaming integration test.
+ *
+ * Verifies that:
+ *   1. The v2.1 Sortformer GGUF loads and the JS-side AOSC config
+ *      knobs flow through the native binding without errors.
+ *   2. A streaming diarization session with default AOSC config emits
+ *      segments, at least one of which carries the "Speaker N" label of
+ *      the "Speaker N: HH:MM:SS.fff - HH:MM:SS.fff" format that the
+ *      offline diarization path also produces.
+ *   3. Forcing `streamingSpkCacheEnable: false` on the same v2.1 GGUF
+ *      falls back to the v1 sliding-window path cleanly (still emits
+ *      segments; just without the AOSC stability guarantees).
+ *
+ * The full AOSC slot-stability contract (same speaker -> same hyp_<id>
+ * across non-contiguous re-entries) is verified at C++ level by
+ * `parakeet-cpp/test/test_sortformer_aosc_speakers.cpp` using the
+ * `abcba.wav` / `abcdba.wav` fixtures. This JS-level test focuses on
+ * wiring correctness; if it passes, the AOSC knobs are reaching the
+ * engine and parakeet-cpp's own regression tests cover the runtime
+ * behaviour.
+ *
+ * Skips cleanly when the v2.1 GGUF is missing
+ * (`MODEL_CONFIGS.sortformerStreaming`); the file isn't bundled with
+ * the repo -- stage it via `npm run setup-models` or by pointing
+ * `QVAC_TEST_GGUF_DIR` at a directory containing
+ * `diar_streaming_sortformer_4spk-v2.1.q8_0.gguf`.
+ */
+
+const test = require('brittle')
+const fs = require('bare-fs')
+const path = require('bare-path')
+const {
+  binding,
+  TranscriptionParakeet,
+  setupJsLogger,
+  getTestPaths,
+  loadGgufOrSkip
+} = require('./helpers.js')
+
+const { samplesDir } = getTestPaths()
+
+const SAMPLE_RATE = 16000 // Hz
+const STREAM_CHUNK_MS = 2000 // passed to the engine as streamingChunkMs
+const FEED_CHUNK_MS = 500 // audio per push; also the delay between pushes
+
+function loadAudioSample () {
+  const samplePath = path.join(samplesDir, 'sample.raw')
+  if (!fs.existsSync(samplePath)) return null
+  const rawBuffer = fs.readFileSync(samplePath)
+  const pcm = new Int16Array(
+    rawBuffer.buffer, rawBuffer.byteOffset, rawBuffer.length / 2)
+  const audio = new Float32Array(pcm.length)
+  for (let i = 0; i < pcm.length; i++) audio[i] = pcm[i] / 32768.0
+  return audio
+}
+
+function pushableStream () {
+  const queue = []
+  let waiter = null
+  let ended = false
+  return {
+    push (chunk) {
+      if (ended) return
+      queue.push(chunk)
+      if (waiter) { const w = waiter; waiter = null; w() }
+    },
+    end () {
+      ended = true
+      if (waiter) { const w = waiter; waiter = null; w() }
+    },
+    async * [Symbol.asyncIterator] () {
+      while (true) {
+        if (queue.length > 0) { yield queue.shift(); continue }
+        if (ended) return
+        await new Promise(resolve => { waiter = resolve })
+      }
+    }
+  }
+}
+
+async function feedAndCollect (model, audio) {
+  const samplesPerChunk = Math.floor((FEED_CHUNK_MS / 1000) * SAMPLE_RATE)
+  const stream = pushableStream()
+  const segments = []
+
+  const response = await model.runStreaming(stream)
+  const updateDone = response
+    .onUpdate(out => {
+      const items = Array.isArray(out) ? out : [out]
+      for (const seg of items) {
+        if (!seg || !seg.text) continue
+        segments.push(seg)
+      }
+    })
+    .await()
+
+  for (let i = 0; i < audio.length; i += samplesPerChunk) {
+    const endIdx = Math.min(i + samplesPerChunk, audio.length)
+    const chunk = audio.slice(i, endIdx) // slice() already returns a fresh Float32Array copy
+    stream.push(chunk)
+    if (i + samplesPerChunk < audio.length) {
+      await new Promise(resolve => setTimeout(resolve, FEED_CHUNK_MS))
+    }
+  }
+  stream.end()
+  await updateDone
+
+  return segments
+}
+
+// Pull "Speaker N" out of the addon's emitted text. Returns -1 when
+// the text doesn't match (e.g. silence sentinels). Mirrors the parser
+// used by examples/live-mic-diarized.js so the assertion below stays
+// in sync with the actual contract consumers rely on.
+function parseSpeakerId (text) {
+  const m = typeof text === 'string' ? text.match(/Speaker\s+(\d+)/) : null
+  return m ? parseInt(m[1], 10) : -1
+}
+
+test('Sortformer v2.1 AOSC — default config streams diarization segments',
+  { timeout: 600000 }, async (t) => {
+    const loggerBinding = setupJsLogger(binding)
+
+    try {
+      const modelPath = await loadGgufOrSkip(t, 'sortformerStreaming')
+      if (!modelPath) return
+
+      const audio = loadAudioSample()
+      if (!audio) { t.pass('sample.raw not found - skipping'); return }
+
+      const model = new TranscriptionParakeet({
+        files: { model: modelPath },
+        config: {
+          parakeetConfig: {
+            streaming: true,
+            streamingChunkMs: STREAM_CHUNK_MS,
+            // streamingSpkCacheEnable defaults to true; left unset so
+            // the AOSC default path runs as it would for real users.
+            maxThreads: 4,
+            useGPU: false
+          }
+        }
+      })
+
+      try {
+        await model.load()
+        const segments = await feedAndCollect(model, audio)
+
+        t.ok(segments.length > 0,
+          `AOSC streaming should emit at least one segment (got ${segments.length})`)
+
+        const speakerIds = segments
+          .map(s => parseSpeakerId(s.text))
+          .filter(id => id >= 0)
+        t.ok(speakerIds.length > 0,
+          'at least one segment should carry a "Speaker N" label')
+
+        const distinctIds = new Set(speakerIds)
+        console.log(
+          `[aosc/default] segments=${segments.length} ` +
+          `speakers=${distinctIds.size} ids=[${[...distinctIds].sort((a, b) => a - b).join(',')}]`)
+      } finally {
+        try { await model.unload() } catch (e) { /* ignore */ }
+      }
+    } finally {
+      try { loggerBinding.releaseLogger() } catch (e) { /* ignore */ }
+    }
+  })
+
+test('Sortformer v2.1 AOSC — streamingSpkCacheEnable=false falls back to v1 path',
+  { timeout: 600000 }, async (t) => {
+    const loggerBinding = setupJsLogger(binding)
+
+    try {
+      const modelPath = await loadGgufOrSkip(t, 'sortformerStreaming')
+      if (!modelPath) return
+
+      const audio = loadAudioSample()
+      if (!audio) { t.pass('sample.raw not found - skipping'); return }
+
+      const model = new TranscriptionParakeet({
+        files: { model: modelPath },
+        config: {
+          parakeetConfig: {
+            streaming: true,
+            streamingChunkMs: STREAM_CHUNK_MS,
+            // Force the v1 sliding-window code path on the v2.1 GGUF.
+            // The engine must accept this without errors and continue
+            // to emit speaker segments; speaker IDs may drift in ways
+            // they would not with AOSC active.
+            streamingSpkCacheEnable: false,
+            maxThreads: 4,
+            useGPU: false
+          }
+        }
+      })
+
+      try {
+        await model.load()
+        const segments = await feedAndCollect(model, audio)
+
+        t.ok(segments.length > 0,
+          'v1-path streaming should still emit at least one segment ' +
+          `(got ${segments.length})`)
+
+        const speakerIds = segments
+          .map(s => parseSpeakerId(s.text))
+          .filter(id => id >= 0)
+        t.ok(speakerIds.length > 0,
+          'at least one segment should carry a "Speaker N" label')
+
+        console.log(`[aosc/disabled] segments=${segments.length}`)
+      } finally {
+        try { await model.unload() } catch (e) { /* ignore */ }
+      }
+    } finally {
+      try { loggerBinding.releaseLogger() } catch (e) { /* ignore */ }
+    }
+  })