tetherto · ishanvohra2 · May 23, 2026 · May 21, 2026
@@ -15,6 +15,12 @@ packages/sdk/bun.lock
 NOTICE_LOG.txt
 NOTICE_FULL_REPORT.txt
 
+# Local TTS / transcription example output artifacts (generated by
+# packages/sdk/examples/**). Never commit these — they're large WAVs
+# produced by running the demos locally.
+packages/sdk/*-output.wav
+packages/sdk/examples/**/*-output.wav
+
 # Slack/Discord copy-paste announcement posts generated by the
 # changelog skill (see scripts/sdk/generate-changelog-sdk-pod.cjs
 # --generate-announcement-post). These are local working artifacts,

@@ -6,11 +6,11 @@ schemaType: HowTo
 
 ## Overview
 
-Transcription uses your choice of either [`qvac-ext-lib-whisper.cpp`](https://github.com/tetherto/qvac-ext-lib-whisper.cpp) or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) (via [ONNX Runtime](https://onnxruntime.ai)) as inference engine. Load a model using `modelType: "whisper"` for `qvac-ext-lib-whisper.cpp`, or `modelType: "parakeet"` for Parakeet. Parakeet supports multilingual transcription (TDT), english-only transcription (CTC), and speaker diarization (Sortformer).
+Transcription uses your choice of either [`qvac-ext-lib-whisper.cpp`](https://github.com/tetherto/qvac-ext-lib-whisper.cpp) or [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) (via the GGML-based [`parakeet-cpp`](https://github.com/tetherto/qvac-ext-lib-whisper.cpp/tree/main/parakeet-cpp) engine) as the inference engine. Load a model using `modelType: "whisper"` for `qvac-ext-lib-whisper.cpp`, or `modelType: "parakeet"` for Parakeet. Parakeet supports multilingual transcription (TDT), English-only transcription (CTC), speaker diarization (Sortformer), and end-of-utterance detection (EOU) for duplex streaming.
 
 Provide audio input as `audioChunk`, either as a file path (string) or an in-memory audio buffer.
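Whichever form you pass, the Parakeet examples expect 16 kHz mono PCM in a WAV container. As a minimal sketch, the hypothetical helpers `readWavFormat` and `isSixteenKhzMonoPcm16` below (not SDK functions) read the `fmt ` chunk of a RIFF/WAVE buffer so mismatched audio can be rejected before it reaches `transcribe()`. The 16-bit sample depth is an assumption borrowed from the s16le microphone capture used in the streaming example.

```ts
// Hypothetical helpers (not part of @qvac/sdk): inspect a WAV header so you
// can confirm 16 kHz, mono, 16-bit PCM before using the audio as `audioChunk`.
interface WavFormat {
  audioFormat: number; // 1 = integer PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

function readWavFormat(buf: Buffer): WavFormat {
  if (
    buf.length < 12 ||
    buf.toString("ascii", 0, 4) !== "RIFF" ||
    buf.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("Not a RIFF/WAVE buffer");
  }
  // Walk the chunk list after the 12-byte RIFF header until "fmt " is found.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      // The fields read below occupy the first 16 bytes of the chunk body.
      if (offset + 24 > buf.length) throw new Error("Truncated fmt chunk");
      return {
        audioFormat: buf.readUInt16LE(offset + 8),
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    // RIFF chunks are padded to an even byte count.
    offset += 8 + size + (size % 2);
  }
  throw new Error("No fmt chunk found");
}

function isSixteenKhzMonoPcm16(fmt: WavFormat): boolean {
  return (
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}
```

In practice you would call `readWavFormat(readFileSync(path))` (with `readFileSync` from `node:fs`) before passing the same path as `audioChunk`.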
 
-`transcribe()` returns the full transcription as a single `string`. If you need partial results as they become available, use `transcribeStream()` to receive text chunks in real-time.
+`transcribe()` returns the full transcription as a single `string`. If you need partial results as they become available, use `transcribeStream()` to receive text chunks in real time. Both whisper and parakeet expose duplex `transcribeStream()` sessions; see "Streaming with `transcribeStream()`" below.
 
 ## Functions
 
@@ -31,23 +31,88 @@ You should load two models:
 
 ### Parakeet
 
-Parakeet requires multiple model artifacts. The required files depend on the model variant:
-- **TDT** (multilingual, ~25 languages): encoder, encoder data, decoder, vocabulary, and preprocessor files.
-- **CTC** (english-only): model, model data, and tokenizer files.
-- **Sortformer** (speaker diarization): a single model file.
+As of `@qvac/transcription-parakeet` 0.6.0, Parakeet ships as a **single GGUF** per variant — the addon auto-detects TDT / CTC / Sortformer / EOU from `parakeet.model.type` GGUF metadata. There is no `modelConfig.modelType` discriminator, no per-variant `parakeet*Src` artifact fields, and no `ParakeetArtifactsRequiredError`. Just supply the GGUF via the top-level `modelSrc`:
 
-Pass the model variant via `modelConfig.modelType` (`"tdt"`, `"ctc"`, or `"sortformer"`) and provide the corresponding source fields in `modelConfig`. See [`loadModel()` — Parakeet `modelConfig`](/reference/api#loadmodel).
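+```ts
+import {
+  loadModel,
+  PARAKEET_TDT_0_6B_V3_Q8_0,
+  PARAKEET_CTC_0_6B_Q8_0,
+  PARAKEET_SORTFORMER_4SPK_V1_Q8_0,
+  PARAKEET_EOU_120M_V1_Q8_0,
+} from "@qvac/sdk";
+
+await loadModel({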
+  modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,    // multilingual, ~750MB
+  modelType: "parakeet",
+});
+
+await loadModel({
+  modelSrc: PARAKEET_CTC_0_6B_Q8_0,       // english-only, streaming-capable
+  modelType: "parakeet",
+});
+
+await loadModel({
+  modelSrc: PARAKEET_SORTFORMER_4SPK_V1_Q8_0,  // 4-speaker diarization
+  modelType: "parakeet",
+});
+
+await loadModel({
+  modelSrc: PARAKEET_EOU_120M_V1_Q8_0,    // end-of-utterance detection
+  modelType: "parakeet",
+});
+```
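Because every variant goes through the same `loadModel({ modelSrc, modelType: "parakeet" })` call, choosing a variant reduces to choosing a constant. The lookup below is a hypothetical convenience, not an SDK API; it stores the exported constant names as strings so the sketch runs without the SDK, whereas real code would import the constants themselves from `@qvac/sdk`.

```ts
// Hypothetical task-to-checkpoint lookup (not an SDK API). Names are the
// exported @qvac/sdk constants, stored as strings so this runs standalone.
type ParakeetTask = "multilingual" | "english" | "diarization" | "endOfTurn";

interface ParakeetCheckpoint {
  constantName: string;
  description: string;
}

const PARAKEET_CHECKPOINTS: Record<ParakeetTask, ParakeetCheckpoint> = {
  multilingual: { constantName: "PARAKEET_TDT_0_6B_V3_Q8_0", description: "TDT, ~25 languages" },
  english: { constantName: "PARAKEET_CTC_0_6B_Q8_0", description: "CTC, English-only, streaming-capable" },
  diarization: { constantName: "PARAKEET_SORTFORMER_4SPK_V1_Q8_0", description: "Sortformer, 4-speaker diarization" },
  endOfTurn: { constantName: "PARAKEET_EOU_120M_V1_Q8_0", description: "EOU, emits endOfTurn events" },
};

function checkpointFor(task: ParakeetTask): ParakeetCheckpoint {
  return PARAKEET_CHECKPOINTS[task];
}

// Every variant shares the same load parameters apart from modelSrc:
// no modelConfig and no per-variant discriminator.
function parakeetLoadParams<T>(modelSrc: T): { modelSrc: T; modelType: "parakeet" } {
  return { modelSrc, modelType: "parakeet" };
}
```

With the real constant imported, usage would look like `await loadModel(parakeetLoadParams(PARAKEET_EOU_120M_V1_Q8_0))`.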
 
 For model artifacts available as constants, see [SDK — Models](/introduction#models).
 
-<Callout type="success">
-**Tip:** if you are not using SDK model constants, download the required files from a compatible model repository — e.g., [`parakeet-ctc-0.6b-ONNX` on Hugging Face](https://huggingface.co/onnx-community/parakeet-ctc-0.6b-ONNX/tree/main/onnx). For a list of compatible repositories, see [Addons — Parakeet](https://github.com/tetherto/qvac/tree/main/packages/transcription-parakeet#models).
+<Callout type="info">
+**Migrating from pre-0.6 Parakeet (ONNX multi-file):** the legacy multi-file ONNX `modelConfig` shape (`parakeetEncoderSrc` / `parakeetDecoderSrc` / `parakeetVocabSrc` / `parakeetPreprocessorSrc`, plus `parakeetCtcModelSrc` / `parakeetTokenizerSrc` and `parakeetSortformerSrc` for the CTC/Sortformer variants) is no longer supported. Passing any of those fields raises a structured `LegacyParakeetModelDeprecatedError` with a migration message. The legacy ONNX constants (e.g. `PARAKEET_TDT_ENCODER_INT8`, `PARAKEET_CTC_FP32`, `PARAKEET_SORTFORMER_FP32`) remain exported for one minor cycle for codemod migrations only and will be removed in a future release.
 </Callout>
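For codemod-style migrations, a pre-flight scan can flag legacy fields before `loadModel()` raises `LegacyParakeetModelDeprecatedError`. This is a minimal sketch assuming `modelConfig` is a plain object; `findLegacyParakeetFields` is hypothetical, and the field list is copied from the migration callout above.

```ts
// Legacy ONNX-era modelConfig fields listed in the migration callout.
const LEGACY_PARAKEET_FIELDS = [
  "parakeetEncoderSrc",
  "parakeetDecoderSrc",
  "parakeetVocabSrc",
  "parakeetPreprocessorSrc",
  "parakeetCtcModelSrc",
  "parakeetTokenizerSrc",
  "parakeetSortformerSrc",
] as const;

// Hypothetical helper: returns the legacy fields still present, in list order,
// so a migration script can report them instead of failing at load time.
function findLegacyParakeetFields(
  modelConfig: Record<string, unknown> | undefined,
): string[] {
  if (!modelConfig) return [];
  return LEGACY_PARAKEET_FIELDS.filter((field) => field in modelConfig);
}
```

An empty result means the config already matches the single-GGUF shape, where only the top-level `modelSrc` is needed.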
 
 <Callout type="info">
 **On VAD:** when using `qvac-ext-lib-whisper.cpp`, you can optionally provide a separate model for voice activity detection (VAD), which is recommended. By contrast, Parakeet handles VAD internally, so it needs no additional model or configuration.
 </Callout>
 
+## Streaming with `transcribeStream()`
+
+`transcribeStream()` opens a duplex session for both engines: write audio chunks with `session.write(...)` and iterate events with `for await (const event of session) { ... }`. Events form a discriminated union keyed on `type`:
+
+- `{ type: "text", text }` — incremental transcript text.
+- `{ type: "segment", segment }` — segment metadata (whisper only; emitted when `metadata: true`).
+- `{ type: "vad", speaking, probability }` — voice-activity-detection state (whisper only; emitted when `emitVadEvents: true`).
+- `{ type: "endOfTurn", source: "whisper", silenceDurationMs }` — turn boundary detected from a measured silence window (whisper).
+- `{ type: "endOfTurn", source: "parakeet" }` — turn boundary detected from the EOU model's `<EOU>` token (parakeet; no silence window — the event is token-driven).
+
+The `source` field on `endOfTurn` lets consumers narrow the union: whisper events always carry a numeric `silenceDurationMs`; parakeet events never do.
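The narrowing described above can be sketched with a local mirror of the documented event shapes. The type and function names here are illustrative (the SDK's exported type names may differ), and `speaking: boolean` / `probability: number` are assumptions about the VAD fields' types.

```ts
// Local mirror of the documented event union; not the SDK's exported type.
type TranscribeStreamEvent =
  | { type: "text"; text: string }
  | { type: "segment"; segment: unknown }
  | { type: "vad"; speaking: boolean; probability: number }
  | { type: "endOfTurn"; source: "whisper"; silenceDurationMs: number }
  | { type: "endOfTurn"; source: "parakeet" };

function describeEvent(event: TranscribeStreamEvent): string {
  switch (event.type) {
    case "text":
      return `text: ${event.text}`;
    case "segment":
      return "segment";
    case "vad":
      return `vad: speaking=${event.speaking} p=${event.probability.toFixed(2)}`;
    case "endOfTurn":
      // Checking `source` narrows the union, so silenceDurationMs is only
      // reachable on the whisper branch.
      return event.source === "whisper"
        ? `endOfTurn (whisper, ${event.silenceDurationMs} ms silence)`
        : "endOfTurn (parakeet, <EOU> token)";
  }
}
```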
+
+<Callout type="info">
+**Wire compatibility:** post-0.6 servers emit `source` on every `endOfTurn` frame. SDK parsers still accept the legacy whisper wire shape `{ silenceDurationMs }` (no `source`) and normalize it to `source: "whisper"`. Upgrade client and server together when using parakeet `source: "parakeet"` events — older servers never emit that branch.
+</Callout>
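The compatibility rule can be expressed as a small normalizer in which any frame without `source` is treated as the legacy whisper shape. This is a hypothetical sketch of the documented behavior, not the SDK's actual parser.

```ts
// Wire shapes covered by the compatibility rule (illustrative types).
type EndOfTurnWire =
  | { source: "whisper"; silenceDurationMs: number }
  | { source: "parakeet" }
  | { source?: undefined; silenceDurationMs: number }; // legacy whisper frame

type EndOfTurnEvent =
  | { type: "endOfTurn"; source: "whisper"; silenceDurationMs: number }
  | { type: "endOfTurn"; source: "parakeet" };

// Hypothetical normalizer: only an explicit "parakeet" source maps to the
// parakeet branch; everything else is whisper, including legacy frames.
function normalizeEndOfTurn(frame: EndOfTurnWire): EndOfTurnEvent {
  if (frame.source === "parakeet") {
    return { type: "endOfTurn", source: "parakeet" };
  }
  return {
    type: "endOfTurn",
    source: "whisper",
    silenceDurationMs: frame.silenceDurationMs,
  };
}
```

Defaulting to whisper rather than parakeet is the safe direction here, because only whisper frames carry `silenceDurationMs`.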
+
+### Parakeet duplex streaming
+
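+Pass `parakeetStreamingConfig` to `transcribeStream()` to override streaming knobs for a single call; each knob falls back to its `parakeetConfig.streaming*` load-time counterpart when omitted. Supplying the field at all, even as an empty object, routes the call through the duplex conversation session: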
+
+```ts
+import { transcribeStream } from "@qvac/sdk";
+
+// `modelId` is the ID returned by loadModel() for a parakeet checkpoint.
+const session = await transcribeStream({
+  modelId,
+  parakeetStreamingConfig: {
+    chunkMs: 1000,            // encoder cadence
+    historyMs: 30000,         // sortformer rolling-history window
+    leftContextMs: 500,       // ASR encoder left-context window
+    rightLookaheadMs: 200,    // ASR encoder right-lookahead window
+    emitPartials: true,       // emit partial segments before chunk boundaries
+    emitEnergyVad: false,     // CTC/TDT energy-based VAD hint (engine-internal)
+  },
+});
+
+// Feed 16 kHz mono s16le PCM as it arrives, e.g. session.write(chunk).
+for await (const event of session) {
+  switch (event.type) {
+    case "text":
+      process.stdout.write(event.text);
+      break;
+    case "endOfTurn":
+      // event.source is "whisper" | "parakeet"; parakeet carries no silenceDurationMs.
+      console.log(`\n[endOfTurn:${event.source}] turn boundary detected\n`);
+      break;
+  }
+}
+```
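The fallback rule for the knobs above (per-call value if set, otherwise the load-time value) can be sketched as a field-by-field merge. The load-time `parakeetConfig.streaming*` field names are not spelled out in this guide, so the sketch reuses the per-call names on both sides; `ParakeetStreamingOptions` and `resolveStreamingOptions` are hypothetical, and where the SDK actually performs this merge is not specified here.

```ts
// Illustrative shape of the per-call knobs.
interface ParakeetStreamingOptions {
  chunkMs?: number;
  historyMs?: number;
  leftContextMs?: number;
  rightLookaheadMs?: number;
  emitPartials?: boolean;
  emitEnergyVad?: boolean;
}

// Hypothetical merge: per-call values win, undefined falls back to load time.
function resolveStreamingOptions(
  loadTime: ParakeetStreamingOptions,
  perCall: ParakeetStreamingOptions = {},
): ParakeetStreamingOptions {
  return {
    chunkMs: perCall.chunkMs ?? loadTime.chunkMs,
    historyMs: perCall.historyMs ?? loadTime.historyMs,
    leftContextMs: perCall.leftContextMs ?? loadTime.leftContextMs,
    rightLookaheadMs: perCall.rightLookaheadMs ?? loadTime.rightLookaheadMs,
    emitPartials: perCall.emitPartials ?? loadTime.emitPartials,
    emitEnergyVad: perCall.emitEnergyVad ?? loadTime.emitEnergyVad,
  };
}
```

Using `??` rather than `||` matters: an explicit per-call `emitPartials: false` must override a load-time `true`, and `||` would discard it.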
+
+The synthetic `{ type: "endOfTurn", source: "parakeet" }` event fires whenever the EOU model emits an `<EOU>` token; it is the parakeet counterpart of whisper's silence-window `endOfTurn`. CTC and TDT checkpoints emit transcript text only, so load the `PARAKEET_EOU_120M_V1_Q8_0` checkpoint when you need explicit turn boundaries from parakeet.
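A common consumer pattern is to collect `text` events into complete utterances and flush at each `endOfTurn`. The generator below is a hypothetical helper; it assumes `text` events are appended in order (as the streaming example above prints them) and uses a narrowed local event type, so adapt the type when feeding a real session that can also yield `segment` or `vad` events.

```ts
// Minimal event shapes needed here (illustrative, not the SDK's type).
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "endOfTurn"; source: "whisper"; silenceDurationMs: number }
  | { type: "endOfTurn"; source: "parakeet" };

// Hypothetical helper: buffers text and yields one trimmed utterance per
// endOfTurn, then flushes any trailing text when the stream ends.
async function* utterances(
  events: AsyncIterable<StreamEvent>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const event of events) {
    if (event.type === "text") {
      buffer += event.text;
    } else {
      const turn = buffer.trim();
      buffer = "";
      if (turn) yield turn;
    }
  }
  const tail = buffer.trim();
  if (tail) yield tail;
}
```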
+
 ## Examples
 
 ### `qvac-ext-lib-whisper.cpp`

@@ -1518,7 +1518,7 @@ and read `error.code` / `error.cause`. Code ranges:
 | `VAD_MODEL_REQUIRED` | 52205 | VAD model source is required for this configuration |
 | `TTS_ARTIFACTS_REQUIRED` | 52208 | TTS (Chatterbox) requires ttsTokenizerSrc, ttsSpeechEncoderSrc, ttsEmbedTokensSrc, ttsConditionalDecoderSrc, and ttsLanguageModelSrc |
 | `TTS_REFERENCE_AUDIO_REQUIRED` | 52209 | TTS (Chatterbox) requires referenceAudioSrc (path or URL to a WAV file for voice cloning) |
-| `PARAKEET_ARTIFACTS_REQUIRED` | 52210 | Parakeet model sources are missing. TDT requires parakeetEncoderSrc, parakeetDecoderSrc, parakeetVocabSrc, parakeetPreprocessorSrc. CTC requires parakeetCtcModelSrc, parakeetTokenizerSrc. Sortformer requires parakeetSortformerSrc. |
+| `LEGACY_PARAKEET_MODEL_DEPRECATED` | 52210 | Legacy parakeet ONNX modelConfig fields are no longer supported. As of `@qvac/transcription-parakeet` 0.6.0 the addon ships as a single GGUF that auto-detects TDT / CTC / EOU / Sortformer from GGUF metadata. Supply the GGUF via the top-level `modelSrc` (e.g. `loadModel({ modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0, modelType: "parakeet" })`). |
 | `MODEL_UNLOAD_FAILED` | 52400 | Failed to unload model… |
 | `EMBED_FAILED` | 52401 | Failed to generate embeddings… |
 | `EMBED_NO_EMBEDDINGS` | 52402 | No embeddings returned from model |

@@ -168,6 +168,14 @@ export function transcribeStream(
   params: TranscribeStreamClientParams & { emitVadEvents: true },
   options?: RPCOptions,
 ): Promise<TranscribeStreamConversationSession>;
+export function transcribeStream(
+  params: TranscribeStreamClientParams & {
+    parakeetStreamingConfig: NonNullable<
+      TranscribeStreamClientParams["parakeetStreamingConfig"]
+    >;
+  },
+  options?: RPCOptions,
+): Promise<TranscribeStreamConversationSession>;
 export function transcribeStream(
   params: TranscribeStreamClientParams & { metadata: true },
   options?: RPCOptions,
@@ -196,7 +204,10 @@ export function transcribeStream(
     return transcribeStreamWithAudio(params, options);
   }
   const streamParams = params as TranscribeStreamClientParams;
-  if (streamParams.emitVadEvents === true) {
+  if (
+    streamParams.emitVadEvents === true ||
+    streamParams.parakeetStreamingConfig !== undefined
+  ) {
     return transcribeStreamDuplexConversation(streamParams, options);
   }
   if (streamParams.metadata === true) {
@@ -257,6 +268,9 @@ function buildTranscribeStreamRequest(
     ...(params.vadRunIntervalMs !== undefined && {
       vadRunIntervalMs: params.vadRunIntervalMs,
     }),
+    ...(params.parakeetStreamingConfig && {
+      parakeetStreamingConfig: params.parakeetStreamingConfig,
+    }),
   };
 }
 
@@ -435,9 +449,16 @@ function processLineConversation(
       };
     }
     if (response.endOfTurn) {
+      if (response.endOfTurn.source === "parakeet") {
+        return {
+          type: "endOfTurn",
+          source: "parakeet",
+        };
+      }
       return {
         type: "endOfTurn",
+        source: "whisper",
         silenceDurationMs: response.endOfTurn.silenceDurationMs,
       };
     }
     if (wantsMetadata) {
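The dispatch added to `transcribeStream()` can be summarized as a pure function. This sketch mirrors only the branches visible in this diff; the earlier `transcribeStreamWithAudio` branch and the final fallthrough are outside the hunk, so they are collapsed into a hypothetical `"other"` result. One consequence worth noting: `parakeetStreamingConfig` is checked before `metadata`, so passing both selects the duplex conversation session.

```ts
// Hypothetical summary of the branch order visible in this diff.
interface StreamRouteParams {
  emitVadEvents?: boolean;
  metadata?: boolean;
  parakeetStreamingConfig?: Record<string, unknown>;
}

type StreamRoute = "duplexConversation" | "metadata" | "other";

function routeTranscribeStream(params: StreamRouteParams): StreamRoute {
  // Any parakeetStreamingConfig, even {}, selects the duplex session because
  // the check is `!== undefined`, not a test on the object's contents.
  if (
    params.emitVadEvents === true ||
    params.parakeetStreamingConfig !== undefined
  ) {
    return "duplexConversation";
  }
  if (params.metadata === true) {
    return "metadata";
  }
  // Remaining branches are outside the hunk shown here.
  return "other";
}
```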

@@ -1,36 +1,41 @@
+/**
+ * Parakeet CTC transcription from a WAV file.
+ *
+ * Usage:
+ *   bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> [parakeet-ctc-gguf]
+ *
+ * Loads a single GGUF checkpoint (`PARAKEET_CTC_0_6B_Q8_0` by default) and
+ * transcribes the file with the batch `transcribe` API. Omit the model
+ * argument to use the registry constant.
+ *
+ * Audio should be 16 kHz mono PCM in a WAV container.
+ */
 import {
   loadModel,
   unloadModel,
   transcribe,
-  PARAKEET_CTC_FP32,
-  PARAKEET_CTC_TOKENIZER,
+  PARAKEET_CTC_0_6B_Q8_0,
 } from "@qvac/sdk";
 
 const args = process.argv.slice(2);
 
 if (!args[0]) {
   console.error(
     "Usage: bun run examples/transcription/parakeet-ctc-filesystem.ts <wav-file> " +
-      "[model.onnx] [tokenizer.json]",
+      "[parakeet-ctc-gguf]",
   );
-  console.error("\nIf model paths are omitted, defaults to registry models.");
+  console.error("\nIf the model path is omitted, defaults to the registry model.");
   process.exit(1);
 }
 
 const audioFilePath = args[0];
-const parakeetCtcModelSrc = args[1] ?? PARAKEET_CTC_FP32;
-const parakeetTokenizerSrc = args[2] ?? PARAKEET_CTC_TOKENIZER;
+const parakeetModelSrc = args[1] ?? PARAKEET_CTC_0_6B_Q8_0;
 
 try {
   console.log("Loading Parakeet CTC model...");
   const modelId = await loadModel({
-    modelSrc: parakeetCtcModelSrc,
+    modelSrc: parakeetModelSrc,
     modelType: "parakeet",
-    modelConfig: {
-      modelType: "ctc",
-      parakeetCtcModelSrc,
-      parakeetTokenizerSrc,
-    },
     onProgress: (progress) => {
       console.log(`Download progress: ${progress.percentage.toFixed(1)}%`);
     },
@@ -48,6 +53,6 @@ try {
   await unloadModel({ modelId });
   console.log("Done");
 } catch (error) {
-  console.error("Error:", error);
+  console.error("❌ Error:", error);
   process.exit(1);
 }
@@ -1,21 +1,19 @@
 /**
- * Microphone → Parakeet transcription using chunked `transcribe` calls.
+ * Microphone → Parakeet batch transcription (chunked `transcribe`).
  *
- * Usage: bun run examples/transcription/parakeet-microphone-record.ts
+ * Usage:
+ *   bun run examples/transcription/parakeet-microphone-record.ts
  *
- * Captures 3-second audio chunks from the microphone and sends each to the
- * batch `transcribe` API. Press Ctrl+C to quit.
+ * Captures 3-second s16le audio chunks from the microphone and sends each to
+ * `transcribe` with the TDT model. Press Ctrl+C to stop.
  *
  * Requirements: FFmpeg installed, microphone access.
  */
 import {
   loadModel,
   unloadModel,
   transcribe,
-  PARAKEET_TDT_ENCODER_FP32,
-  PARAKEET_TDT_DECODER_FP32,
-  PARAKEET_TDT_VOCAB,
-  PARAKEET_TDT_PREPROCESSOR_FP32,
+  PARAKEET_TDT_0_6B_V3_Q8_0,
 } from "@qvac/sdk";
 import { spawnSync } from "child_process";
 import { startMicrophone } from "../audio/mic-input";
@@ -37,14 +35,8 @@ try {
 
 console.log("Loading Parakeet model...");
 const modelId = await loadModel({
-  modelSrc: PARAKEET_TDT_ENCODER_FP32,
+  modelSrc: PARAKEET_TDT_0_6B_V3_Q8_0,
   modelType: "parakeet",
-  modelConfig: {
-    parakeetEncoderSrc: PARAKEET_TDT_ENCODER_FP32,
-    parakeetDecoderSrc: PARAKEET_TDT_DECODER_FP32,
-    parakeetVocabSrc: PARAKEET_TDT_VOCAB,
-    parakeetPreprocessorSrc: PARAKEET_TDT_PREPROCESSOR_FP32,
-  },
   onProgress: (p) => console.log(`Download: ${p.percentage.toFixed(1)}%`),
 });
 console.log("Model loaded.\n");
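The record example's capture loop is outside this hunk, but the chunk size it implies is simple arithmetic: 16,000 samples/s × 2 bytes per s16le sample × 3 s = 96,000 bytes per mono chunk. Below is a minimal sketch of a buffer that turns arbitrarily sized microphone reads into exact 3-second chunks; `PcmChunker` is hypothetical, and 16 kHz mono capture is assumed to match the streaming example.

```ts
// Assumed capture format: 16 kHz, mono, s16le (2 bytes per sample).
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_SECONDS = 3;
const CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS; // 96,000

// Hypothetical helper: buffers reads of any size, returns every complete
// chunk, and keeps the remainder for the next push.
class PcmChunker {
  private pending: Buffer[] = [];
  private pendingBytes = 0;
  private readonly chunkBytes: number;

  constructor(chunkBytes: number = CHUNK_BYTES) {
    this.chunkBytes = chunkBytes;
  }

  push(data: Buffer): Buffer[] {
    this.pending.push(data);
    this.pendingBytes += data.length;
    const ready: Buffer[] = [];
    if (this.pendingBytes < this.chunkBytes) return ready;

    let all: Buffer = Buffer.concat(this.pending, this.pendingBytes);
    while (all.length >= this.chunkBytes) {
      ready.push(all.subarray(0, this.chunkBytes));
      all = all.subarray(this.chunkBytes);
    }
    this.pending = all.length > 0 ? [all] : [];
    this.pendingBytes = all.length;
    return ready;
  }
}
```

Each buffer returned by `push` would then be handed to `transcribe` as its `audioChunk`, one 3-second window at a time.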

@@ -0,0 +1,93 @@
+/**
+ * Microphone → Parakeet duplex streaming (`transcribeStream`).
+ *
+ * Usage:
+ *   bun run examples/transcription/parakeet-microphone-stream.ts
+ *
+ * Streams microphone audio through `transcribeStream` with
+ * `parakeetStreamingConfig`. Uses the EOU checkpoint so you may see
+ * `{ type: "endOfTurn", source: "parakeet" }` events; CTC/TDT models
+ * emit transcript text only. Parakeet does not yield standalone VAD events.
+ *
+ * Requirements: FFmpeg installed, microphone access.
+ */
+import {
+  loadModel,
+  unloadModel,
+  transcribeStream,
+  PARAKEET_EOU_120M_V1_Q8_0,
+} from "@qvac/sdk";
+import { spawnSync } from "child_process";
+import { startMicrophone } from "../audio/mic-input";
+
+const SAMPLE_RATE = 16000;
+
+try {
+  const r = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
+  if (r.error || r.status !== 0) throw new Error("FFmpeg not found");
+} catch {
+  console.error("Error: FFmpeg is required. Install it and try again.");
+  process.exit(1);
+}
+
+let modelId: string | null = null;
+let ffmpeg: ReturnType<typeof startMicrophone> | null = null;
+
+async function cleanup() {
+  console.log("\n\nStopping...");
+  ffmpeg?.kill();
+  if (modelId) await unloadModel({ modelId });
+  console.log("Done.");
+}
+
+process.on("SIGINT", () => {
+  void cleanup().finally(() => process.exit(0));
+});
+process.on("SIGTERM", () => {
+  void cleanup().finally(() => process.exit(0));
+});
+
+try {
+  console.log("Loading Parakeet (EOU) streaming model...");
+  modelId = await loadModel({
+    modelSrc: PARAKEET_EOU_120M_V1_Q8_0,
+    modelType: "parakeet",
+    onProgress: (p) => console.log(`Download: ${p.percentage.toFixed(1)}%`),
+  });
+  console.log("Model loaded.\n");
+
+  ffmpeg = startMicrophone({ sampleRate: SAMPLE_RATE, format: "s16le" });
+
+  const session = await transcribeStream({
+    modelId,
+    parakeetStreamingConfig: {
+      chunkMs: 1000,
+      emitPartials: true,
+    },
+  });
+
+  ffmpeg.stdout.on("data", (chunk: Buffer) => session.write(chunk));
+
+  console.log(
+    "Listening... speak and pause to see transcripts. End-of-turn boundaries fire when the EOU model emits an <EOU> token.\n",
+  );
+
+  for await (const event of session) {
+    switch (event.type) {
+      case "text":
+        if (event.text.trim()) {
+          process.stdout.write(`${event.text}`);
+        }
+        break;
+      case "endOfTurn":
+        console.log("\n[endOfTurn] turn boundary detected\n");
+        break;
+    }
+  }
+  await cleanup();
+  process.exit(0);
+} catch (error) {
+  console.error("Error:", error);
+  await cleanup();
+  process.exit(1);
+}
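In the streaming example, `cleanup()` can be reached from a signal handler and from the normal or error exit path, so a Ctrl+C that lands while cleanup is already running could call `unloadModel()` twice. One way to guard against that is to memoize the cleanup promise; `runOnce` below is a hypothetical helper, not part of the SDK.

```ts
// Hypothetical helper: the first call starts `fn`; later calls, including
// concurrent ones, receive the same promise instead of running `fn` again.
function runOnce(fn: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending === null) {
      pending = fn();
    }
    return pending;
  };
}
```

Wrapping the example's body as `const cleanup = runOnce(async () => { ffmpeg?.kill(); if (modelId) await unloadModel({ modelId }); });` makes the signal handlers and the exit paths all await a single run.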