tetherto · simon-iribarren · May 15, 2026
@@ -88,28 +88,28 @@ When a new cache key is used for the first time:
 
 | File | Purpose |
 |------|---------|
-| `server/bare/plugins/llamacpp-completion/ops/kv-cache-session.ts` | **`KvCacheSession` — single owner of the three KV-cache bookkeeping layers** (on-disk `.bin`, `initializedCaches`, `cachedMessageCounts`). Exposes `beginTurn` / `commitTurn` / `rollback` / `dropStaleSavedCount` plus the module-level `deleteKvCacheState(...)` administrative API. M2 (QVAC-18182). |
+| `server/bare/plugins/llamacpp-completion/ops/kv-cache-session.ts` | **`KvCacheSession` — single owner of the three KV-cache bookkeeping layers** (on-disk `.bin`, `initializedCaches`, `cachedMessageCounts`). Exposes `beginTurn` / `commitTurn` / `rollback` / `dropStaleSavedCount` plus the module-level `deleteKvCacheState(...)` administrative API (QVAC-18182). |
 | `server/bare/plugins/llamacpp-completion/ops/completion-stream.ts` | Completion handler. Calls `session.beginTurn(...)`, registers `scope.defer(() => session.rollback(turn))` once, and calls `session.commitTurn(...)` on the happy path (which suppresses the deferred rollback). No direct references to the three layers. |
 | `server/bare/plugins/llamacpp-completion/ops/kv-cache-state.ts` | Pure `decideCachedHistorySlice(...)` helper used by the session — slice decision for the next addon call. No state. |
 | `server/bare/ops/kv-cache-utils.ts` | Path / hash / fs utilities: `getCacheFilePath`, `generateConfigHash`, `findMatchingCache`, `getCurrentCacheInfo`, `renameCacheFile`, `deleteCache`. No in-memory state. |
 | `server/bare/plugins/llamacpp-completion/ops/cache-logger.ts` | Debug logging for cache operations |
-| `server/rpc/handlers/delete-cache.ts` | `handleDeleteCache` RPC entry point. Delegates to `deleteKvCacheState(...)` — zero direct references to the three layers (M2 deliverable 5). |
+| `server/rpc/handlers/delete-cache.ts` | `handleDeleteCache` RPC entry point. Delegates to `deleteKvCacheState(...)` — zero direct references to the three layers. |
 | `server/utils/cache.ts` | `getKVCacheDir()` base directory |
 | `client/api/delete-cache.ts` | Client-side delete cache API |
 
 ## Key Behaviors
 
-### `KvCacheSession` Ownership (M2)
+### `KvCacheSession` Ownership
 
-Before M2 the completion handler coordinated three independent bookkeeping layers around every cancel/error branch:
+Before 0.11.0 the completion handler coordinated three independent bookkeeping layers around every cancel/error branch:
 
 1. An in-memory `Set<string>` of "initialized caches" (`kv-cache-utils.ts`).
 2. A `Map<string, number>` of saved-message counts (`kv-cache-state.ts`).
 3. The on-disk `.bin` files written by the addon.
 
 Three near-identical cleanup blocks in `completion-stream.ts` had to touch all three on every cancel / zero-token / rename-failed / tool-call exit. Any one of those blocks forgetting a layer produced the drift bugs the pitch documents (QVAC-17780 family).
 
-**M2 collapses this into `KvCacheSession`**, the **single mutation point** for the three layers. The handler's loop is now:
+**0.11.0 collapses this into `KvCacheSession`**, the **single mutation point** for the three layers. The handler's loop is now:
 
 ```typescript
 const session = createKvCacheSession(modelId);

@@ -74,7 +74,7 @@ Concretely, Vercel's AI SDK (`streamText`) is a public-codebase example of the s
 
 `request-lifecycle-primitives.mdc` has the worked code examples and the dispatch-level truth table for which `RequestKind`s currently route through the registry.
 
-For kinds that haven't been migrated onto the registry yet, the broad-cancel path (`cancel({ operation: <kind>, modelId })`) falls back to `addon.cancel()` directly — see the fallback in `server/bare/ops/cancel.ts`. The wire contract for non-migrated kinds is unchanged: callers continue to use `cancel({ operation: <kind>, modelId })` exactly as before.
+Every server-side cancellable handler is on the registry as of 0.11.0. The broad-cancel path (`cancel({ modelId, kind? })` and its legacy `{ operation: "inference"|"embeddings", modelId }` aliases) is a single registry walk; the legacy pre-registry addon-cancel fallback in `server/bare/ops/cancel.ts` was removed in 0.11.0. Handlers whose addon declares `cancel: { scope: "none" }` (TTS, OCR, NMT, upscale) still respect a broad cancel at the registry layer: the in-flight call stops at its next yield point after `ctx.signal.aborted` flips. They just don't get a hard mid-decode abort.
 
 ## FAQ
 
@@ -192,8 +192,8 @@ Test coverage: `same-tick cancel-before-begin retroactively aborts the later beg
 | `server/bare/runtime/with-request-context.ts`                              | `withRequestContext(logger, ctx)` — per-request logger wrapper prefixing every emit with the lifecycle correlation tuple |
 | `server/bare/runtime/request-id.ts`                                        | UUID generation helper for caller-provided ids                                                           |
 | `server/bare/runtime/index.ts`                                             | Public re-exports — handlers import from `@/server/bare/runtime`                                         |
-| `server/bare/ops/cancel.ts`                                                | Broad-cancel op: registry-routed with addon fallback for non-migrated handlers                           |
-| `server/rpc/handlers/cancelHandler.ts`                                     | RPC entry point: dispatches by `operation` (inference / embeddings / request / downloadAsset / rag)      |
+| `server/bare/ops/cancel.ts`                                                | Broad-cancel op: pure registry walk, legacy addon-cancel fallback removed in 0.11.0                      |
+| `server/rpc/handlers/cancelHandler.ts`                                     | RPC entry point: 2-arm `request` / `broad` dispatch (5-arm union collapsed in 0.11.0). Targeted `request` goes through `RequestRegistry.cancel({ requestId })` plus an optional `markClearCacheForRequest(...)` for downloads; `broad` delegates to `server/bare/ops/cancel.ts` |
 | `server/rpc/handlers/delete-cache.ts`                                      | Delegates to `deleteKvCacheState(...)` — zero direct references to the three KV-cache layers             |
 | `server/bare/plugins/llamacpp-completion/plugin.ts`                        | Reference plugin manifest; declares `cancel: { scope: "model", hard: true }`; builds `withRequestContext(...)` once per request and threads it into `completion(...)`; `finetune` declares `{ scope: "model", hard: true }`; `translate` handler threads `requestId` into the shared bare op |
 | `server/bare/plugins/llamacpp-completion/ops/completion-stream.ts`         | Reference implementation of the canonical handler shape; uses `KvCacheSession`; accepts a request-scoped `logger` |

@@ -177,10 +177,24 @@ Located in `@/utils/errors-server`
 - `AttachmentNotFoundError` - Attachment not found
 - `CancelFailedError` - Cancel failed
 - `TextToSpeechFailedError` - TTS failed
-- `RequestIdConflictError` (52417) - `registry.begin(...)` called with a `requestId` already present
-- `RequestNotFoundError` (52418) - registry lookup miss (no in-flight request for the given id)
-- `InferenceCancelledError` (52419) - cancelled inference run; carries `requestId` + `partial: { text?, toolCalls?, stats? }`. Constructed client-side on `stopReason: "cancelled"` (event stream ends normally; promise-aggregates reject with this). Re-exported from `@qvac/sdk` for `instanceof` checks.
-- `RequestRejectedByPolicyError` (52420) - registry concurrency-policy admission failure (e.g. `oneAtATimePerModel`); carries `requestId`, `kind`, `modelId`, and a `reason` string. Re-exported from `@qvac/sdk` for `instanceof` checks. See `.cursor/rules/sdk/request-lifecycle-primitives.mdc` for the policy contract.
+- `RequestIdConflictError` (52417) - `registry.begin(...)` called with a `requestId` already present. Carries `requestId`. Re-exported from `@qvac/sdk` for `instanceof` checks (reconstructed across RPC by the typed-error reconstructor — see "Typed errors across RPC" below).
+- `RequestNotFoundError` (52418) - registry lookup miss (no in-flight request for the given id). Carries `requestId`. Re-exported from `@qvac/sdk` for `instanceof` checks (reconstructed across RPC).
+- `InferenceCancelledError` (52419) - cancelled inference run; carries `requestId` + `partial: { text?, toolCalls?, stats? }`. Constructed client-side on `stopReason: "cancelled"` (event stream ends normally; promise-aggregates reject with this). Re-exported from `@qvac/sdk` for `instanceof` checks. **Not** RPC-reconstructed — client-side only.
+- `RequestRejectedByPolicyError` (52420) - registry concurrency-policy admission failure (e.g. `oneAtATimePerModel`); carries `requestId`, `kind`, `modelId`, and a `reason` string. Re-exported from `@qvac/sdk` for `instanceof` checks (reconstructed across RPC). See `.cursor/rules/sdk/request-lifecycle-primitives.mdc` for the policy contract.
+
+### Typed errors across RPC
+
+Server-thrown `QvacError` subclasses that need to survive the RPC boundary as their original class (so `err instanceof RequestRejectedByPolicyError` works on the consumer side) are wired through a small reconstructor pipeline:
+
+1. The class extends `QvacErrorBase` and implements `toErrorResponseFields(): Record<string, unknown>` listing the named constructor fields the client needs to rebuild it. The base envelope (`name`, `code`, `message`, `stack`, `timestamp`, `cause`) is already carried by `createErrorResponse(...)`; `typedFields` is the per-class extension.
+2. The class is re-exported from `@qvac/sdk` (root `index.ts`). Forgetting this means consumers can't `import { Foo } from "@qvac/sdk"` even though the reconstructor builds a `Foo` instance, and `instanceof` regresses.
+3. A row is added to the `RECONSTRUCTORS` map in `client/rpc/rpc-error.ts`, keyed by the class `name`. The row reads from `response.typedFields` (defaulting missing fields defensively) and forwards `response.cause`.
+
+`client/rpc/rpc-client.ts` calls `reconstructError(response)` instead of `new RPCError(response)`: a registered class is rebuilt; an unknown `name` falls through to `RPCError` so consumers using `code`-based predicates still work.
+
+Three classes are wired today: `RequestIdConflictError`, `RequestNotFoundError`, `RequestRejectedByPolicyError`. Add new rows whenever a new cross-RPC server-thrown class is introduced — the maintenance contract lives at the top of the `RECONSTRUCTORS` map.
+
+`InferenceCancelledError` is **not** in this map: it's constructed client-side in `client/api/completion-stream.ts` from the aggregated partial state on `stopReason: "cancelled"`. Adding a reconstructor for a client-constructed class creates a parallel construction path and is a smell.
 
 #### RAG Operations (52,800-52,999)
 - `RAGSaveFailedError` - Save failed

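The three-step reconstructor pipeline described under "Typed errors across RPC" can be sketched roughly as follows. This is a simplified illustration, not the code in `client/rpc/rpc-error.ts`: the `ErrorResponse` shape, the stand-in `RPCError` and `RequestNotFoundError` classes, and the `"unknown"` default are all assumptions made for the example, and `cause` forwarding is left out for brevity.

```typescript
// Hypothetical, trimmed-down error envelope; the real one also carries
// stack, timestamp and cause.
interface ErrorResponse {
  name: string;
  code: number;
  message: string;
  typedFields?: Record<string, unknown>;
}

// Stand-in for the generic fallback class.
class RPCError extends Error {
  readonly code: number;
  constructor(response: ErrorResponse) {
    super(response.message);
    this.name = "RPCError";
    this.code = response.code;
  }
}

// Stand-in for a server-thrown class that must survive the RPC boundary.
class RequestNotFoundError extends Error {
  readonly code = 52418;
  readonly requestId: string;
  constructor(requestId: string) {
    super(`No in-flight request for id "${requestId}"`);
    this.name = "RequestNotFoundError";
    this.requestId = requestId;
  }
  // Step 1: list the named constructor fields the client needs.
  toErrorResponseFields(): Record<string, unknown> {
    return { requestId: this.requestId };
  }
}

type Reconstructor = (response: ErrorResponse) => Error;

// Step 3: one row per class, keyed by the class name.
const RECONSTRUCTORS = new Map<string, Reconstructor>([
  [
    "RequestNotFoundError",
    (response) => {
      const raw = (response.typedFields ?? {})["requestId"];
      // Default defensively when the field is missing or malformed.
      return new RequestNotFoundError(typeof raw === "string" ? raw : "unknown");
    },
  ],
]);

// A registered name is rebuilt as its original class; anything else
// falls through to RPCError so code-based predicates keep working.
function reconstructError(response: ErrorResponse): Error {
  const rebuild = RECONSTRUCTORS.get(response.name);
  return rebuild ? rebuild(response) : new RPCError(response);
}
```

With a row registered, `reconstructError(...)` returns an instance for which `instanceof RequestNotFoundError` holds on the consumer side, while an unregistered `name` still yields an `RPCError` carrying the original `code`.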
@@ -16,7 +16,7 @@ Server-side long-running operations (`completion`, `embeddings`, `transcribe`, `
 - **`RequestContext`** — per-request handle bundling `requestId`, `kind`, `modelId`, `signal`, `scope`, `state`.
 - **`RequestRegistry`** — module-scoped registry that mints contexts via `begin(...)` and routes `cancel(...)` by `requestId` or `modelId`.
 
-The contract below applies to every cancellable server-side handler. The truth table further down ("Truth table for built-in plugins") tracks each handler's addon-level cancel surface; the dispatch-level table ("What's on the registry today") tracks which `RequestKind`s are currently routed through the registry. Kinds not on the registry use the broad-cancel fallback in `server/bare/ops/cancel.ts`.
+The contract below applies to every cancellable server-side handler. The truth table further down ("Truth table for built-in plugins") tracks each handler's addon-level cancel surface; the dispatch-level table ("What's on the registry today") tracks which `RequestKind`s are currently routed through the registry. As of 0.11.0 every handler in the SDK is registered — the legacy pre-registry addon-cancel fallback in `server/bare/ops/cancel.ts` has been removed.
 
 ## Canonical Handler Shape
 
@@ -244,7 +244,7 @@ The truth table above describes the addon-level capability for plugin handlers.
 | `downloadAsset`   | `server/rpc/handlers/download-asset.ts`                          | Hard (signal threaded to `resolveModelPath`)                  | Per-`requestId` cancel preserves the content-addressed dedup in `download-manager.ts` — two subscribers on the same `downloadKey` share one transfer, and the transfer aborts only when the **last** subscriber leaves. |
 | `rag`             | `server/rpc/handlers/rag.ts`                                     | Soft (workspace-bound; ingest/saveEmbeddings/reindex)         | Dispatcher-level pre-emption: starting a new RAG op on a workspace cancels the prior in-flight op on the same workspace **before** `registry.begin(...)`. Workspace admission lives in the dispatcher rather than as a registry policy primitive. |
 
-Kinds **not** in this table (e.g. `textToSpeech`, `ocr`, `diffusion`, `upscale`) still use the broad-cancel fallback in `server/bare/ops/cancel.ts`.
+Kinds **not** in this table (e.g. `textToSpeech`, `ocr`, `diffusion`, `upscale`) declare `cancel: { scope: "none" }` at the addon level — they do not expose a hard mid-decode abort surface but still respect `cancel({ requestId })` and `cancel({ modelId })` at the registry layer (the in-flight call yields when `ctx.signal.aborted` flips on the next yield point). The broad-cancel path is a single registry walk; the per-kind fallback was removed in 0.11.0.
 
 
 ## Concurrency Policy
@@ -361,11 +361,17 @@ await sdk.cancel({ requestId: op.requestId });
 Cancel every in-flight request matching a `modelId` — for model unload, app shutdown, admin sweeps. Kept stable from pre-0.11.0:
 
 ```typescript
+// Generic broad-cancel — preferred shape going forward (0.11.0+).
+await sdk.cancel({ modelId });
+await sdk.cancel({ modelId, kind: "completion" });
+await sdk.cancel({ modelId, kind: "embeddings" });
+
+// Legacy per-kind sugars — still supported via the client wrapper.
 await sdk.cancel({ operation: "inference", modelId });
 await sdk.cancel({ operation: "embeddings", modelId });
 ```
 
-Internally, both paths land on `RequestRegistry.cancel(...)`. The broad path falls back to `addon.cancel()` for handler kinds that haven't been registry-migrated yet — see the "What's on the registry today" table above for the current set of registry-routed kinds.
+Internally every path lands on `RequestRegistry.cancel(...)`. The legacy pre-registry addon-cancel fallback in `server/bare/ops/cancel.ts` was removed in 0.11.0 — every handler is now on the registry, so a broad cancel is one registry walk and nothing else. See the "What's on the registry today" table above; it lists "all kinds" because the migration is complete in 0.11.0.
 
 ## Decorated-Promise Pattern
 

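To make "a broad cancel is one registry walk" concrete, here is a minimal sketch of a registry that routes cancels by `requestId` or by `modelId` with an optional `kind`. It is not the SDK's `RequestRegistry`: the `SketchRegistry` class, its `end(...)` method, the two-member `RequestKind` union, and the `fromLegacy` mapping of `operation: "inference"` onto `kind: "completion"` are assumptions made for illustration.

```typescript
type RequestKind = "completion" | "embeddings";

interface RequestContext {
  requestId: string;
  kind: RequestKind;
  modelId: string;
  signal: AbortSignal;
}

type CancelTarget =
  | { requestId: string }
  | { modelId: string; kind?: RequestKind };

// Hypothetical legacy shape, normalized before it reaches the registry.
type LegacyTarget = { operation: "inference" | "embeddings"; modelId: string };

function fromLegacy(target: LegacyTarget): CancelTarget {
  return {
    modelId: target.modelId,
    kind: target.operation === "inference" ? "completion" : "embeddings",
  };
}

class SketchRegistry {
  private readonly inflight = new Map<
    string,
    { ctx: RequestContext; controller: AbortController }
  >();

  // Mint a context whose signal the handler polls at its yield points.
  begin(requestId: string, kind: RequestKind, modelId: string): RequestContext {
    if (this.inflight.has(requestId)) {
      throw new Error(`requestId already in flight: ${requestId}`);
    }
    const controller = new AbortController();
    const ctx: RequestContext = { requestId, kind, modelId, signal: controller.signal };
    this.inflight.set(requestId, { ctx, controller });
    return ctx;
  }

  end(requestId: string): void {
    this.inflight.delete(requestId);
  }

  // One walk over the in-flight set; returns how many entries matched.
  cancel(target: CancelTarget): number {
    let matched = 0;
    for (const { ctx, controller } of this.inflight.values()) {
      const hit =
        "requestId" in target
          ? ctx.requestId === target.requestId
          : ctx.modelId === target.modelId &&
            (target.kind === undefined || ctx.kind === target.kind);
      if (hit) {
        controller.abort();
        matched++;
      }
    }
    return matched;
  }
}
```

In this shape a handler with no hard abort surface still observes the cancel, because its context's `signal.aborted` flips regardless of what the addon supports.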
@@ -3,6 +3,7 @@ import { readBody, sendJson, sendError, initSSE, sendSSE, endSSE } from '../../.
 import { resolveModelAlias } from '../../../config.js'
 import { sdkCompletion } from '../../../core/sdk.js'
 import type { SDKTool, SDKGenerationParams, SDKResponseFormat } from '../../../core/sdk.js'
+import { bindClientDisconnectCancel } from '../../../core/cancel-bridge.js'
 import {
   openaiMessagesToHistory,
   openaiToolsToSdk,
@@ -96,9 +97,9 @@ export async function handleChatCompletions (req: IncomingMessage, res: ServerRe
 
   try {
     if (streaming) {
-      await handleStreamingCompletion(res, { sdkModelId, history, tools, generationParams, responseFormat, modelAlias, logger: ctx.logger })
+      await handleStreamingCompletion(req, res, { sdkModelId, history, tools, generationParams, responseFormat, modelAlias, logger: ctx.logger })
     } else {
-      await handleBlockingCompletion(res, { sdkModelId, history, tools, generationParams, responseFormat, modelAlias, logger: ctx.logger })
+      await handleBlockingCompletion(req, res, { sdkModelId, history, tools, generationParams, responseFormat, modelAlias, logger: ctx.logger })
     }
   } catch (err) {
     const message = err instanceof Error ? err.message : String(err)
@@ -124,7 +125,7 @@ function completionTokensFromStats (text: string, stats: { generatedTokens?: num
   return text ? text.split(/\s+/).filter(Boolean).length : 0
 }
 
-async function handleBlockingCompletion (res: ServerResponse, params: CompletionParams): Promise<void> {
+async function handleBlockingCompletion (req: IncomingMessage, res: ServerResponse, params: CompletionParams): Promise<void> {
   const result = await sdkCompletion({
     modelId: params.sdkModelId,
     history: params.history,
@@ -134,6 +135,12 @@ async function handleBlockingCompletion (res: ServerResponse, params: Completion
     responseFormat: params.responseFormat
   })
 
+  // Bridge HTTP client disconnect → SDK cancel. Bound after the
+  // wrapper await but before any `await` on the result aggregates,
+  // so a fetch-abort mid-completion lands on the in-flight requestId
+  // before tokens have fully resolved.
+  bindClientDisconnectCancel(req, res, result.requestId, params.logger)
+
   const text = await result.text
   const toolCalls = await result.toolCalls
   const stats = await result.stats
@@ -171,7 +178,7 @@ async function handleBlockingCompletion (res: ServerResponse, params: Completion
   })
 }
 
-async function handleStreamingCompletion (res: ServerResponse, params: CompletionParams): Promise<void> {
+async function handleStreamingCompletion (req: IncomingMessage, res: ServerResponse, params: CompletionParams): Promise<void> {
   const result = await sdkCompletion({
     modelId: params.sdkModelId,
     history: params.history,
@@ -181,6 +188,13 @@ async function handleStreamingCompletion (res: ServerResponse, params: Completio
     responseFormat: params.responseFormat
   })
 
+  // Bridge HTTP client disconnect → SDK cancel. The synchronous
+  // `result.requestId` (decorated on the `CompletionRun`) is what makes
+  // this work: we can bind the listener before the first SSE frame
+  // streams, so a fetch-abort during inference aborts the in-flight
+  // SDK request rather than letting it run to natural completion.
+  bindClientDisconnectCancel(req, res, result.requestId, params.logger)
+
   initSSE(res)
 
   const id = `chatcmpl-${randomId()}`

@@ -2,6 +2,7 @@ import type { IncomingMessage, ServerResponse } from 'node:http'
 import { readBody, sendJson, sendError } from '../../../http.js'
 import { resolveModelAlias } from '../../../config.js'
 import { sdkEmbed } from '../../../core/sdk.js'
+import { bindClientDisconnectCancel } from '../../../core/cancel-bridge.js'
 import type { RouteContext } from '../../types.js'
 
 export async function handleEmbeddings (req: IncomingMessage, res: ServerResponse, ctx: RouteContext): Promise<void> {
@@ -60,11 +61,18 @@ export async function handleEmbeddings (req: IncomingMessage, res: ServerRespons
   ctx.logger.info(`  embed model=${modelAlias} inputs=${inputs.length}`)
 
   try {
-    const embeddings = await sdkEmbed({
+    const op = await sdkEmbed({
       modelId: sdkModelId,
       text: inputs.length === 1 ? inputs[0]! : inputs
     })
 
+    // Bind the disconnect bridge before awaiting the result so a
+    // client-abort during a long batch embed lands on the in-flight
+    // requestId rather than completing the whole batch.
+    bindClientDisconnectCancel(req, res, op.requestId, ctx.logger)
+
+    const embeddings = await op.result
+
     const isBatch = Array.isArray(embeddings[0])
     const vectors = isBatch ? embeddings as number[][] : [embeddings as number[]]
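The diff imports `bindClientDisconnectCancel` from `core/cancel-bridge.js` without showing its body. One plausible shape for such a bridge, offered only as a sketch, is below. The `ResponseLike` interface, the injected `cancel` function, and the use of the response's `"close"` event together with `writableEnded` to tell a disconnect from a normal finish are assumptions, and the signature differs from the real four-argument helper.

```typescript
// Minimal structural view of what the sketch needs from a response;
// node:http's ServerResponse satisfies it.
interface ResponseLike {
  readonly writableEnded: boolean;
  once(event: "close", listener: () => void): unknown;
}

interface BridgeLogger {
  info(message: string): void;
}

// Hypothetical cancel entry point, injected so the sketch stays self-contained.
type CancelFn = (target: { requestId: string }) => Promise<void>;

function bindClientDisconnectCancelSketch(
  res: ResponseLike,
  requestId: string,
  logger: BridgeLogger,
  cancel: CancelFn
): void {
  res.once("close", () => {
    // "close" also fires after a normal end(); only an unfinished
    // response means the client went away mid-request.
    if (res.writableEnded) return;
    logger.info(`client disconnected, cancelling request ${requestId}`);
    cancel({ requestId }).catch((err: unknown) => {
      logger.info(`cancel for ${requestId} failed: ${String(err)}`);
    });
  });
}
```

Binding right after the wrapper call resolves, as the handlers in this diff do, relies on `requestId` being available synchronously on the returned handle.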