Skip to content

QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession#2007

Merged
simon-iribarren merged 13 commits into
tetherto:mainfrom
simon-iribarren:feat/qvac-18182-typed-cancel-outcomes-kvcache-session
May 13, 2026
Merged

QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession#2007
simon-iribarren merged 13 commits into
tetherto:mainfrom
simon-iribarren:feat/qvac-18182-typed-cancel-outcomes-kvcache-session

Conversation

@simon-iribarren

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • Cancelled completions had no typed outcome. With M1's signal-based cancel the request did stop, but the wire just emitted a normal-looking completionDone and the client's final / text / toolCalls / stats promises resolved with empty/partial values — indistinguishable from a real EOS. Callers couldn't tell "you cancelled this" from "the model returned nothing".
  • Three KV-cache layers drifted under cancel/error paths. cachedMessageCounts (in-memory map), initializedCaches (in-memory set), and the on-disk .bin file each had their own cleanup site. completion-stream.ts repeated the same fsPromises.unlink(...) + clearCacheRegistry(...) + cachedMessageCounts.delete(...) triplet in three places (cancelled-custom-key turn, cancelled-auto-cache turn, auto-cache rename failure). A miss in any one of them left observable cross-turn corruption.
  • The same-tick stop-button race lost cancels. M1's RequestRegistry only matched cancel({ requestId }) against in-flight requests, so a cancel that landed before the worker called registry.begin(...) silently disappeared. The eventual begin ran without an aborted signal and the cancelled request streamed to completion.

📝 How does it solve it?

  • Typed cancel outcomes (D1–D3). stopReason: "cancelled" joins "eos" | "length" | "stopSequence" on the completionDone wire event so the events stream still terminates cleanly. The client-side aggregator in client/api/completion-stream.ts detects the cancelled terminator and rejects the promise-aggregates (final, text, toolCalls, stats) with a new InferenceCancelledError(requestId, partial). partial.text / partial.toolCalls / partial.stats carry whatever the aggregator accumulated up to the cancel point — callers instanceof-check and reach for whichever partial field they need. The error class is defined server-side (utils/errors-server.ts, code 52419), constructed client-side (so the partial payload survives the RPC envelope), and re-exported from @qvac/sdk for instanceof checks.
  • KvCacheSession as the atomic three-layer owner (D4). New server/bare/plugins/llamacpp-completion/ops/kv-cache-session.ts owns cachedMessageCounts, initializedCaches, and .bin file lifecycle behind a beginTurn / commitTurn / rollback API. completion-stream.ts registers scope.defer(() => session.rollback(turn)) at the top of a turn and calls session.commitTurn(...) only on the success path — so cancel / addon failure / auto-cache rename failure / scope unwind all funnel through the same rollback. The three duplicated cleanup blocks in the pre-PR completion-stream.ts are gone; the file dropped from ~756 lines to ~661 with the cancellation logic centralised.
  • Cross-model administrative delete (D5). A new module-level deleteKvCacheState({ kvCacheKey?, modelId?, all? }) in kv-cache-session.ts handles wipe-by-key (optionally scoped to a model) and wipe-all, touching all three layers atomically. The RPC handler server/rpc/handlers/delete-cache.ts now delegates straight to it — no more direct fsPromises / module-set juggling in the handler.
  • Bounded cancelledBeforeBegin map closes the race (D6, Option A). RequestRegistry keeps a 128-entry / 30 s TTL map of requestId → { at, reason }. cancel(...) records into the map when no in-flight context is found; begin(...) consults it and aborts the freshly-minted controller before returning the ManagedRequestContext. The wire surface for cancel is unchanged — internal-only fix, no [bc]. Eviction is size-cap + TTL combined.
  • server/bare/runtime/request-registry.ts begin() now derives terminal state ("cancelled" | "completed") from ctx.signal.aborted so a same-tick cancel-then-begin pair lands as state: "cancelled", observable via __requestRegistryTestHooks in unit tests.
  • Cursor rules updated in this PR (the brief's requirement): .cursor/rules/sdk/request-lifecycle-primitives.mdc, .cursor/rules/sdk/docs/request-lifecycle-system.mdc, .cursor/rules/sdk/docs/kv-cache-system.mdc, .cursor/rules/sdk/error-handling.mdc — error codes table, cancelled completion semantics, KvCacheSession ownership, FAQ entry for the race close, anti-patterns.

🧪 How was it tested?

Functional-equivalence vs. pre-PR completion-stream.ts (the D5/D7 mitigation listed in the brief):

Scenario Pre-PR cleanup site M2 coverage
Custom kvCache key, cancelled mid-stream completion-stream.ts lines ~595–607 (unlink + clearCacheRegistry + cachedMessageCounts.delete) KvCacheSession.rollback(turn) via scope.defer. Verified by cancel-then-resume-kv-cache e2e + kv-cache-session.test.ts "rollback wipes all three layers atomically".
Custom kvCache key, zero-token turn Same block, branch on shouldRecordSavedCount Same rollback path — commitTurn is never called, so the deferred rollback runs unconditionally.
Auto kvCache (kvCache: true), cancelled mid-stream lines ~688–704 (unlink + cachedMessageCounts.delete) Same KvCacheSession.rollback(turn) via scope.defer.
Auto kvCache, zero-token EOS turn (no content to promote) Same block, alt branch Same rollback.
Auto kvCache, file-rename failure during commitTurn lines ~725–740 (unlink + cachedMessageCounts.delete) commitTurn returns false / throws → deferred rollback wipes everything (kv-cache-session.test.ts "commitTurn rolls back if the addon did not persist the file").
Successful commit trailing cachedMessageCounts.delete(cachePathToUse) + recordCacheSaveCount(...) commitTurn(turn, newPath, savedCount) records the new path/count and drops the rollback hook in one transaction.

Unit tests (packages/sdk/test/unit/):

  • runtime/request-registry.test.ts — same-tick cancel-before-begin now retroactively aborts (was: 0 matched, now: signal aborted on the new context); UUID-retry path tested; bounded-set eviction tested (200 cancel-before-begin calls don't grow the internal map past 128 entries; eviction preserves the freshest entries).
  • runtime/kv-cache-session.test.tsbeginTurn priming + reuse, commitTurn records new count + suppresses rollback, rollback wipes all three layers, rollback tolerates missing on-disk file, double-rollback is idempotent, dropStaleSavedCount forgets count without touching file/init flag, deleteKvCacheState({ kvCacheKey }) + deleteKvCacheState({ all }) wipe correctly, commitTurn triggers rollback when the addon doesn't persist. Tests are bareTest-gated (dynamic import, runs in the Bare consumer; skipped under Bun because kv-cache-session.ts transitively imports bare-fs / bare-path).
  • completion-event-schemas.test.tssuccessDoneSchema.parse({ type: "completionDone", seq, stopReason: "cancelled" }) accepted; cancelled cases added to the event-type routing test.
  • kv-cache-cancel.test.ts — refactored away from the deprecated module-scoped cachedMessageCounts map; decideCachedHistorySlice is now exercised purely with the explicit savedCount parameter (matching the M2 API).

E2E tests (packages/sdk/tests-qvac/tests/cancellation-tests.ts + shared/executors/cancellation-executor.ts):

  • cancel-mid-stream-completion — fire cancel({ requestId }) after the third contentDelta, assert (a) events end with completionDone.stopReason === "cancelled", (b) run.final rejects with InferenceCancelledError, (c) error.partial.text equals the concatenated contentDelta.text the test observed on run.events.
  • cancel-before-begin-completion — fire cancel({ requestId }) synchronously after completion(...) returns (before the completionStream RPC settles), assert the eventual outcome is a typed cancelled regardless of whether the cancel landed before or after begin(...) ran on the worker.
  • cancel-then-resume-kv-cache — same as the existing kv-cache-cancel-then-new-prompt but additionally asserts the first turn's InferenceCancelledError rejection and the stopReason: "cancelled" event, then verifies the second turn on the same kvCache key re-primes cleanly and produces the expected answer (proving KvCacheSession.rollback wiped the three layers).

Pre-merge gates (run from packages/sdk/):

bun lint        # eslint + tsc, clean
bun run build   # clean
bun run test:unit  # 10/10 files, all asserts pass

🔌 API Changes

stopReason: "cancelled" is a new value on the successDoneSchema enum; InferenceCancelledError is a new export. Both are additive — existing consumers ignoring the new enum value or not catching the new error class still work.

import { completion, cancel, InferenceCancelledError } from "@qvac/sdk";

const run = completion({ modelId, history, stream: true });

// Iterating events stays unchanged — events end naturally with the
// cancelled terminator on cancel, no thrown error:
for await (const event of run.events) {
  if (event.type === "completionDone" && event.stopReason === "cancelled") {
    // request was cancelled — handle here if you want
  }
}

// Awaiting promise-aggregates throws InferenceCancelledError on cancel:
try {
  await cancel({ requestId: run.requestId });
  const text = await run.text;
  // ...
} catch (err) {
  if (err instanceof InferenceCancelledError) {
    console.log(`cancelled: requestId=${err.requestId}, partial=${err.partial.text}`);
    // err.partial.toolCalls and err.partial.stats are also available
  }
}

The internal RequestRegistry cancel-before-begin race close is wire-invisible — cancel({ requestId }) and cancel({ operation: "inference", modelId }) behave identically on the wire as before. Same-tick stop clicks just stop being silently dropped.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@simon-iribarren simon-iribarren force-pushed the feat/qvac-18182-typed-cancel-outcomes-kvcache-session branch from 0380ae5 to 195b2cf Compare May 12, 2026 20:21
@simon-iribarren

Copy link
Copy Markdown
Contributor Author

E2E test analysis (run 25748140796 on commit 1c4ffdb)

Investigated the failing e2e tests. None of the failures are caused by the M2 changes in this PR. Summary across all 5 platforms:

Platform Total Passed Failed Skipped Verify suite (M2 cancellation tests) kv-cache category
Desktop Windows 408 405 1 2 3/3 (100%) 17/17 (100%)
Desktop Ubuntu 408 404 2 2 3/3 (100%) 17/17 (100%)
Desktop macOS (M4) 408 405 1 2 3/3 (100%) 17/17 (100%)
iOS (iPhone 16 Pro) 408 280 1 127 3/3 (100%) 17/17 (100%)
Android (Samsung S25 Ultra) 408 77 329 2 0/3 (never ran) 0/17 (never ran)

What actually failed

1. completion-response-format-json-schema — fails on iOS, win, ubuntu, mac.

Model returned prose instead of valid JSON:

Error: json_schema output is not valid JSON: JSON Parse error: Unexpected character: H. Output: Here is the person info as JSON:

Pre-existing model-quality flake on the 1B Llama (the test forces response_format: json_schema and the small model occasionally prefixes prose). Touches none of the M2 surface (cancellation, KvCacheSession, request registry).

2. vision-text-extraction — ubuntu desktop only.

Vision OCR-style test, unrelated to M2. Pre-existing.

3. 329 android failures — root cause is a device hang, not test failures.

Producer log on android shows the consumer device went unresponsive at 21:07 while running transcribe-stream-events-happy:

21:06:39  ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 63s / 180s)
21:07:09  ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 93s / 180s)
21:07:39  ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 123s / 180s)
21:07:39  💀 Consumer mobile-SM unresponsive for 123s — marking as dead
21:07:39  ❌ All consumers are dead. Terminating batch.

After the device died, the producer counted all 327 unrun queued tests as failures (which includes our 3 cancellation tests, the entire 17-test kv-cache suite, embedding, rag, translation, tools, ocr, tts, etc — all 0/N in the per-category report). This is a known device-farm/transcription flakiness pattern (cf. QVAC-18460 for the iOS transcription crash already tracked under skips).

M2 surface verification

Where M2-touched paths actually executed (4 of 5 platforms):

  • Verify suite (the new cancellation e2e: cancel-mid-stream-completion, cancel-before-begin-completion, cancel-then-resume-kv-cache): 3/3 pass on every platform (iOS 317ms / 109ms / 637ms; desktop ~210ms / ~100ms / similar).
  • kv-cache category: 17/17 pass on every platform — the existing kv-cache suite (including kv-cache-cancel-then-new-prompt which exercises broad-cancel backward-compat) is functionally equivalent under the KvCacheSession refactor.
  • completion category: 45/48 pass on every platform (45 + 2 skipped + 1 failed = 48; the one failure is the json-schema flake above, not ours).

Verdict: the e2e failures on this run are entirely pre-existing (model-quality flake, vision flake) or device-hang artefacts. The M2 surface (typed cancel outcomes, InferenceCancelledError, KvCacheSession, stop-button race close) shows 100% pass on every platform that ran it.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (2/1)



---
*This comment is automatically updated when reviews change.*

…ache via KvCacheSession

Builds on QVAC-18181's request lifecycle primitives (DisposableScope,
RequestContext, RequestRegistry) to deliver the M2 milestone:

- Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone`
  events, and `InferenceCancelledError(requestId, partial)` thrown from
  CompletionRun promise-aggregates (`final` / `text` / `toolCalls` /
  `stats`). The wire stream still ends normally so iterating
  `run.events` is unaffected — the typed error lives on the aggregate
  promises that callers `await` for the final result.

- KvCacheSession (`server/bare/plugins/llamacpp-completion/ops/
  kv-cache-session.ts`) — single atomic owner of the three KV-cache
  layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin`
  files). `beginTurn` / `commitTurn` / `rollback` collapse the three
  duplicated cleanup blocks in `completion-stream.ts` into one
  scope.defer hook. Cross-model administrative deletion lives at the
  module level as `deleteKvCacheState(...)`, called by the RPC
  `handleDeleteCache` handler.

- Stop-button race close — `RequestRegistry` now keeps a bounded
  cancelled-before-begin map (128 entries, 30s TTL). A `cancel({
  requestId })` that lands before the server's `begin(...)` ran is
  applied retroactively when begin lands, so same-tick stop clicks no
  longer disappear into the void. Internal-only — the wire surface for
  `cancel` is unchanged (Option A in the brief).

Cursor rules updated in the same PR so the request-lifecycle and
KV-cache topic docs stay in sync with the implementation.

Tests:
- unit: KvCacheSession (bareTest-gated, runs in the Bare consumer),
  RequestRegistry race + bounded-set eviction, completion-event schema
  cancelled cases.
- e2e: cancellation-tests.ts adds three definitions — mid-stream cancel
  (events.stopReason === "cancelled", final rejects with
  InferenceCancelledError, partial.text matches concatenated
  contentDelta), cancel-before-begin (retroactive abort), and
  cancel-then-resume-kv-cache (rollback wiped the three layers, the
  next turn re-primes cleanly).
Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`)
labels from comments and test titles introduced with the typed-cancel
outcomes + KvCacheSession work. The substantive descriptions of the
contracts (Stop-button race, cancelled-before-begin map, three-layer
session ownership, etc.) are preserved; only the planning-doc
references are removed so the code reads cleanly without the pitch
context. Durable `QVAC-XXXXX` ticket references are kept.

No behavior or API surface changes.
Strips QVAC-XXXXX inline ticket references from code/test comments
introduced by the typed-cancel-outcomes work. Concept names
(Stop-button race, cancelled-before-begin, etc.) and prose
descriptions of the contracts are preserved; only the ticket-tag
suffixes go. Also renames a test cache key from
`qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so
the cache key reads as a stable identifier rather than a ticket
reference.

No behavior or API surface changes.
…te concurrency

Address non-blocking review nits on PR tetherto#2007:

- aggregate-events: explain why a wire event carrying both error and
  cancelled signals resolves to error (closes brief open question tetherto#3).
- kv-cache-session: doc-comment on deleteKvCacheState explaining the
  ordering guarantee under concurrent in-flight turns -- delete is
  wire-async, in-flight turns roll back idempotently when their commit
  probe finds the file gone (closes brief open question tetherto#4).

Comments only; no behavior changes.
@simon-iribarren simon-iribarren added test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] labels May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

QVAC E2E — ios⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

QVAC E2E — windows — ✅ all tests passed (91/91, 445s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

QVAC E2E — linux — ✅ all tests passed (91/91, 349s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

QVAC E2E — macos — ✅ all tests passed (91/91, 341s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

QVAC E2E — android — ✅ all tests passed (83/91, 1996s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

@simon-iribarren

Copy link
Copy Markdown
Contributor Author

/review

@simon-iribarren simon-iribarren merged commit e2bc0e6 into tetherto:main May 13, 2026
12 of 13 checks passed
Proletter pushed a commit that referenced this pull request May 24, 2026
…ache via KvCacheSession (#2007)

* QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession

Builds on QVAC-18181's request lifecycle primitives (DisposableScope,
RequestContext, RequestRegistry) to deliver the M2 milestone:

- Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone`
  events, and `InferenceCancelledError(requestId, partial)` thrown from
  CompletionRun promise-aggregates (`final` / `text` / `toolCalls` /
  `stats`). The wire stream still ends normally so iterating
  `run.events` is unaffected — the typed error lives on the aggregate
  promises that callers `await` for the final result.

- KvCacheSession (`server/bare/plugins/llamacpp-completion/ops/
  kv-cache-session.ts`) — single atomic owner of the three KV-cache
  layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin`
  files). `beginTurn` / `commitTurn` / `rollback` collapse the three
  duplicated cleanup blocks in `completion-stream.ts` into one
  scope.defer hook. Cross-model administrative deletion lives at the
  module level as `deleteKvCacheState(...)`, called by the RPC
  `handleDeleteCache` handler.

- Stop-button race close — `RequestRegistry` now keeps a bounded
  cancelled-before-begin map (128 entries, 30s TTL). A `cancel({
  requestId })` that lands before the server's `begin(...)` ran is
  applied retroactively when begin lands, so same-tick stop clicks no
  longer disappear into the void. Internal-only — the wire surface for
  `cancel` is unchanged (Option A in the brief).

Cursor rules updated in the same PR so the request-lifecycle and
KV-cache topic docs stay in sync with the implementation.

Tests:
- unit: KvCacheSession (bareTest-gated, runs in the Bare consumer),
  RequestRegistry race + bounded-set eviction, completion-event schema
  cancelled cases.
- e2e: cancellation-tests.ts adds three definitions — mid-stream cancel
  (events.stopReason === "cancelled", final rejects with
  InferenceCancelledError, partial.text matches concatenated
  contentDelta), cancel-before-begin (retroactive abort), and
  cancel-then-resume-kv-cache (rollback wiped the three layers, the
  next turn re-primes cleanly).

* chore: drop planning labels (Mx/Dx) from QVAC-18182 comments

Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`)
labels from comments and test titles introduced with the typed-cancel
outcomes + KvCacheSession work. The substantive descriptions of the
contracts (Stop-button race, cancelled-before-begin map, three-layer
session ownership, etc.) are preserved; only the planning-doc
references are removed so the code reads cleanly without the pitch
context. Durable `QVAC-XXXXX` ticket references are kept.

No behavior or API surface changes.

* chore: drop Asana ticket references from QVAC-18182 code comments

Strips QVAC-XXXXX inline ticket references from code/test comments
introduced by the typed-cancel-outcomes work. Concept names
(Stop-button race, cancelled-before-begin, etc.) and prose
descriptions of the contracts are preserved; only the ticket-tag
suffixes go. Also renames a test cache key from
`qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so
the cache key reads as a stable identifier rather than a ticket
reference.

No behavior or API surface changes.

* QVAC-18182 doc: clarify error>cancelled precedence + deleteKvCacheState concurrency

Address non-blocking review nits on PR #2007:

- aggregate-events: explain why a wire event carrying both error and
  cancelled signals resolves to error (closes brief open question #3).
- kv-cache-session: doc-comment on deleteKvCacheState explaining the
  ordering guarantee under concurrent in-flight turns -- delete is
  wire-async, in-flight turns roll back idempotently when their commit
  probe finds the file gone (closes brief open question #4).

Comments only; no behavior changes.

* QVAC-18182 doc: demonstrate typed cancel outcomes in cancel example

Enhance the existing cancel-by-request-id example to demonstrate the
two M2 cancel-outcome channels:

- run.events ends normally with completionDone carrying
  stopReason: "cancelled" -- show reading it inside the iteration loop.
- run.text rejects with InferenceCancelledError(requestId, partial) on
  cancel -- show the instanceof check and consuming partial.text,
  partial.toolCalls, partial.stats.

Also update the header to remove the now-stale "logged as a no-match"
sentence (same-tick cancels are no longer dropped after M2's race
close).

Pure documentation enhancement; no API or behavior changes.

* QVAC-18182 fix: address PR review — partial-prime cleanup + parent-aborted state

Two follow-ups from Opanin's review on PR #2007:

1. KvCacheSession.beginTurn: if `primeIfMissing` throws after the
   addon has partially written a `.bin` to disk, the next
   `beginCustom` would `fsPromises.access(cachePath)` → true and
   trust the half-primed file as a valid cache (no rollback hook is
   registered yet — the handler hasn't seen the `TurnHandle`). Wrap
   both `beginCustom` and `beginAuto` prime calls in a shared
   `primeOrCleanup` helper that best-effort unlinks the partial file
   before re-throwing the original prime error. Adds a bare-only unit
   test asserting the on-disk file is removed and the init flag stays
   unset on the failed-prime path.

2. RequestRegistry.begin: when `parentSignal` was already aborted at
   begin time, line 271 aborts the controller but the `state` ternary
   still landed `"running"`, exactly the "momentarily-running with
   already-aborted signal" the preCancel branch was guarding against.
   Extend the ternary to cover both inputs and the existing
   `parentSignal already aborted` test now also asserts
   `ctx.state === "cancelling"`.

No behavior change on the happy path. Lint + typecheck + 351-test
unit suite green locally on the changed files.

* QVAC-18182 fix: prime is atomic — addon writes to .prime.tmp + atomic rename

Upgrade the previous reactive cleanup workaround (PR #2007 review by
@opaninakuffo) into a proactive atomic-by-construction design:

  - The session steers `model.run({ saveSessionPath })` to a sibling
    `cachePath + ".prime.tmp"` path.
  - Only after the prime closure resolves successfully do we promote
    the temp file to the canonical `cachePath` via `fsPromises.rename`
    (atomic same-volume on every host we target).
  - The canonical cache path is therefore *never* observable in a
    partial state — a thrown prime is indistinguishable on disk from
    a never-attempted prime, so the next existence probe (in-process
    or cross-process worker restart) cannot trust corrupt bytes.

Defensive details:
  - We unlink any leftover `.prime.tmp` *before* invoking the closure,
    so a deferred-write addon path can't accidentally promote
    stale-from-crash bytes left by a prior worker.
  - On prime success we probe the temp path before renaming. If the
    addon deferred its disk write (some llama.cpp paths flush lazily),
    the temp doesn't exist and we leave the canonical path absent —
    `verifySaveAndRecord` in `commitTurn` is the authoritative check.
  - On rename failure we unlink the temp and surface the rename error;
    rename atomicity guarantees the canonical path was untouched.

Why this is better than the prior `primeOrCleanup`:
  - Best-effort `unlink` was load-bearing for correctness in the old
    design — a failed unlink left a half-primed canonical file the
    next `beginCustom` would trust. The new design moves the only
    possible "partial" file to a non-trusted name, so failed cleanup
    cannot corrupt the canonical name by construction.
  - The unit test no longer mocks the workaround surface; it asserts
    the actual invariant ("canonical path was never written") plus
    the positive rename and the leftover-sweep guarantees.

Tests: 3 bare-only kv-cache-session unit tests (throw-leaves-canonical-
untouched, success-promotes-via-rename, leftover-from-crash-is-swept).
Lint + typecheck + 351-test unit suite green locally on the changed files.

Long-term, the right fix is one layer down — the llama.cpp addon should
write transactionally itself and surface save errors instead of
swallowing them. When that lands, this helper collapses to a direct
`prime(cachePath)` call and the `verifySaveAndRecord` access-probe
fallback (TODO already documented) can be retired together. Filed as
a separate follow-up; out of scope for this PR.

* QVAC-18182 fix: replace prime-atomic helper with verifyPrimedFile post-prime probe

Audit of the llama.cpp addon (`CacheManager::writeCacheFile` →
`llama_state_save_file`, return value swallowed; `LlamaModel::
processPromptImpl` lines 575-599) shows the bug shape Opanin flagged
on PR #2007 — "primeIfMissing throws after a partial save" — does not
actually fire. The save call is the very last operation on the
prefill path, the addon ignores its return value, and any earlier
throw means no save was attempted. So:

  - `primeOrCleanup` (`ac8d2d74e`) and the upgrade to
    `primeAtomically` (`a7420f3e6`) defended against a code path that
    the addon does not produce.
  - The real corruption shape is silent partial writes (addon's
    `llama_state_save_file` returns false, addon ignores it, file is
    half-written or empty). Atomic temp+rename did NOT close this
    gap — on a "silent partial" the closure resolves successfully and
    the helper would happily promote the partial `.prime.tmp` to the
    canonical path.

Replace both helpers with a small `verifyPrimedFile` that mirrors the
existing `verifySaveAndRecord` access-probe pattern used at commit
time, applied at prime time:

  - After a successful prime closure, `fsPromises.stat` the canonical
    path. If it doesn't exist (addon was interrupted before save) or
    has size 0 (addon save call produced an empty file), throw and
    best-effort unlink the empty leftover so the next existence probe
    doesn't trust it.
  - This catches the two failure modes Opanin's concern was a proxy
    for (cancelled-mid-prime; addon save quietly produced nothing)
    without claiming defense against partial-but-nonzero writes,
    which can only be closed at the addon layer.

The `RequestRegistry` parent-aborted-state fix (`ctx.state` ternary
covers `opts.parentSignal?.aborted`) from `ac8d2d74e` is preserved
unchanged — it stands on its own as a correct response to Opanin's
second comment.

Long-term root cause stays the addon: have
`CacheManager::writeCacheFile` check `llama_state_save_file`'s return
value and throw on failure. When that lands, both `verifyPrimedFile`
and `verifySaveAndRecord`'s access-probes can be retired together.
Filed as a separate follow-up — out of scope for this PR.

Tests: 3 prior bare-only prime-atomic tests removed; 2 new bare-only
tests added (no-file and empty-file rejection paths). Lint +
typecheck + 330-test unit suite green locally on the changed files
(pre-existing sdcpp-generation lint errors unchanged).

* QVAC-18182 doc: kv-cache rule documents addon non-transactional save + matched access-probes

Extend the "Cache Initialization (primeIfMissing)" section in
.cursor/rules/sdk/docs/kv-cache-system.mdc with the corrected
addon-contract analysis:

  - The llama.cpp addon's CacheManager::writeCacheFile discards
    llama_state_save_file's bool return; maybeSaveCacheToDisk is the
    last call on the prefill path. So no closure-rejection path can
    coexist with a partial file on disk.
  - Document the four real outcomes as a table (interrupted /
    success / silent partial write / pre-eval throw) so future
    readers can see why the SDK takes the shape it does.
  - Pin both SDK-side defenses as a matched pair: verifyPrimedFile
    at prime time (added in this PR) and verifySaveAndRecord at
    commit time (existing). Both are honest about what they catch
    (missing / empty file) and what they don't (partial-but-nonzero,
    only addon fix can close that).
  - Reference the addon-layer follow-up
    (1214778658064488 / "throw on llama_state_save_file failure")
    so the next contributor knows both probes will be retired
    together when the addon throws on save failure.

No code change — rule-only update.
lauripiisang added a commit that referenced this pull request Jun 9, 2026
…tion

Wire up the "length" stop reason that has existed in the schema since
PR #2007 but was never emitted.

- plugin.ts: detect budget exhaustion (generatedTokens >= effectivePredict,
  skipping -1/-2 special values) and emit stopReason="length" in the
  terminal normalizer.finish() call
- aggregate-events.ts: capture stopReason from completionDone events;
  propagate to CompletionFinal
- completion-event.ts: add stopReason?: StopReason to CompletionFinal
- unit tests: 6 new cases covering aggregation and final propagation
- e2e: completion-stop-reason-length test + special executor handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lauripiisang added a commit that referenced this pull request Jun 9, 2026
…tion (#2484)

* QVAC-19778 feat[api]: emit stopReason="length" on token budget exhaustion

Wire up the "length" stop reason that has existed in the schema since
PR #2007 but was never emitted.

- plugin.ts: detect budget exhaustion (generatedTokens >= effectivePredict,
  skipping -1/-2 special values) and emit stopReason="length" in the
  terminal normalizer.finish() call
- aggregate-events.ts: capture stopReason from completionDone events;
  propagate to CompletionFinal
- completion-event.ts: add stopReason?: StopReason to CompletionFinal
- unit tests: 6 new cases covering aggregation and final propagation
- e2e: completion-stop-reason-length test + special executor handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* QVAC-19778 doc[api]: add stopReason examples

completion-events.ts: print final.stopReason when set (was always
read from the event but never from the aggregated final).

completion-stop-reason.ts: new example showing both stopReason
cases — natural EOS (undefined) and predict budget exhaustion
("length") — on the same model back to back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe-to-test Reviewer has verified safety of packages & github actions in this PR test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] tier1 verify

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants