QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession#2007
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
0380ae5 to
195b2cf
Compare
E2E test analysis (run
|
| Platform | Total | Passed | Failed | Skipped | Verify suite (M2 cancellation tests) | kv-cache category |
|---|---|---|---|---|---|---|
| Desktop Windows | 408 | 405 | 1 | 2 | 3/3 (100%) | 17/17 (100%) |
| Desktop Ubuntu | 408 | 404 | 2 | 2 | 3/3 (100%) | 17/17 (100%) |
| Desktop macOS (M4) | 408 | 405 | 1 | 2 | 3/3 (100%) | 17/17 (100%) |
| iOS (iPhone 16 Pro) | 408 | 280 | 1 | 127 | 3/3 (100%) | 17/17 (100%) |
| Android (Samsung S25 Ultra) | 408 | 77 | 329 | 2 | 0/3 (never ran) | 0/17 (never ran) |
What actually failed
1. completion-response-format-json-schema — fails on iOS, win, ubuntu, mac.
Model returned prose instead of valid JSON:
Error: json_schema output is not valid JSON: JSON Parse error: Unexpected character: H. Output: Here is the person info as JSON:
Pre-existing model-quality flake on the 1B Llama (the test forces response_format: json_schema and the small model occasionally prefixes prose). Touches none of the M2 surface (cancellation, KvCacheSession, request registry).
2. vision-text-extraction — ubuntu desktop only.
Vision OCR-style test, unrelated to M2. Pre-existing.
3. 329 android failures — root cause is a device hang, not test failures.
Producer log on android shows the consumer device went unresponsive at 21:07 while running transcribe-stream-events-happy:
21:06:39 ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 63s / 180s)
21:07:09 ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 93s / 180s)
21:07:39 ⏳ transcribe-stream-events-happy → consumer-mobile-SM-... (running, 123s / 180s)
21:07:39 💀 Consumer mobile-SM unresponsive for 123s — marking as dead
21:07:39 ❌ All consumers are dead. Terminating batch.
After the device died, the producer counted all 327 unrun queued tests as failures (which includes our 3 cancellation tests, the entire 17-test kv-cache suite, embedding, rag, translation, tools, ocr, tts, etc — all 0/N in the per-category report). This is a known device-farm/transcription flakiness pattern (cf. QVAC-18460 for the iOS transcription crash already tracked under skips).
M2 surface verification
Where M2-touched paths actually executed (4 of 5 platforms):
- Verify suite (the new cancellation e2e:
cancel-mid-stream-completion,cancel-before-begin-completion,cancel-then-resume-kv-cache): 3/3 pass on every platform (iOS 317ms / 109ms / 637ms; desktop ~210ms / ~100ms / similar). - kv-cache category: 17/17 pass on every platform — the existing kv-cache suite (including
kv-cache-cancel-then-new-promptwhich exercises broad-cancel backward-compat) is functionally equivalent under theKvCacheSessionrefactor. - completion category: 45/48 pass on every platform (45 + 2 skipped + 1 failed = 48; the one failure is the json-schema flake above, not ours).
Verdict: the e2e failures on this run are entirely pre-existing (model-quality flake, vision flake) or device-hang artefacts. The M2 surface (typed cancel outcomes, InferenceCancelledError, KvCacheSession, stop-button race close) shows 100% pass on every platform that ran it.
Tier-based Approval Status |
…ache via KvCacheSession
Builds on QVAC-18181's request lifecycle primitives (DisposableScope,
RequestContext, RequestRegistry) to deliver the M2 milestone:
- Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone`
events, and `InferenceCancelledError(requestId, partial)` thrown from
CompletionRun promise-aggregates (`final` / `text` / `toolCalls` /
`stats`). The wire stream still ends normally so iterating
`run.events` is unaffected — the typed error lives on the aggregate
promises that callers `await` for the final result.
- KvCacheSession (`server/bare/plugins/llamacpp-completion/ops/
kv-cache-session.ts`) — single atomic owner of the three KV-cache
layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin`
files). `beginTurn` / `commitTurn` / `rollback` collapse the three
duplicated cleanup blocks in `completion-stream.ts` into one
scope.defer hook. Cross-model administrative deletion lives at the
module level as `deleteKvCacheState(...)`, called by the RPC
`handleDeleteCache` handler.
- Stop-button race close — `RequestRegistry` now keeps a bounded
cancelled-before-begin map (128 entries, 30s TTL). A `cancel({
requestId })` that lands before the server's `begin(...)` ran is
applied retroactively when begin lands, so same-tick stop clicks no
longer disappear into the void. Internal-only — the wire surface for
`cancel` is unchanged (Option A in the brief).
Cursor rules updated in the same PR so the request-lifecycle and
KV-cache topic docs stay in sync with the implementation.
Tests:
- unit: KvCacheSession (bareTest-gated, runs in the Bare consumer),
RequestRegistry race + bounded-set eviction, completion-event schema
cancelled cases.
- e2e: cancellation-tests.ts adds three definitions — mid-stream cancel
(events.stopReason === "cancelled", final rejects with
InferenceCancelledError, partial.text matches concatenated
contentDelta), cancel-before-begin (retroactive abort), and
cancel-then-resume-kv-cache (rollback wiped the three layers, the
next turn re-primes cleanly).
Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`) labels from comments and test titles introduced with the typed-cancel outcomes + KvCacheSession work. The substantive descriptions of the contracts (Stop-button race, cancelled-before-begin map, three-layer session ownership, etc.) are preserved; only the planning-doc references are removed so the code reads cleanly without the pitch context. Durable `QVAC-XXXXX` ticket references are kept. No behavior or API surface changes.
Strips QVAC-XXXXX inline ticket references from code/test comments introduced by the typed-cancel-outcomes work. Concept names (Stop-button race, cancelled-before-begin, etc.) and prose descriptions of the contracts are preserved; only the ticket-tag suffixes go. Also renames a test cache key from `qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so the cache key reads as a stable identifier rather than a ticket reference. No behavior or API surface changes.
…te concurrency Address non-blocking review nits on PR tetherto#2007: - aggregate-events: explain why a wire event carrying both error and cancelled signals resolves to error (closes brief open question tetherto#3). - kv-cache-session: doc-comment on deleteKvCacheState explaining the ordering guarantee under concurrent in-flight turns -- delete is wire-async, in-flight turns roll back idempotently when their commit probe finds the file gone (closes brief open question tetherto#4). Comments only; no behavior changes.
QVAC E2E —
|
QVAC E2E —
|
|
/review |
…ache via KvCacheSession (#2007) * QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession Builds on QVAC-18181's request lifecycle primitives (DisposableScope, RequestContext, RequestRegistry) to deliver the M2 milestone: - Typed cancel outcomes: `stopReason: "cancelled"` on `completionDone` events, and `InferenceCancelledError(requestId, partial)` thrown from CompletionRun promise-aggregates (`final` / `text` / `toolCalls` / `stats`). The wire stream still ends normally so iterating `run.events` is unaffected — the typed error lives on the aggregate promises that callers `await` for the final result. - KvCacheSession (`server/bare/plugins/llamacpp-completion/ops/ kv-cache-session.ts`) — single atomic owner of the three KV-cache layers (`cachedMessageCounts`, `initializedCaches`, on-disk `.bin` files). `beginTurn` / `commitTurn` / `rollback` collapse the three duplicated cleanup blocks in `completion-stream.ts` into one scope.defer hook. Cross-model administrative deletion lives at the module level as `deleteKvCacheState(...)`, called by the RPC `handleDeleteCache` handler. - Stop-button race close — `RequestRegistry` now keeps a bounded cancelled-before-begin map (128 entries, 30s TTL). A `cancel({ requestId })` that lands before the server's `begin(...)` ran is applied retroactively when begin lands, so same-tick stop clicks no longer disappear into the void. Internal-only — the wire surface for `cancel` is unchanged (Option A in the brief). Cursor rules updated in the same PR so the request-lifecycle and KV-cache topic docs stay in sync with the implementation. Tests: - unit: KvCacheSession (bareTest-gated, runs in the Bare consumer), RequestRegistry race + bounded-set eviction, completion-event schema cancelled cases. - e2e: cancellation-tests.ts adds three definitions — mid-stream cancel (events.stopReason === "cancelled", final rejects with InferenceCancelledError, partial.text matches concatenated contentDelta), cancel-before-begin (retroactive abort), and cancel-then-resume-kv-cache (rollback wiped the three layers, the next turn re-primes cleanly). * chore: drop planning labels (Mx/Dx) from QVAC-18182 comments Strips milestone (`M1`/`M2`/`M3a`...) and deliverable (`D2`/`D5`/`D7`) labels from comments and test titles introduced with the typed-cancel outcomes + KvCacheSession work. The substantive descriptions of the contracts (Stop-button race, cancelled-before-begin map, three-layer session ownership, etc.) are preserved; only the planning-doc references are removed so the code reads cleanly without the pitch context. Durable `QVAC-XXXXX` ticket references are kept. No behavior or API surface changes. * chore: drop Asana ticket references from QVAC-18182 code comments Strips QVAC-XXXXX inline ticket references from code/test comments introduced by the typed-cancel-outcomes work. Concept names (Stop-button race, cancelled-before-begin, etc.) and prose descriptions of the contracts are preserved; only the ticket-tag suffixes go. Also renames a test cache key from `qvac-18182-cancel-resume-kvcache` to `cancel-then-resume-kvcache` so the cache key reads as a stable identifier rather than a ticket reference. No behavior or API surface changes. * QVAC-18182 doc: clarify error>cancelled precedence + deleteKvCacheState concurrency Address non-blocking review nits on PR #2007: - aggregate-events: explain why a wire event carrying both error and cancelled signals resolves to error (closes brief open question #3). - kv-cache-session: doc-comment on deleteKvCacheState explaining the ordering guarantee under concurrent in-flight turns -- delete is wire-async, in-flight turns roll back idempotently when their commit probe finds the file gone (closes brief open question #4). Comments only; no behavior changes. * QVAC-18182 doc: demonstrate typed cancel outcomes in cancel example Enhance the existing cancel-by-request-id example to demonstrate the two M2 cancel-outcome channels: - run.events ends normally with completionDone carrying stopReason: "cancelled" -- show reading it inside the iteration loop. - run.text rejects with InferenceCancelledError(requestId, partial) on cancel -- show the instanceof check and consuming partial.text, partial.toolCalls, partial.stats. Also update the header to remove the now-stale "logged as a no-match" sentence (same-tick cancels are no longer dropped after M2's race close). Pure documentation enhancement; no API or behavior changes. * QVAC-18182 fix: address PR review — partial-prime cleanup + parent-aborted state Two follow-ups from Opanin's review on PR #2007: 1. KvCacheSession.beginTurn: if `primeIfMissing` throws after the addon has partially written a `.bin` to disk, the next `beginCustom` would `fsPromises.access(cachePath)` → true and trust the half-primed file as a valid cache (no rollback hook is registered yet — the handler hasn't seen the `TurnHandle`). Wrap both `beginCustom` and `beginAuto` prime calls in a shared `primeOrCleanup` helper that best-effort unlinks the partial file before re-throwing the original prime error. Adds a bare-only unit test asserting the on-disk file is removed and the init flag stays unset on the failed-prime path. 2. RequestRegistry.begin: when `parentSignal` was already aborted at begin time, line 271 aborts the controller but the `state` ternary still landed `"running"`, exactly the "momentarily-running with already-aborted signal" the preCancel branch was guarding against. Extend the ternary to cover both inputs and the existing `parentSignal already aborted` test now also asserts `ctx.state === "cancelling"`. No behavior change on the happy path. Lint + typecheck + 351-test unit suite green locally on the changed files. * QVAC-18182 fix: prime is atomic — addon writes to .prime.tmp + atomic rename Upgrade the previous reactive cleanup workaround (PR #2007 review by @opaninakuffo) into a proactive atomic-by-construction design: - The session steers `model.run({ saveSessionPath })` to a sibling `cachePath + ".prime.tmp"` path. - Only after the prime closure resolves successfully do we promote the temp file to the canonical `cachePath` via `fsPromises.rename` (atomic same-volume on every host we target). - The canonical cache path is therefore *never* observable in a partial state — a thrown prime is indistinguishable on disk from a never-attempted prime, so the next existence probe (in-process or cross-process worker restart) cannot trust corrupt bytes. Defensive details: - We unlink any leftover `.prime.tmp` *before* invoking the closure, so a deferred-write addon path can't accidentally promote stale-from-crash bytes left by a prior worker. - On prime success we probe the temp path before renaming. If the addon deferred its disk write (some llama.cpp paths flush lazily), the temp doesn't exist and we leave the canonical path absent — `verifySaveAndRecord` in `commitTurn` is the authoritative check. - On rename failure we unlink the temp and surface the rename error; rename atomicity guarantees the canonical path was untouched. Why this is better than the prior `primeOrCleanup`: - Best-effort `unlink` was load-bearing for correctness in the old design — a failed unlink left a half-primed canonical file the next `beginCustom` would trust. The new design moves the only possible "partial" file to a non-trusted name, so failed cleanup cannot corrupt the canonical name by construction. - The unit test no longer mocks the workaround surface; it asserts the actual invariant ("canonical path was never written") plus the positive rename and the leftover-sweep guarantees. Tests: 3 bare-only kv-cache-session unit tests (throw-leaves-canonical- untouched, success-promotes-via-rename, leftover-from-crash-is-swept). Lint + typecheck + 351-test unit suite green locally on the changed files. Long-term, the right fix is one layer down — the llama.cpp addon should write transactionally itself and surface save errors instead of swallowing them. When that lands, this helper collapses to a direct `prime(cachePath)` call and the `verifySaveAndRecord` access-probe fallback (TODO already documented) can be retired together. Filed as a separate follow-up; out of scope for this PR. * QVAC-18182 fix: replace prime-atomic helper with verifyPrimedFile post-prime probe Audit of the llama.cpp addon (`CacheManager::writeCacheFile` → `llama_state_save_file`, return value swallowed; `LlamaModel:: processPromptImpl` lines 575-599) shows the bug shape Opanin flagged on PR #2007 — "primeIfMissing throws after a partial save" — does not actually fire. The save call is the very last operation on the prefill path, the addon ignores its return value, and any earlier throw means no save was attempted. So: - `primeOrCleanup` (`ac8d2d74e`) and the upgrade to `primeAtomically` (`a7420f3e6`) defended against a code path that the addon does not produce. - The real corruption shape is silent partial writes (addon's `llama_state_save_file` returns false, addon ignores it, file is half-written or empty). Atomic temp+rename did NOT close this gap — on a "silent partial" the closure resolves successfully and the helper would happily promote the partial `.prime.tmp` to the canonical path. Replace both helpers with a small `verifyPrimedFile` that mirrors the existing `verifySaveAndRecord` access-probe pattern used at commit time, applied at prime time: - After a successful prime closure, `fsPromises.stat` the canonical path. If it doesn't exist (addon was interrupted before save) or has size 0 (addon save call produced an empty file), throw and best-effort unlink the empty leftover so the next existence probe doesn't trust it. - This catches the two failure modes Opanin's concern was a proxy for (cancelled-mid-prime; addon save quietly produced nothing) without claiming defense against partial-but-nonzero writes, which can only be closed at the addon layer. The `RequestRegistry` parent-aborted-state fix (`ctx.state` ternary covers `opts.parentSignal?.aborted`) from `ac8d2d74e` is preserved unchanged — it stands on its own as a correct response to Opanin's second comment. Long-term root cause stays the addon: have `CacheManager::writeCacheFile` check `llama_state_save_file`'s return value and throw on failure. When that lands, both `verifyPrimedFile` and `verifySaveAndRecord`'s access-probes can be retired together. Filed as a separate follow-up — out of scope for this PR. Tests: 3 prior bare-only prime-atomic tests removed; 2 new bare-only tests added (no-file and empty-file rejection paths). Lint + typecheck + 330-test unit suite green locally on the changed files (pre-existing sdcpp-generation lint errors unchanged). * QVAC-18182 doc: kv-cache rule documents addon non-transactional save + matched access-probes Extend the "Cache Initialization (primeIfMissing)" section in .cursor/rules/sdk/docs/kv-cache-system.mdc with the corrected addon-contract analysis: - The llama.cpp addon's CacheManager::writeCacheFile discards llama_state_save_file's bool return; maybeSaveCacheToDisk is the last call on the prefill path. So no closure-rejection path can coexist with a partial file on disk. - Document the four real outcomes as a table (interrupted / success / silent partial write / pre-eval throw) so future readers can see why the SDK takes the shape it does. - Pin both SDK-side defenses as a matched pair: verifyPrimedFile at prime time (added in this PR) and verifySaveAndRecord at commit time (existing). Both are honest about what they catch (missing / empty file) and what they don't (partial-but-nonzero, only addon fix can close that). - Reference the addon-layer follow-up (1214778658064488 / "throw on llama_state_save_file failure") so the next contributor knows both probes will be retired together when the addon throws on save failure. No code change — rule-only update.
…tion Wire up the "length" stop reason that has existed in the schema since PR #2007 but was never emitted. - plugin.ts: detect budget exhaustion (generatedTokens >= effectivePredict, skipping -1/-2 special values) and emit stopReason="length" in the terminal normalizer.finish() call - aggregate-events.ts: capture stopReason from completionDone events; propagate to CompletionFinal - completion-event.ts: add stopReason?: StopReason to CompletionFinal - unit tests: 6 new cases covering aggregation and final propagation - e2e: completion-stop-reason-length test + special executor handler Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion (#2484) * QVAC-19778 feat[api]: emit stopReason="length" on token budget exhaustion Wire up the "length" stop reason that has existed in the schema since PR #2007 but was never emitted. - plugin.ts: detect budget exhaustion (generatedTokens >= effectivePredict, skipping -1/-2 special values) and emit stopReason="length" in the terminal normalizer.finish() call - aggregate-events.ts: capture stopReason from completionDone events; propagate to CompletionFinal - completion-event.ts: add stopReason?: StopReason to CompletionFinal - unit tests: 6 new cases covering aggregation and final propagation - e2e: completion-stop-reason-length test + special executor handler Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * QVAC-19778 doc[api]: add stopReason examples completion-events.ts: print final.stopReason when set (was always read from the event but never from the aggregated final). completion-stop-reason.ts: new example showing both stopReason cases — natural EOS (undefined) and predict budget exhaustion ("length") — on the same model back to back. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
🎯 What problem does this PR solve?
completionDoneand the client'sfinal/text/toolCalls/statspromises resolved with empty/partial values — indistinguishable from a real EOS. Callers couldn't tell "you cancelled this" from "the model returned nothing".cachedMessageCounts(in-memory map),initializedCaches(in-memory set), and the on-disk.binfile each had their own cleanup site.completion-stream.tsrepeated the samefsPromises.unlink(...)+clearCacheRegistry(...)+cachedMessageCounts.delete(...)triplet in three places (cancelled-custom-key turn, cancelled-auto-cache turn, auto-cache rename failure). A miss in any one of them left observable cross-turn corruption.RequestRegistryonly matchedcancel({ requestId })against in-flight requests, so a cancel that landed before the worker calledregistry.begin(...)silently disappeared. The eventualbeginran without an aborted signal and the cancelled request streamed to completion.📝 How does it solve it?
stopReason: "cancelled"joins"eos" | "length" | "stopSequence"on thecompletionDonewire event so the events stream still terminates cleanly. The client-side aggregator inclient/api/completion-stream.tsdetects the cancelled terminator and rejects the promise-aggregates (final,text,toolCalls,stats) with a newInferenceCancelledError(requestId, partial).partial.text/partial.toolCalls/partial.statscarry whatever the aggregator accumulated up to the cancel point — callersinstanceof-check and reach for whichever partial field they need. The error class is defined server-side (utils/errors-server.ts, code52419), constructed client-side (so the partial payload survives the RPC envelope), and re-exported from@qvac/sdkforinstanceofchecks.KvCacheSessionas the atomic three-layer owner (D4). Newserver/bare/plugins/llamacpp-completion/ops/kv-cache-session.tsownscachedMessageCounts,initializedCaches, and.binfile lifecycle behind abeginTurn/commitTurn/rollbackAPI.completion-stream.tsregistersscope.defer(() => session.rollback(turn))at the top of a turn and callssession.commitTurn(...)only on the success path — so cancel / addon failure / auto-cache rename failure / scope unwind all funnel through the same rollback. The three duplicated cleanup blocks in the pre-PRcompletion-stream.tsare gone; the file dropped from ~756 lines to ~661 with the cancellation logic centralised.deleteKvCacheState({ kvCacheKey?, modelId?, all? })inkv-cache-session.tshandles wipe-by-key (optionally scoped to a model) and wipe-all, touching all three layers atomically. The RPC handlerserver/rpc/handlers/delete-cache.tsnow delegates straight to it — no more directfsPromises/ module-set juggling in the handler.cancelledBeforeBeginmap closes the race (D6, Option A).RequestRegistrykeeps a 128-entry / 30 s TTL map ofrequestId → { at, reason }.cancel(...)records into the map when no in-flight context is found;begin(...)consults it and aborts the freshly-minted controller before returning theManagedRequestContext. The wire surface forcancelis unchanged — internal-only fix, no[bc]. Eviction is size-cap + TTL combined.server/bare/runtime/request-registry.tsbegin()now derives terminalstate("cancelled" | "completed") fromctx.signal.abortedso a same-tick cancel-then-begin pair lands asstate: "cancelled", observable via__requestRegistryTestHooksin unit tests..cursor/rules/sdk/request-lifecycle-primitives.mdc,.cursor/rules/sdk/docs/request-lifecycle-system.mdc,.cursor/rules/sdk/docs/kv-cache-system.mdc,.cursor/rules/sdk/error-handling.mdc— error codes table, cancelled completion semantics, KvCacheSession ownership, FAQ entry for the race close, anti-patterns.🧪 How was it tested?
Functional-equivalence vs. pre-PR
completion-stream.ts(the D5/D7 mitigation listed in the brief):completion-stream.tslines ~595–607 (unlink +clearCacheRegistry+cachedMessageCounts.delete)KvCacheSession.rollback(turn)viascope.defer. Verified bycancel-then-resume-kv-cachee2e +kv-cache-session.test.ts"rollback wipes all three layers atomically".shouldRecordSavedCountcommitTurnis never called, so the deferred rollback runs unconditionally.kvCache: true), cancelled mid-streamcachedMessageCounts.delete)KvCacheSession.rollback(turn)viascope.defer.commitTurncachedMessageCounts.delete)commitTurnreturns false / throws → deferred rollback wipes everything (kv-cache-session.test.ts"commitTurn rolls back if the addon did not persist the file").cachedMessageCounts.delete(cachePathToUse)+recordCacheSaveCount(...)commitTurn(turn, newPath, savedCount)records the new path/count and drops the rollback hook in one transaction.Unit tests (
packages/sdk/test/unit/):runtime/request-registry.test.ts— same-tick cancel-before-begin now retroactively aborts (was:0matched, now: signal aborted on the new context); UUID-retry path tested; bounded-set eviction tested (200 cancel-before-begin calls don't grow the internal map past 128 entries; eviction preserves the freshest entries).runtime/kv-cache-session.test.ts—beginTurnpriming + reuse,commitTurnrecords new count + suppresses rollback,rollbackwipes all three layers, rollback tolerates missing on-disk file, double-rollback is idempotent,dropStaleSavedCountforgets count without touching file/init flag,deleteKvCacheState({ kvCacheKey })+deleteKvCacheState({ all })wipe correctly,commitTurntriggers rollback when the addon doesn't persist. Tests arebareTest-gated (dynamic import, runs in the Bare consumer; skipped under Bun becausekv-cache-session.tstransitively importsbare-fs/bare-path).completion-event-schemas.test.ts—successDoneSchema.parse({ type: "completionDone", seq, stopReason: "cancelled" })accepted; cancelled cases added to the event-type routing test.kv-cache-cancel.test.ts— refactored away from the deprecated module-scopedcachedMessageCountsmap;decideCachedHistorySliceis now exercised purely with the explicitsavedCountparameter (matching the M2 API).E2E tests (
packages/sdk/tests-qvac/tests/cancellation-tests.ts+shared/executors/cancellation-executor.ts):cancel-mid-stream-completion— firecancel({ requestId })after the thirdcontentDelta, assert (a) events end withcompletionDone.stopReason === "cancelled", (b)run.finalrejects withInferenceCancelledError, (c)error.partial.textequals the concatenatedcontentDelta.textthe test observed onrun.events.cancel-before-begin-completion— firecancel({ requestId })synchronously aftercompletion(...)returns (before thecompletionStreamRPC settles), assert the eventual outcome is a typedcancelledregardless of whether the cancel landed before or afterbegin(...)ran on the worker.cancel-then-resume-kv-cache— same as the existingkv-cache-cancel-then-new-promptbut additionally asserts the first turn'sInferenceCancelledErrorrejection and thestopReason: "cancelled"event, then verifies the second turn on the samekvCachekey re-primes cleanly and produces the expected answer (provingKvCacheSession.rollbackwiped the three layers).Pre-merge gates (run from
packages/sdk/):🔌 API Changes
stopReason: "cancelled"is a new value on thesuccessDoneSchemaenum;InferenceCancelledErroris a new export. Both are additive — existing consumers ignoring the new enum value or not catching the new error class still work.The internal
RequestRegistrycancel-before-begin race close is wire-invisible —cancel({ requestId })andcancel({ operation: "inference", modelId })behave identically on the wire as before. Same-tick stop clicks just stop being silently dropped.