QVAC-19900 feat[api]: add managed mode to @qvac/ai-sdk-provider #2408
Merged
simon-iribarren merged 22 commits on Jun 10, 2026
Add `mode: 'managed'` so the provider can synthesize an ephemeral qvac.config.json from a model-constant list, spawn and supervise `qvac serve` on a free port, and tear it down on host exit. External mode is unchanged and stays synchronous; the managed supervisor is lazily dynamic-imported so external-mode users pay no startup cost. @qvac/cli becomes an optional peer dependency.
…json (QVAC-19900)
The published @qvac/cli ships a string `exports` field ("./dist/index.js"),
which makes the `./package.json` subpath non-resolvable
(ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving
`@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI
on a clean install. Fall back to resolving the package main entry, which for
@qvac/cli is the same file as the `qvac` bin.
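The fallback described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: `locateCli`, `Resolve`, and `CliLocation` are hypothetical names, and the resolver is injected so the behaviour can be exercised without `@qvac/cli` installed (in Node its role would be played by `require.resolve`).

```typescript
// Hypothetical resolver shape: maps a module specifier to an absolute path, or throws.
type Resolve = (specifier: string) => string;

interface CliLocation {
  path: string;
  via: "package.json" | "main-entry";
}

function locateCli(resolve: Resolve, pkg: string = "@qvac/cli"): CliLocation {
  try {
    // Preferred route: only works when the package exports "./package.json".
    return { path: resolve(`${pkg}/package.json`), via: "package.json" };
  } catch (err) {
    const code = (err as { code?: unknown } | null)?.code;
    // Anything other than the exports restriction is a real failure, so surface it.
    if (code !== "ERR_PACKAGE_PATH_NOT_EXPORTED") throw err;
    // Fallback: the main entry, which the commit says is the same file as the bin.
    return { path: resolve(pkg), via: "main-entry" };
  }
}

// Fake resolver mimicking a string `exports` field: only the bare specifier resolves.
const stringExportsResolve: Resolve = (specifier) => {
  if (specifier !== "@qvac/cli") {
    throw Object.assign(new Error(`subpath not exported: ${specifier}`), {
      code: "ERR_PACKAGE_PATH_NOT_EXPORTED",
    });
  }
  return "/node_modules/@qvac/cli/dist/index.js"; // illustrative path
};

const located = locateCli(stringExportsResolve);
console.log(located.via, located.path);
```

Rethrowing every other error code keeps a genuinely missing package from being masked by the fallback.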
opaninakuffo
reviewed
Jun 4, 2026
…sdk-provider-managed-mode
Managed mode `models` now accepts spec objects ({ name, config, preload,
default }) alongside bare constant names, so callers can set per-model serve
options — notably `ctx_size` and `reasoning_budget` — that coding agents like
OpenCode require. The synthesized qvac.config.json carries the config block,
honors explicit `preload`/`default`, and validates names inside spec objects.
Exports the new `QvacManagedModel` type and documents per-model config plus a
managed-mode OpenCode example in the README.
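A rough sketch of how mixed `models` entries might be normalized into a serve config. The `QvacManagedModel` fields come from the commit message; the output shape (`models` keyed by alias with a `model` field), the `synthesizeConfig` name, and the `reasoning_budget` value are assumptions made for illustration, and the real qvac.config.json schema may differ.

```typescript
interface QvacManagedModel {
  name: string;
  config?: Record<string, unknown>;
  preload?: boolean;
  default?: boolean;
}

type ModelInput = string | QvacManagedModel;

// Assumed per-alias entry shape, not the documented qvac.config.json schema.
interface ServeEntry {
  model: string;
  config?: Record<string, unknown>;
  preload?: boolean;
  default?: boolean;
}

function synthesizeConfig(models: ModelInput[]): { models: Record<string, ServeEntry> } {
  const entries: Record<string, ServeEntry> = {};
  for (const input of models) {
    // A bare constant name is shorthand for a spec object with only `name`.
    const spec: QvacManagedModel = typeof input === "string" ? { name: input } : input;
    const entry: ServeEntry = { model: spec.name };
    // Only carry over what the caller set explicitly; no defaults are invented here.
    if (spec.config !== undefined) entry.config = { ...spec.config };
    if (spec.preload !== undefined) entry.preload = spec.preload;
    if (spec.default !== undefined) entry.default = spec.default;
    entries[spec.name] = entry;
  }
  return { models: entries };
}

const synthesized = synthesizeConfig([
  {
    name: "QWEN3_5_9B_MULTIMODAL_Q4_K_M",
    config: { ctx_size: 32768, reasoning_budget: 0 }, // illustrative values
    default: true,
  },
]);
console.log(JSON.stringify(synthesized, null, 2));
```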
Rework managed mode from a per-provider supervisor into a shared, self-cleaning serve daemon so it is robust standalone and usable by any tool, not just a single session.
- Reuse via a fleet key (model set + per-model config + host) keyed in a cross-process registry under ~/.qvac/managed-serves/; createQvac attaches to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer process has been alive for serveIdleTimeout (default 5m). Liveness, not request traffic, is the signal, so it works for tools that hit baseURL directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon as its owner exits.
Refactor into serve-process.ts (spawn/health/stop), registry.ts, fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap end-to-end coverage; document the shared lifecycle in the README.
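One way a fleet key with these properties could be derived is sketched below. The hashing scheme (SHA-256 over a canonical JSON form, truncated to 16 hex characters) and the names `canonical`, `fleetKey`, and `FleetModel` are assumptions for illustration; the actual fleet-key.ts may use a different encoding.

```typescript
import { createHash } from "node:crypto";

interface FleetModel {
  name: string;
  config?: Record<string, unknown>;
}

// Serialize with object keys sorted, so property order never changes the output.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonical(item)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${parts.join(",")}}`;
  }
  return value === undefined ? "null" : JSON.stringify(value);
}

function fleetKey(models: FleetModel[], host: string): string {
  // Sort models by name so the order in which callers list them is irrelevant.
  const sorted = models
    .map((m) => ({ name: m.name, config: m.config ?? {} }))
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash("sha256")
    .update(canonical({ host, models: sorted }))
    .digest("hex")
    .slice(0, 16);
}

const modelA: FleetModel = { name: "MODEL_A", config: { ctx_size: 32768 } };
const modelB: FleetModel = { name: "MODEL_B" };

const keyForward = fleetKey([modelA, modelB], "127.0.0.1");
const keyReversed = fleetKey([modelB, modelA], "127.0.0.1");
const keyOtherConfig = fleetKey([{ name: "MODEL_A", config: { ctx_size: 1024 } }, modelB], "127.0.0.1");
const keyOtherHost = fleetKey([modelA, modelB], "0.0.0.0");
console.log(keyForward === keyReversed, keyForward === keyOtherConfig, keyForward === keyOtherHost);
```

Two sessions that list the same models in a different order land on the same key and can share a serve, while any change in config or host yields a separate one.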
Each managed model maps to a single serve alias keyed by its name, so a repeated name silently overwrote the earlier entry — and could drop its `default: true`. Reject duplicates up front with DuplicateManagedModelError instead of resolving them ambiguously. Addresses PR review feedback.
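The up-front duplicate check might look roughly like this. `DuplicateManagedModelError` is named in the commit, but its constructor, message, and `modelName` field here are stand-ins, and `assertUniqueNames` is a hypothetical helper.

```typescript
// Stand-in for the provider's error class; the real constructor may differ.
class DuplicateManagedModelError extends Error {
  readonly modelName: string;
  constructor(modelName: string) {
    super(`Managed model "${modelName}" is listed more than once`);
    this.name = "DuplicateManagedModelError";
    this.modelName = modelName;
  }
}

function assertUniqueNames(models: Array<string | { name: string }>): void {
  const seen = new Set<string>();
  for (const entry of models) {
    // Bare names and spec objects share one alias namespace.
    const name = typeof entry === "string" ? entry : entry.name;
    if (seen.has(name)) throw new DuplicateManagedModelError(name);
    seen.add(name);
  }
}

// A bare name and a spec object with the same name collide.
let rejectedName: string | undefined;
try {
  assertUniqueNames(["QWEN3_5_9B_MULTIMODAL_Q4_K_M", { name: "QWEN3_5_9B_MULTIMODAL_Q4_K_M" }]);
} catch (err) {
  if (err instanceof DuplicateManagedModelError) rejectedName = err.modelName;
}
console.log(rejectedName);
```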
- Per-instance consumer markers (<pid>.<rand>) so two providers in one process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve, guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead firstConsumerPid runner param (J).
Tests: per-instance markers, health-gated orphan sweep (kills serving orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity, multiple-default rejection. README updated.
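The retry restriction in (C) can be expressed as a small pure predicate. This is a sketch under assumptions: `shouldRespawnAndRetry` and `errorCode` are hypothetical names, and the code is read from either `err.code` or `err.cause.code` because Node's built-in `fetch` reports connection failures as a `TypeError` whose `cause` carries the code.

```typescript
// Extract a Node-style error code from the error itself or from its `cause`.
function errorCode(err: unknown): string | undefined {
  const e = err as { code?: unknown; cause?: { code?: unknown } } | null | undefined;
  const code = e?.code ?? e?.cause?.code;
  return typeof code === "string" ? code : undefined;
}

// Retry only once, and only when the connection was refused outright: in that case
// the request never reached a serve, so replaying it cannot duplicate a completion.
function shouldRespawnAndRetry(err: unknown, retriesSoFar: number): boolean {
  return retriesSoFar === 0 && errorCode(err) === "ECONNREFUSED";
}

const refused = Object.assign(new TypeError("fetch failed"), {
  cause: { code: "ECONNREFUSED" },
});
const reset = Object.assign(new Error("socket hang up"), { code: "ECONNRESET" });

const retryOnRefused = shouldRespawnAndRetry(refused, 0);
const retryOnReset = shouldRespawnAndRetry(reset, 0);
const retryTwice = shouldRespawnAndRetry(refused, 1);
console.log(retryOnRefused, retryOnReset, retryTwice);
```

ECONNRESET and EPIPE are excluded because the serve may already have started processing the request when the connection dropped.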
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for an unreadable lock), so a legitimate multi-minute cold start no longer loses its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails, keep the record instead of dropping it; dropping stranded a live serve with no registry trace. We only reap once it answers as ours, or drop once its pid dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't reuse an auto-allocated serve on a different port, and distinct pins don't collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays transparent if the SDK ever switches input shapes (tetherto#8).
Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend liveness; document the long-lived-sentinel/wrapper pattern and fix the misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as unreleased (package is 0.1.0); docs-site integration page documents managed mode + the async overload (tetherto#7).
Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the runner-dead + serve-alive + health-failing sweep case. Build + suite green (60 pass / 1 integration skip).
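The spawn-lock steal rule can be sketched as a pure decision. The names (`LockInfo`, `shouldStealLock`) and the 30-second staleness threshold used for the mtime fallback are assumptions; the liveness probe is injected, and in Node it would typically be a `process.kill(pid, 0)` check.

```typescript
interface LockInfo {
  // null when the lock file could not be read or parsed.
  ownerPid: number | null;
  // Last modification time of the lock file, in milliseconds since the epoch.
  mtimeMs: number;
}

function shouldStealLock(
  lock: LockInfo,
  nowMs: number,
  isAlive: (pid: number) => boolean,
  staleAfterMs: number = 30_000, // assumed threshold for the unreadable-lock fallback
): boolean {
  if (lock.ownerPid !== null) {
    // Readable lock: age is irrelevant, only a dead owner justifies stealing.
    return !isAlive(lock.ownerPid);
  }
  // Unreadable lock: fall back to the file's age.
  return nowMs - lock.mtimeMs > staleAfterMs;
}

const now = 1_000_000;
const alivePids = new Set([4242]);
const isAlive = (pid: number): boolean => alivePids.has(pid);

// A ten-minute cold start whose owner is still alive keeps its lock.
const slowButAlive = shouldStealLock({ ownerPid: 4242, mtimeMs: now - 600_000 }, now, isAlive);
// A fresh lock whose owner has died can be taken immediately.
const deadOwner = shouldStealLock({ ownerPid: 9999, mtimeMs: now - 1_000 }, now, isAlive);
// Unreadable locks are judged by age alone.
const unreadableOld = shouldStealLock({ ownerPid: null, mtimeMs: now - 60_000 }, now, isAlive);
const unreadableFresh = shouldStealLock({ ownerPid: null, mtimeMs: now - 5_000 }, now, isAlive);
console.log(slowButAlive, deadOwner, unreadableOld, unreadableFresh);
```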
lauripiisang
reviewed
Jun 9, 2026
…naged mode
Add `models.qvacCatalog`, a public models.dev-style catalog that maps friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads (`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev resolves end-to-end with no translation layer in front of the serve. Managed mode now accepts catalog ids as model names: the synthesized serve config keys the alias by the friendly id while `model` resolves to the underlying SDK constant, so the serve answers `qwen3.5-9b` directly. Bare SDK constants keep working unchanged. A drift unit test fails CI if any catalog constant disappears from the generated SDK catalog.
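The catalog lookup and the drift check can be illustrated as below. Only the `qwen3.5-9b` → `QWEN3_5_9B_MULTIMODAL_Q4_K_M` pair comes from the commit; the `Map` representation and the `resolveManagedName` / `findCatalogDrift` helpers are hypothetical.

```typescript
// Friendly id -> SDK constant. Only this one pair is taken from the commit message.
const qvacCatalog = new Map<string, string>([
  ["qwen3.5-9b", "QWEN3_5_9B_MULTIMODAL_Q4_K_M"],
]);

interface ResolvedModel {
  alias: string; // the name the serve answers to
  model: string; // the SDK constant the serve loads
}

function resolveManagedName(name: string): ResolvedModel {
  // Catalog ids map to their constant; bare SDK constants pass through unchanged.
  return { alias: name, model: qvacCatalog.get(name) ?? name };
}

// Drift check: catalog constants that no longer exist in the SDK's constant list.
function findCatalogDrift(catalog: Map<string, string>, sdkConstants: Set<string>): string[] {
  return [...catalog.values()].filter((constant) => !sdkConstants.has(constant));
}

const fromCatalog = resolveManagedName("qwen3.5-9b");
const bareConstant = resolveManagedName("QWEN3_5_9B_MULTIMODAL_Q4_K_M");
const drift = findCatalogDrift(qvacCatalog, new Set(["SOME_OTHER_CONSTANT"]));
console.log(fromCatalog, bareConstant, drift);
```

A CI test in this style would fail whenever `findCatalogDrift` returns a non-empty list.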
lauripiisang
left a comment
A few notes that should probably be addressed.
Also, the naming isn't super obvious re: `const re = await reresolve()`
Harden managed-mode lifecycle so a managed serve never leaks its `bare` inference worker or outlives the process that owns it.
- Process-group teardown: spawn `qvac serve` detached (its own group) and, when stopServe must escalate past the grace window, SIGKILL the whole group. A plain SIGKILL of the serve pid never cascades to the grandchild bare worker, so previously a wedged serve orphaned the worker. The graceful SIGTERM is still sent to the serve process only, so a healthy serve orchestrates its own shutdown and releases the global worker lock (no stale lock left behind); the group SIGKILL is the wedged-path fallback.
- `closeOnParentExit` option: for a daemon-style host whose sole job is to keep a managed serve alive for a parent process (e.g. an editor/agent plugin). The provider watches its parent pid and, the moment the parent exits (on POSIX we are reparented to init, ppid → 1), closes itself, deregistering the consumer so the runner reaps the serve, and exits. Without it a hard-killed parent would leave a reparented host alive, keeping its consumer marker forever so the serve was never reaped.
Tests: a stubborn-grandchild fake serve proves group teardown reaps the worker; `parentIsGone` unit-tests the parent-watch decision.
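Both decisions can be sketched as pure functions. The signatures are assumptions (the real `parentIsGone` may take different arguments), and `teardownSteps` is a hypothetical helper that only describes which signal goes where; on POSIX, passing a negative pid to `process.kill` addresses the whole process group, which is why the child has to be spawned detached.

```typescript
interface SignalStep {
  target: number; // positive = one process, negative = a whole process group (POSIX)
  signal: "SIGTERM" | "SIGKILL";
}

// Ordered teardown plan: polite request to the serve alone, then the group fallback.
function teardownSteps(servePid: number): [SignalStep, SignalStep] {
  return [
    // Graceful path: the serve shuts its worker down and releases its own locks.
    { target: servePid, signal: "SIGTERM" },
    // Wedged path, sent only after the grace window: kill the serve's group so the
    // grandchild worker dies with it.
    { target: -servePid, signal: "SIGKILL" },
  ];
}

// Assumed rule: the parent is gone once our ppid no longer matches the one recorded
// at startup (on POSIX an orphaned process is typically reparented to pid 1).
function parentIsGone(initialPpid: number, currentPpid: number): boolean {
  return currentPpid !== initialPpid;
}

const [graceful, fallback] = teardownSteps(5150);
const stillParented = parentIsGone(3000, 3000);
const reparented = parentIsGone(3000, 1);
console.log(graceful, fallback, stillParented, reparented);
```

A host using `closeOnParentExit` would poll the current parent pid on an interval and call `close()` the first time the check returns true.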
…ce and crash-respawn
- Undo the consumer re-registration when close() wins the race against an in-flight fetch retry: resolveServe re-adds the marker after close() removed it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned serve, so a respawned runner inherits the still-alive sessions instead of idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.
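The idle-reap decision that both fixes protect can be sketched as a pure function over consumer liveness. The name `shouldReapServe` and its parameters are hypothetical; the 300000 ms figure is the default `serveIdleTimeout` from the PR description, and a zero timeout models a private serve.

```typescript
function shouldReapServe(
  liveConsumers: number,      // consumer markers whose owning process is still alive
  lastConsumerSeenMs: number, // last time at least one live consumer was observed
  nowMs: number,
  idleTimeoutMs: number,
): boolean {
  // Liveness, not request traffic, is the signal: any live consumer keeps the serve.
  if (liveConsumers > 0) return false;
  return nowMs - lastConsumerSeenMs >= idleTimeoutMs;
}

const DEFAULT_IDLE_MS = 300_000;
const t0 = 10_000_000;

// A preserved marker from a still-alive session keeps a respawned serve warm.
const keptByLiveSession = shouldReapServe(1, t0 - 900_000, t0, DEFAULT_IDLE_MS);
// No consumers, but the idle window has not elapsed yet.
const withinWindow = shouldReapServe(0, t0 - 60_000, t0, DEFAULT_IDLE_MS);
// No consumers for the full window: reap.
const idleExpired = shouldReapServe(0, t0 - 300_000, t0, DEFAULT_IDLE_MS);
// Private serve (zero timeout): reaped as soon as its owner is gone.
const privateOwnerGone = shouldReapServe(0, t0, t0, 0);
console.log(keptByLiveSession, withinWindow, idleExpired, privateOwnerGone);
```

If the sweep dropped live markers along with a crashed serve, the fresh runner would start with zero consumers and this check would reap it while sessions were still attached.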
opaninakuffo
previously approved these changes
Jun 9, 2026
lauripiisang
reviewed
Jun 9, 2026
removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a confusing sync/async mirror: the async removeConsumer was only ever called right after the sync one (a guaranteed no-op), and the removeRecord pair was really two teardown semantics under near-identical names. Marker/record teardown is a single unlink/rm, cheap enough to be synchronous everywhere — including process 'exit' handlers where async can't run — so collapse each pair into one sync function. No behaviour change; addresses review feedback on tetherto#2408.
Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent (why sync, preserveConsumers semantics) without the narration.
arun-mani-j
reviewed
Jun 10, 2026
Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the resolved CLI path verbatim) and ephemeralConfigName was an unused helper (writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the latter also drops the now-unused randomBytes import.
arun-mani-j
approved these changes
Jun 10, 2026
lauripiisang
approved these changes
Jun 10, 2026
opaninakuffo
approved these changes
Jun 10, 2026
🎯 What problem does this PR solve?
`@qvac/ai-sdk-provider` v1 (0.1.0) is external-mode only: the user must hand-author a `qvac.config.json` and keep `qvac serve openai` running in a separate terminal before the provider works. This PR adds managed mode (QVAC-19900) so the provider runs `qvac serve` itself.
The hard part isn't spawning a serve; it's the lifecycle once you do it for real:
- …`qvac serve` running forever; we hit exactly this in testing) and duplicates them (every session/tool reloads the model into memory).
- …`baseURL`, so any "is it still in use?" heuristic based on the provider's own request traffic is blind to them.
- …`ctx_size` of 1024 is far too small for an agent's tool-laden system prompt, and reasoning models spam `<think>` blocks.
So managed mode is built as a shared, self-cleaning serve daemon that is robust standalone and usable by any tool, not just one in-process session. External mode is left byte-for-byte unchanged.
📝 How does it solve it?
`createQvac({ mode: 'managed', models: [...] })` synthesizes an ephemeral `qvac.config.json` from SDK model-constant names and brings up a serve, but the serve is owned by a detached runner, not by your process:
- …`src/managed/fleet-key.ts`): a hash of `{ model set + per-model config + host }` keys a cross-process registry under `~/.qvac/managed-serves/`. `createQvac` attaches to a matching healthy serve instead of cold-starting a duplicate. Different models/config ⇒ different key ⇒ its own serve.
- …`src/managed/runner.ts`): it spawns `qvac serve`, publishes the registry record, and reaps the serve once no consumer process has been alive for `serveIdleTimeout` (default 5 min). Liveness, not request traffic, is the signal, so it works for tools that hit `baseURL` directly. This survives your script exiting, so a serve can be shared and still cleaned up.
- `close()` detaches, it doesn't kill: it deregisters this session as a consumer. A serve still in use by another session keeps running; an unused one is reaped after the idle timeout. Abrupt exits (Ctrl-C/crash) are handled by dead-PID pruning.
- …`src/managed/registry.ts`): every start reaps only dead/orphaned serves, never a healthy serve a live runner owns. This fixes the original bug where a second session would `SIGKILL` a serve another session was still downloading into.
- …`fetch` wrapper transparently re-resolves (reattach or spawn) and retries once on `ECONNREFUSED`, so a serve crash mid-session self-heals.
- …`QvacManagedModel`): `models` entries may be `{ name, config, preload, default }`, so callers set `ctx_size`/`reasoning_budget` per model, which is the agent-friendly setup. Duplicate model names are rejected up front (`DuplicateManagedModelError`) instead of silently overwriting.
- `reuse: false` (or pinning `servePort`) gives a dedicated, non-shared serve reaped as soon as its owner exits.
- `src/managed/serve-process.ts` holds the low-level spawn / health-poll / SIGTERM→grace→SIGKILL primitives used by the runner.
`@qvac/cli` is an optional peer dependency, resolved via `process.execPath` + its entry (or an explicit `serveBinPath`). The whole managed subsystem is lazily dynamic-imported only when `mode: 'managed'` is set, so external mode stays synchronous and pays nothing. Portable `node:` APIs only; identical on Node 20+ and Bun.
🧪 How was it tested?
- `bun run lint` (tsc), `bun run build`, `bun run test`: green, 51 pass / 1 skip (the skip is the real-model integration test, gated by `QVAC_INTEGRATION_TEST=1`).
- `managed-registry`: record round-trip, consumer markers + dead-PID pruning, `findReusableServe` health gating, and a non-destructive sweep that drops dead records, kills runner-orphaned serves, and leaves a healthy owned serve untouched.
- `managed-fleet-key`: stable, order-insensitive key; changes with per-model config and host.
- `managed-runner`: pure idle-reap decision (consumer liveness + timeout, incl. zero-timeout private serves).
- `managed-serve-process`: fake-serve driven: healthy, start-timeout, early-exit/crash, SIGKILL-escalation.
- `managed-provider`: end-to-end through the real detached runner: auto-spawn, reuse (second `createQvac` → same serve pid), and idle-reap (serve gone + registry cleared after the timeout).
- `managed-config`: per-model config emission + duplicate-name rejection.
- …`tasks` harness): healthy spawn → real `/v1/chat/completions` → second session reused the same pid → `close()` left it running while a consumer remained → idle-reaped (process gone + registry empty) once the last consumer left.
🔌 API Changes
- New `mode: 'managed'` overload of `createQvac` (returns a `Promise<ManagedQvacProvider>`), plus the `QvacManagedOptions` / `QvacManagedModel` / `ManagedQvacProvider` types. External `createQvac` is unchanged.
- `QvacManagedOptions` (all optional besides `mode`/`models`): `servePort`, `serveHost`, `serveStartTimeout`, `serveBinPath`, `reuse` (default `true`; `false` ⇒ private serve), `serveIdleTimeout` (default `300000`), plus the shared `apiKey`/`headers`/`fetch`.
- `QvacManagedModel`: `{ name, config?, preload?, default? }`.
- `ManagedQvacProvider` exposes `close()`, `[Symbol.asyncDispose]`, `port`, `pid`, `baseURL`.
Follow-up (separate task): version bump to `0.2.0` + publish via the fork → release branch → main flow.
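To make the option semantics concrete, the sketch below mirrors the option names listed above as local type declarations and derives the serve lifecycle they imply. The field types and the `lifecycleOf` helper are assumptions, not the package's exported definitions; the zero idle timeout for private serves is inferred from "reaped as soon as its owner exits".

```typescript
// Local mirrors of the described types; field types are assumed.
interface QvacManagedModel {
  name: string;
  config?: Record<string, unknown>;
  preload?: boolean;
  default?: boolean;
}

interface QvacManagedOptions {
  mode: "managed";
  models: Array<string | QvacManagedModel>;
  servePort?: number;
  serveHost?: string;
  serveStartTimeout?: number;
  serveBinPath?: string;
  reuse?: boolean;           // default true; false gives a private serve
  serveIdleTimeout?: number; // default 300000 (ms)
  apiKey?: string;
  headers?: Record<string, string>;
}

interface ServeLifecycle {
  shared: boolean;
  idleTimeoutMs: number;
}

// Hypothetical helper: which kind of serve do these options ask for?
function lifecycleOf(options: QvacManagedOptions): ServeLifecycle {
  // reuse: false or a pinned servePort both yield a dedicated serve.
  const isPrivate = options.reuse === false || options.servePort !== undefined;
  return {
    shared: !isPrivate,
    idleTimeoutMs: isPrivate ? 0 : options.serveIdleTimeout ?? 300_000,
  };
}

const sharedDefault = lifecycleOf({ mode: "managed", models: ["QWEN3_5_9B_MULTIMODAL_Q4_K_M"] });
const privateServe = lifecycleOf({ mode: "managed", models: ["qwen3.5-9b"], reuse: false });
const pinnedPort = lifecycleOf({ mode: "managed", models: ["qwen3.5-9b"], servePort: 8080 });
console.log(sharedDefault, privateServe, pinnedPort);
```

With the real package, the same options object would be passed to `await createQvac(...)`, and the returned provider's `close()` would detach this session rather than stop a shared serve.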