QVAC-19900 feat[api]: add managed mode to @qvac/ai-sdk-provider by simon-iribarren · Pull Request #2408 · tetherto/qvac

simon-iribarren · 2026-06-03T09:47:00Z

🎯 What problem does this PR solve?

@qvac/ai-sdk-provider v1 (0.1.0) is external-mode only: the user must hand-author a qvac.config.json and keep qvac serve openai running in a separate terminal before the provider works. This PR adds managed mode (QVAC-19900) so the provider runs qvac serve itself.

The hard part isn't spawning a serve — it's the lifecycle once you do it for real:

- A naive "spawn on start, kill on close" design orphans serves (a hard crash leaves qvac serve running forever; we hit exactly this in testing) and duplicates them (every session/tool reloads the model into memory).
- Coding-agent harnesses (OpenCode, Cline, Aider) connect straight to the baseURL, so any "is it still in use?" heuristic based on the provider's own request traffic is blind to them.
- The serve default ctx_size of 1024 is far too small for an agent's tool-laden system prompt, and reasoning models spam <think> blocks.

So managed mode is built as a shared, self-cleaning serve daemon that is robust standalone and usable by any tool, not just one in-process session. External mode is left byte-for-byte unchanged.

📝 How does it solve it?

createQvac({ mode: 'managed', models: [...] }) synthesizes an ephemeral qvac.config.json from SDK model-constant names and brings up a serve — but the serve is owned by a detached runner, not by your process:

- Reuse via a fleet key (src/managed/fleet-key.ts): a hash of { model set + per-model config + host } keys a cross-process registry under ~/.qvac/managed-serves/. createQvac attaches to a matching healthy serve instead of cold-starting a duplicate. Different models/config ⇒ different key ⇒ its own serve.
- Detached runner owns the serve (src/managed/runner.ts): it spawns qvac serve, publishes the registry record, and reaps the serve once no consumer process has been alive for serveIdleTimeout (default 5 min). Liveness, not request traffic, is the signal, so it works for tools that hit baseURL directly. The runner survives your script exiting, so a serve can be shared and still cleaned up.
- close() detaches, it doesn't kill: it deregisters this session as a consumer. A serve still in use by another session keeps running; an unused one is reaped after the idle timeout. Abrupt exits (Ctrl-C/crash) are handled by dead-PID pruning.
- Non-destructive sweep (src/managed/registry.ts): every start reaps only dead/orphaned serves, never a healthy serve a live runner owns. This fixes the original bug where a second session would SIGKILL a serve another session was still downloading into.
- Respawn-on-failure: the provider's fetch wrapper transparently re-resolves (reattach or spawn) and retries once on ECONNREFUSED, so a serve crash mid-session self-heals.
- Per-model config (QvacManagedModel): models entries may be { name, config, preload, default }, so callers set ctx_size / reasoning_budget per model (the agent-friendly setup). Duplicate model names are rejected up front (DuplicateManagedModelError) instead of silently overwriting.
- Private serves: reuse: false (or pinning servePort) gives a dedicated, non-shared serve reaped as soon as its owner exits.
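
The fleet-key idea above can be sketched as follows. This is a minimal illustration, not the code in src/managed/fleet-key.ts: the `ModelSpec` shape, the `canonical` helper, the SHA-256 choice, and the 16-character truncation are all assumptions made for the example.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shape for illustration; the real types live in the package.
interface ModelSpec {
  name: string
  config?: Record<string, unknown>
}

// Serialize with sorted object keys so property order never changes the hash.
function canonical(value: unknown): string {
  if (value === undefined) return 'null'
  if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`)
      .join(',')
    return `{${body}}`
  }
  return JSON.stringify(value)
}

function fleetKey(models: ModelSpec[], host: string): string {
  // Sort by name so the order of the `models` array does not matter.
  const sorted = [...models].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
  const payload = canonical({
    host,
    models: sorted.map((m) => ({ name: m.name, config: m.config ?? {} })),
  })
  return createHash('sha256').update(payload).digest('hex').slice(0, 16)
}
```

Two sessions that request the same models with the same per-model config on the same host would compute the same key and could attach to one serve; changing any `config` value or the host yields a different key.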

src/managed/serve-process.ts holds the low-level spawn / health-poll / SIGTERM→grace→SIGKILL primitives used by the runner. @qvac/cli is an optional peer dependency, resolved via process.execPath + its entry (or an explicit serveBinPath). The whole managed subsystem is lazily dynamic-imported only when mode: 'managed' is set, so external mode stays synchronous and pays nothing. Portable node: APIs only — identical on Node 20+ and Bun.
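
The SIGTERM→grace→SIGKILL escalation mentioned above might look roughly like this. It is a sketch under assumptions: `ServeHandle` is a hypothetical abstraction over the child process (not a type from the package), and the return values are invented for the example.

```typescript
// Hypothetical handle: `signal` delivers a signal, `exited` resolves on process exit.
interface ServeHandle {
  signal(sig: 'SIGTERM' | 'SIGKILL'): void
  exited: Promise<void>
}

async function stopServe(serve: ServeHandle, graceMs: number): Promise<'graceful' | 'killed'> {
  // Ask politely first so a healthy serve can run its own shutdown.
  serve.signal('SIGTERM')

  const outcome = await new Promise<'exited' | 'timeout'>((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), graceMs)
    void serve.exited.then(() => {
      clearTimeout(timer)
      resolve('exited')
    })
  })
  if (outcome === 'exited') return 'graceful'

  // The grace window elapsed: escalate and wait for the exit to be observed.
  serve.signal('SIGKILL')
  await serve.exited
  return 'killed'
}
```

Keeping the handle abstract makes the escalation testable with a fake serve, which appears to be the approach the managed-serve-process suite takes.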

🧪 How was it tested?

bun run lint (tsc), bun run build, bun run test — green: 51 pass / 1 skip (the skip is the real-model integration test, gated by QVAC_INTEGRATION_TEST=1).
New/updated unit + integration suites:
- managed-registry — record round-trip, consumer markers + dead-PID pruning, findReusableServe health gating, and a non-destructive sweep that drops dead records, kills runner-orphaned serves, and leaves a healthy owned serve untouched.
- managed-fleet-key — stable, order-insensitive key; changes with per-model config and host.
- managed-runner — pure idle-reap decision (consumer liveness + timeout, incl. zero-timeout private serves).
- managed-serve-process — fake-serve driven: healthy, start-timeout, early-exit/crash, SIGKILL-escalation.
- managed-provider — end-to-end through the real detached runner: auto-spawn, reuse (second createQvac → same serve pid), and idle-reap (serve gone + registry cleared after the timeout).
- managed-config — per-model config emission + duplicate-name rejection.
Manually verified end-to-end against a real cached Qwen3-8B serve (tasks harness): healthy spawn → real /v1/chat/completions → second session reused the same pid → close() left it running while a consumer remained → idle-reaped (process gone + registry empty) once the last consumer left.
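
The "pure idle-reap decision" exercised by the managed-runner suite can be pictured as a small function over consumer liveness and a timeout. The sketch below is an assumption about its shape, not the runner's actual code; `ReapInput`, `shouldReap`, and `pidIsAlive` are hypothetical names.

```typescript
interface ReapInput {
  consumerPids: number[] // pids parsed from consumer markers
  lastConsumerSeenAt: number // epoch ms when a live consumer was last observed
  now: number // epoch ms
  idleTimeoutMs: number // 0 models a private serve
  isAlive: (pid: number) => boolean
}

// Reap only when no consumer is alive AND the idle window has fully elapsed.
function shouldReap(input: ReapInput): boolean {
  if (input.consumerPids.some((pid) => input.isAlive(pid))) return false
  return input.now - input.lastConsumerSeenAt >= input.idleTimeoutMs
}

// Liveness probe: signal 0 checks for existence without delivering a signal.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string } | null)?.code === 'EPERM'
  }
}
```

Because the decision depends on process liveness rather than request traffic, it does not need to observe any HTTP requests to the serve.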

🔌 API Changes

New mode: 'managed' overload of createQvac (returns a Promise<ManagedQvacProvider>), plus the QvacManagedOptions / QvacManagedModel / ManagedQvacProvider types. External createQvac is unchanged.

import { createQvac } from '@qvac/ai-sdk-provider'
import { generateText } from 'ai'

// Auto-spawns (or REUSES) a shared `qvac serve`; resolves once it's healthy.
const qvac = await createQvac({
  mode: 'managed',
  models: [
    // Per-model serve config — agents need a real ctx window + no <think> blocks.
    { name: 'QWEN3_8B_INST_Q4_K_M', config: { ctx_size: 16384, reasoning_budget: 0 }, default: true }
  ],
  serveIdleTimeout: 300_000 // keep the shared serve 5 min after the last user exits
})

try {
  const { text } = await generateText({
    model: qvac('QWEN3_8B_INST_Q4_K_M'),
    prompt: 'Write a haiku about local-first AI.'
  })
  console.log(text)
} finally {
  await qvac.close() // detaches this session; a shared serve keeps running for others
}

// AsyncDisposable — `await using` handles detach automatically:
// await using qvac = await createQvac({ mode: 'managed', models: ['QWEN3_600M_INST_Q4'] })

QvacManagedOptions (all optional besides mode/models): servePort, serveHost, serveStartTimeout, serveBinPath, reuse (default true; false ⇒ private serve), serveIdleTimeout (default 300000), plus the shared apiKey / headers / fetch. QvacManagedModel: { name, config?, preload?, default? }. ManagedQvacProvider exposes close(), [Symbol.asyncDispose], port, pid, baseURL.
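
For readers who prefer a type-level view, the option list above could be written out roughly as below. The field names come from the description; the TypeScript types attached to them are guesses, and the shared apiKey / headers / fetch options are omitted. Treat it as a reading aid, not the package's exported declarations.

```typescript
// Field names are from the PR description; the types are assumptions.
interface QvacManagedModel {
  name: string
  config?: Record<string, unknown> // e.g. { ctx_size: 16384, reasoning_budget: 0 }
  preload?: boolean
  default?: boolean
}

interface QvacManagedOptions {
  mode: 'managed'
  models: Array<string | QvacManagedModel>
  servePort?: number
  serveHost?: string
  serveStartTimeout?: number
  serveBinPath?: string
  reuse?: boolean // default true; false gives a private serve
  serveIdleTimeout?: number // default 300000
}

// Example value: a private serve for one model, using only fields named above.
const privateServe: QvacManagedOptions = {
  mode: 'managed',
  models: [{ name: 'QWEN3_8B_INST_Q4_K_M', config: { ctx_size: 16384 }, default: true }],
  reuse: false,
}
```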

Follow-up (separate task): version bump to 0.2.0 + publish via the fork → release branch → main flow.

Add `mode: 'managed'` so the provider can synthesize an ephemeral qvac.config.json from a model-constant list, spawn and supervise `qvac serve` on a free port, and tear it down on host exit. External mode is unchanged and stays synchronous; the managed supervisor is lazily dynamic-imported so external-mode users pay no startup cost. @qvac/cli becomes an optional peer dependency.

…json (QVAC-19900) The published @qvac/cli ships a string `exports` field ("./dist/index.js"), which makes the `./package.json` subpath non-resolvable (ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving `@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI on a clean install. Fall back to resolving the package main entry, which for @qvac/cli is the same file as the `qvac` bin.
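
The fallback described in this commit can be sketched as a resolver that tries the manifest first and falls back to the package main entry. `locateCli`, `Resolver`, and `CliLocation` are hypothetical names for illustration; only the `@qvac/cli` specifier and the `ERR_PACKAGE_PATH_NOT_EXPORTED` code come from the commit message.

```typescript
type Resolver = (specifier: string) => string
type CliLocation = { via: 'manifest'; path: string } | { via: 'main'; path: string }

function locateCli(resolve: Resolver, pkg: string = '@qvac/cli'): CliLocation {
  try {
    // Preferred: find package.json (the caller would then read its `bin` field).
    return { via: 'manifest', path: resolve(`${pkg}/package.json`) }
  } catch (err) {
    const code = (err as { code?: string } | null)?.code
    // Only the "subpath not exported" failure triggers the fallback.
    if (code !== 'ERR_PACKAGE_PATH_NOT_EXPORTED') throw err
    // Fallback: the main entry, which for @qvac/cli is the same file as the bin.
    return { via: 'main', path: resolve(pkg) }
  }
}
```

In an ES module the resolver could be `createRequire(import.meta.url).resolve` from `node:module`; passing it in as a parameter keeps the sketch testable without the package installed.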

…sdk-provider-managed-mode

Managed mode `models` now accepts spec objects ({ name, config, preload, default }) alongside bare constant names, so callers can set per-model serve options — notably `ctx_size` and `reasoning_budget` — that coding agents like OpenCode require. The synthesized qvac.config.json carries the config block, honors explicit `preload`/`default`, and validates names inside spec objects. Exports the new `QvacManagedModel` type and documents per-model config plus a managed-mode OpenCode example in the README.

Rework managed mode from a per-provider supervisor into a shared, self-cleaning serve daemon so it is robust standalone and usable by any tool, not just a single session.
- Reuse via a fleet key (model set + per-model config + host) keyed in a cross-process registry under ~/.qvac/managed-serves/; createQvac attaches to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer process has been alive for serveIdleTimeout (default 5m). Liveness, not request traffic, is the signal, so it works for tools that hit baseURL directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon as its owner exits.

Refactor into serve-process.ts (spawn/health/stop), registry.ts, fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap end-to-end coverage; document the shared lifecycle in the README.

Each managed model maps to a single serve alias keyed by its name, so a repeated name silently overwrote the earlier entry — and could drop its `default: true`. Reject duplicates up front with DuplicateManagedModelError instead of resolving them ambiguously. Addresses PR review feedback.
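
A minimal sketch of the up-front validation, assuming models may be bare names or spec objects. The error class here is a stand-in with the same name as the one the PR exports; its fields and message are invented. The multiple-default check mirrors a later commit in this PR and uses a plain `Error` because the real error type is not named in the source.

```typescript
type ModelEntry = string | { name: string; default?: boolean }

// Stand-in for the package's DuplicateManagedModelError; fields are assumptions.
class DuplicateManagedModelError extends Error {
  readonly modelName: string
  constructor(modelName: string) {
    super(`Duplicate managed model name: ${modelName}`)
    this.name = 'DuplicateManagedModelError'
    this.modelName = modelName
  }
}

function validateModels(models: ModelEntry[]): void {
  const seen = new Set<string>()
  let explicitDefaults = 0
  for (const entry of models) {
    const name = typeof entry === 'string' ? entry : entry.name
    // Reject instead of letting the later entry overwrite the earlier alias.
    if (seen.has(name)) throw new DuplicateManagedModelError(name)
    seen.add(name)
    if (typeof entry !== 'string' && entry.default === true) explicitDefaults += 1
  }
  if (explicitDefaults > 1) throw new Error('At most one model may set default: true')
}
```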

- Per-instance consumer markers (<pid>.<rand>) so two providers in one process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve, guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead firstConsumerPid runner param (J).

Tests: per-instance markers, health-gated orphan sweep (kills serving orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity, multiple-default rejection. README updated.
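
The ECONNREFUSED-only retry can be sketched like this. `withRespawn`, `isConnRefused`, and the `state` object are hypothetical; the check on `cause` reflects how Node's built-in fetch typically wraps socket errors, which is an assumption about the runtime rather than something the PR states.

```typescript
// True only for "nothing is listening", either directly or via fetch's `cause`.
function isConnRefused(err: unknown): boolean {
  const e = err as { code?: string; cause?: { code?: string } } | null
  return e?.code === 'ECONNREFUSED' || e?.cause?.code === 'ECONNREFUSED'
}

async function withRespawn<T>(
  state: { baseURL: string },
  request: (baseURL: string) => Promise<T>,
  reresolve: () => Promise<string>, // reattach to, or spawn, a healthy serve
): Promise<T> {
  try {
    return await request(state.baseURL)
  } catch (err) {
    // ECONNRESET/EPIPE may mean the serve already started work: never replay those.
    if (!isConnRefused(err)) throw err
    state.baseURL = await reresolve()
    return request(state.baseURL) // exactly one retry; a second failure propagates
  }
}
```

A refused connection means the request never reached a serve, so replaying it is safe; a reset mid-response offers no such guarantee, which is why the retry is limited to that one code.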

Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for an unreadable lock), so a legitimate multi-minute cold start no longer loses its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails, keep the record instead of dropping it; dropping stranded a live serve with no registry trace. We only reap once it answers as ours, or drop once its pid dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't reuse an auto-allocated serve on a different port, and distinct pins don't collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays transparent if the SDK ever switches input shapes (tetherto#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend liveness; document the long-lived-sentinel/wrapper pattern and fix the misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as unreleased (package is 0.1.0); docs-site integration page documents managed mode + the async overload (tetherto#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the runner-dead + serve-alive + health-failing sweep case. Build + suite green (60 pass / 1 integration skip).
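
The spawn-lock rule (steal only from a dead owner, with an mtime fallback when the lock cannot be read) reduces to a small decision function. The sketch below assumes a `LockInfo` shape and a `staleAfterMs` parameter that are not taken from the PR.

```typescript
interface LockInfo {
  ownerPid: number | null // null when the lock file could not be read or parsed
  mtimeMs: number // last modification time of the lock file
}

function canStealLock(
  lock: LockInfo,
  now: number,
  isAlive: (pid: number) => boolean,
  staleAfterMs: number,
): boolean {
  // Readable lock: age is irrelevant, only the owner's liveness matters.
  if (lock.ownerPid !== null) return !isAlive(lock.ownerPid)
  // Unreadable lock: fall back to age, since there is no pid to probe.
  return now - lock.mtimeMs > staleAfterMs
}
```

Under this rule a cold start that holds the lock for several minutes keeps it for as long as its process is alive.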

…naged mode Add `models.qvacCatalog`, a public models.dev-style catalog that maps friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads (`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev resolves end-to-end with no translation layer in front of the serve. Managed mode now accepts catalog ids as model names: the synthesized serve config keys the alias by the friendly id while `model` resolves to the underlying SDK constant, so the serve answers `qwen3.5-9b` directly. Bare SDK constants keep working unchanged. A drift unit test fails CI if any catalog constant disappears from the generated SDK catalog.
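
The alias/constant split described here can be illustrated with a one-entry catalog. Only the `qwen3.5-9b` → `QWEN3_5_9B_MULTIMODAL_Q4_K_M` pair comes from the commit message; the `ServeEntry` shape and the `toServeEntry` helper are assumptions, and the real synthesized config may be laid out differently.

```typescript
// One catalog entry taken from the commit message; the real catalog has more.
const catalog = new Map<string, string>([['qwen3.5-9b', 'QWEN3_5_9B_MULTIMODAL_Q4_K_M']])

// Hypothetical view of one serve config entry.
interface ServeEntry {
  alias: string // the name the serve answers to
  model: string // the SDK constant the serve loads
}

function toServeEntry(name: string): ServeEntry {
  // Catalog ids keep their friendly alias; bare SDK constants pass through unchanged.
  return { alias: name, model: catalog.get(name) ?? name }
}
```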

lauripiisang

A few notes that probably should be addressed.
Also, the naming isn't super obvious, re: `const re = await reresolve()`

Harden managed-mode lifecycle so a managed serve never leaks its `bare` inference worker or outlives the process that owns it.
- Process-group teardown: spawn `qvac serve` detached (its own group) and, when stopServe must escalate past the grace window, SIGKILL the whole group. A plain SIGKILL of the serve pid never cascades to the grandchild bare worker, so previously a wedged serve orphaned the worker. The graceful SIGTERM is still sent to the serve process only, so a healthy serve orchestrates its own shutdown and releases the global worker lock (no stale lock left behind); the group SIGKILL is the wedged-path fallback.
- `closeOnParentExit` option: for a daemon-style host whose sole job is to keep a managed serve alive for a parent process (e.g. an editor/agent plugin). The provider watches its parent pid and, the moment the parent exits (on POSIX we are reparented to init, ppid → 1), closes itself (deregistering the consumer so the runner reaps the serve) and exits. Without it a hard-killed parent would leave a reparented host alive, keeping its consumer marker forever so the serve was never reaped.

Tests: a stubborn-grandchild fake serve proves group teardown reaps the worker; `parentIsGone` unit-tests the parent-watch decision.
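
Both mechanisms can be sketched briefly. `parentIsGone` is named in the commit message, but its signature here is a guess; `watchParent` and `killGroup` are hypothetical helpers. Comparing against the original ppid (rather than checking for 1) is a choice made for this sketch so that a host legitimately started by init is not misdetected.

```typescript
// Decide whether the parent has exited: after reparenting, the ppid changes.
function parentIsGone(originalPpid: number, currentPpid: number): boolean {
  return currentPpid !== originalPpid
}

// Poll the parent pid and invoke `onGone` once; returns a function that stops the watch.
function watchParent(onGone: () => void, intervalMs: number = 1000): () => void {
  const original = process.ppid
  const timer = setInterval(() => {
    if (parentIsGone(original, process.ppid)) {
      clearInterval(timer)
      onGone()
    }
  }, intervalMs)
  return () => clearInterval(timer)
}

// Wedged-path fallback: a negative pid targets the whole process group on POSIX.
// This only works if the serve was spawned with `detached: true`, so that it
// leads its own group and the grandchild worker is a member of it.
function killGroup(servePid: number): void {
  process.kill(-servePid, 'SIGKILL')
}
```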

…ce and crash-respawn
- Undo the consumer re-registration when close() wins the race against an in-flight fetch retry: resolveServe re-adds the marker after close() removed it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned serve, so a respawned runner inherits the still-alive sessions instead of idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.
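
Consumer markers of the form `<pid>.<rand>` and the "keep the live ones" rule can be sketched as below. The hex suffix, the helper names, and the choice to treat unparseable names as dead are assumptions for illustration.

```typescript
import { randomBytes } from 'node:crypto'

// One marker per provider instance, so two providers in one process stay distinct.
function newMarker(pid: number): string {
  return `${pid}.${randomBytes(4).toString('hex')}`
}

// Recover the owning pid from a marker name; null if the name is malformed.
function markerPid(name: string): number | null {
  const match = /^(\d+)\.[0-9a-f]+$/.exec(name)
  return match ? Number(match[1]) : null
}

// Split markers into those whose process is alive and those that can be pruned.
function partitionMarkers(
  names: string[],
  isAlive: (pid: number) => boolean,
): { live: string[]; dead: string[] } {
  const live: string[] = []
  const dead: string[] = []
  for (const name of names) {
    const pid = markerPid(name)
    if (pid !== null && isAlive(pid)) live.push(name)
    else dead.push(name)
  }
  return { live, dead }
}
```

When a crashed serve is swept, carrying the `live` set over to the respawned runner is what keeps the fresh serve from being idle-reaped while sessions still exist.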

…naged fetch

removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a confusing sync/async mirror: the async removeConsumer was only ever called right after the sync one (a guaranteed no-op), and the removeRecord pair was really two teardown semantics under near-identical names. Marker/record teardown is a single unlink/rm, cheap enough to be synchronous everywhere — including process 'exit' handlers where async can't run — so collapse each pair into one sync function. No behaviour change; addresses review feedback on tetherto#2408.
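
A sketch of why a single synchronous function suffices: `rmSync` with `force: true` is idempotent and can run inside an `'exit'` handler, where pending async work is never executed. The function bodies and the `detachOnExit` helper are assumptions, not the registry's actual code.

```typescript
import { rmSync } from 'node:fs'

// Single sync teardown: `force: true` makes a missing marker a no-op.
function removeConsumer(markerPath: string): void {
  rmSync(markerPath, { force: true })
}

// 'exit' handlers must be synchronous; promises scheduled here would never settle.
function detachOnExit(markerPath: string): void {
  process.once('exit', () => removeConsumer(markerPath))
}
```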

Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent (why sync, preserveConsumers semantics) without the narration.

Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the resolved CLI path verbatim) and ephemeralConfigName was an unused helper (writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the latter also drops the now-unused randomBytes import.

simon-iribarren · 2026-06-10T10:58:37Z

/review

github-actions · 2026-06-10T10:59:03Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (2/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

simon-iribarren requested review from a team as code owners June 3, 2026 09:47

simon-iribarren added the tier1 and verified (Authorize secrets / label-gate in PR workflows) labels Jun 3, 2026

opaninakuffo reviewed Jun 4, 2026

Comment thread packages/ai-sdk-provider/src/managed/supervisor.ts Outdated

opaninakuffo reviewed Jun 4, 2026

Comment thread packages/ai-sdk-provider/src/managed/config-synthesizer.ts Outdated

opaninakuffo reviewed Jun 4, 2026

Comment thread packages/ai-sdk-provider/src/managed/supervisor.ts Outdated

simon-iribarren added 9 commits June 5, 2026 12:37

doc: update ai-sdk provider agent setup after queue (QVAC-19900)

7bc2c12

Merge remote-tracking branch 'upstream/main' into feat/qvac-19900-ai-…

af540a9

…sdk-provider-managed-mode

docs: use canonical qvac.tether.io URL in ai-sdk-provider README

8fc5df6

Merge branch 'main' into feat/qvac-19900-ai-sdk-provider-managed-mode

ddc5075

lauripiisang reviewed Jun 9, 2026

Comment thread docs/website/content/docs/cli/http-server/integration.mdx Outdated

lauripiisang reviewed Jun 9, 2026

Comment thread packages/ai-sdk-provider/src/managed/index.ts Outdated

Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated

simon-iribarren added 4 commits June 9, 2026 17:34

Merge branch 'main' into feat/qvac-19900-ai-sdk-provider-managed-mode

c0e39c6

QVAC-19900 fix: rename reresolve result to resolved for clarity in ma…

4c9d4c5

…naged fetch

opaninakuffo previously approved these changes Jun 9, 2026

lauripiisang reviewed Jun 9, 2026

Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated

lauripiisang reviewed Jun 9, 2026

Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated

simon-iribarren dismissed opaninakuffo’s stale review via e418a55 June 10, 2026 07:17

simon-iribarren added 2 commits June 10, 2026 09:24

QVAC-19900 mod: trim verbose comments in managed registry

11477ab

Merge branch 'main' into feat/qvac-19900-ai-sdk-provider-managed-mode

1191b78

arun-mani-j reviewed Jun 10, 2026

Comment thread packages/ai-sdk-provider/src/defaults.ts Outdated

Comment thread packages/ai-sdk-provider/src/managed/config-synthesizer.ts Outdated

simon-iribarren added 2 commits June 10, 2026 11:21

Merge branch 'main' into feat/qvac-19900-ai-sdk-provider-managed-mode

7351a25

arun-mani-j approved these changes Jun 10, 2026

lauripiisang approved these changes Jun 10, 2026

opaninakuffo approved these changes Jun 10, 2026

Merge branch 'main' into feat/qvac-19900-ai-sdk-provider-managed-mode

817589b

simon-iribarren merged commit 95eb489 into tetherto:main Jun 10, 2026
4 checks passed

simon-iribarren mentioned this pull request Jun 10, 2026

QVAC-19908 feat[api]: add @qvac/opencode-plugin (turnkey local OpenCode) #2521

Open

simon-iribarren merged 22 commits into tetherto:main from simon-iribarren:feat/qvac-19900-ai-sdk-provider-managed-mode

4 participants
